Creating a “Flat File” Shared Database with Amazon S3 and Node.js

In my latest Pluralsight video training course – Patterns of Cloud Integration – I addressed application and data integration scenarios that involve cloud endpoints. In the “shared database” module of the course, I discussed integration options where parties rely on a common (cloud) data repository. One of my solutions was inspired by Amazon CTO Werner Vogels, who briefly discussed this scenario during his keynote at last Fall’s AWS re:Invent conference. Vogels talked about the tight coupling that initially existed between Amazon.com and IMDB (the Internet Movie Database). Amazon.com pulls data from IMDB to supplement various pages, but this forced IMDB to scale whenever Amazon.com had a traffic burst. The solution was to decouple Amazon.com and IMDB by injecting a shared database between them. What was that database? HTML snippets produced by IMDB and stored in the hyper-scalable Amazon S3 object storage. In this way, the source system (IMDB) could make scheduled or real-time updates to its HTML snippet library, and Amazon.com (and others) could pummel S3 as much as they wanted without impacting IMDB. There’s also a great Hacker News thread on this “flat file database” pattern. In this blog post, I’m going to show you how I created a flat file database in S3 and pulled the data into a Node.js application.

Creating HTML Snippets

This pattern relies on a process that takes data from a source and converts it into ready-to-consume HTML. That source – whether a (relational) database or a line-of-business system – may have data organized differently than what the consumer needs. In this case, imagine combining data from multiple database tables into a single HTML representation. This particular demo addresses farm animals, so assume that I pulled data (pictures, record details) into one HTML file for each animal.

In my demo, I simply built these HTML files by hand, but in real life you’d use a scheduled service or trigger action to produce them. If the HTML files need to stay closely in sync with the data source, then you’d probably establish an HTML build engine that runs whenever the source data changes. If you’re dealing with relatively static information, a scheduled job is fine.
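
As a rough illustration, here’s what such a build step might look like in Node.js. The animal record and its fields are entirely hypothetical; the point is simply that some process flattens source data into a self-contained HTML snippet.

//hypothetical build step: flatten a source record into one HTML snippet
function buildSnippet(animal) {
    //combine fields that might live in separate database tables
    return '<div class="animal">' +
        '<h2>' + animal.name + '</h2>' +
        '<img src="' + animal.imageUrl + '" alt="' + animal.name + '" />' +
        '<p>Breed: ' + animal.breed + '</p>' +
        '<p>Age: ' + animal.age + ' years</p>' +
        '</div>';
}

//example record pulled from the (imaginary) source system
var snippetHtml = buildSnippet({
    name: 'Bessie',
    breed: 'Holstein',
    age: 4,
    imageUrl: 'http://example.com/bessie.jpg'
});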

Adding HTML Snippets to Amazon S3

Amazon S3 has a useful portal and a robust API. For my demonstration, I loaded these snippets into a “bucket” via the AWS portal. In real life, you’d probably publish these objects to S3 via the API as the final stage of an HTML build pipeline.
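
If you went the API route, the upload itself is a single call. Here’s a minimal sketch using the same SDK conventions and credentials file shown later in this post; the key name is just an example, and snippetHtml stands in for the string produced by a build step like the one above.

//reference the AWS SDK
var aws = require('aws-sdk');

//load AWS credentials
aws.config.loadFromPath('./credentials.json');
//instantiate S3 manager
var svc = new aws.S3();

//string of HTML produced by the build step (placeholder here)
var snippetHtml = '<div class="animal">...</div>';

//publish one HTML snippet to the bucket
var params = {
    Bucket: 'FarmSnippets',
    Key: 'Bessie - 4 year old Holstein.html', //friendly, descriptive name
    Body: snippetHtml,
    ContentType: 'text/html'
};

svc.client.putObject(params, function(err, data){
    if(err){
        console.log(err);
    }
});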

In this case, I created a bucket called “FarmSnippets” and uploaded four different HTML files.

My goal was to be able to list all the items in a bucket and see a meaningful description of each animal (not the meaningless name of an HTML file). So, I renamed each object to something that described the animal. The S3 listing API (as exposed through the Node.js module) doesn’t return much metadata about each object, so the key itself was one way to share information about what was in each file.
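
For reference, each entry in the data.Contents array returned by listObjects looks roughly like the object below. The field names follow the S3 ListObjects response; the values here are made up. Notice there’s no slot for a custom description, which is why the key carries it.

//illustrative shape of one entry in data.Contents from listObjects
var exampleEntry = {
    Key: 'Bessie - 4 year old Holstein.html', //the friendly name shown in lists
    LastModified: new Date('2013-05-06'),
    ETag: '"9b2cf535f27731c974343645a3985328"',
    Size: 2048,
    StorageClass: 'STANDARD'
};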

At this point, I had a set of HTML files in an Amazon S3 bucket that other applications could access.

Reading those HTML Snippets from a Node.js Application

Next, I created a Node.js application that consumed the new AWS SDK for Node.js. Note that AWS also ships SDKs for Ruby, Python, .NET, Java and more, so this demo can work for most any development stack. In this case, I used JetBrains WebStorm with the Express framework and the Jade template engine to quickly crank out an application that listed everything in my S3 bucket and showed individual items.

In the Node.js router (controller) handling the default page of the web site, I loaded up the AWS SDK and issued a simple listObjects command.

//reference the AWS SDK
var aws = require('aws-sdk');

exports.index = function(req, res){

    //load AWS credentials
    aws.config.loadFromPath('./credentials.json');
    //instantiate S3 manager
    var svc = new aws.S3();

    //set bucket query parameter
    var params = {
      Bucket: "FarmSnippets"
    };

    //list all the objects in a bucket
    svc.client.listObjects(params, function(err, data){
        if(err){
            console.log(err);
        } else {
            console.log(data);
            //yank out the contents
            var results = data.Contents;
            //send parameters to the page for rendering
            res.render('index', { title: 'Product List', objs: results });
        }
    });
};
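
The credentials.json file referenced above is a plain JSON document. loadFromPath expects keys like these, with the values redacted here and the region set to wherever your bucket lives:

{
    "accessKeyId": "YOUR_ACCESS_KEY_ID",
    "secretAccessKey": "YOUR_SECRET_ACCESS_KEY",
    "region": "us-east-1"
}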

Next, I built out the Jade template page that renders these results. Here I looped through each object in the collection and used the “Key” value to create a hyperlink and show the HTML file’s name.

block content
    div.content
      h1 Seroter Farms - Animal Marketplace
      h2= title
      p Browse for animals that you'd like to purchase from our farm.
      b Cows
      p
          table.producttable
            tr
                td.header Animal Details
            each obj in objs
                tr
                    td.cell
                        a(href='/animal/#{obj.Key}') #{obj.Key}

When the user clicks a hyperlink on this page, it takes them to a “details” page. The route (controller) for this page takes the object ID from the URL (the :id route parameter) and retrieves the individual HTML snippet from S3. It then reads the content of the HTML file and makes it available to the rendered page.

//reference the AWS SDK
var aws = require('aws-sdk');

exports.list = function(req, res){

    //get the animal ID from the route parameter
    var animalid = req.params.id;

    //load up AWS credentials
    aws.config.loadFromPath('./credentials.json');
    //instantiate S3 manager
    var svc = new aws.S3();

    //get object parameters
    var params = {
        Bucket: "FarmSnippets",
        Key: animalid
    };

    //get an individual object and return the string of HTML within it
    svc.client.getObject(params, function(err, data){
        if(err){
            console.log(err);
        } else {
            console.log(data.Body.toString());
            var snippet = data.Body.toString();
            res.render('animal', { title: 'Animal Details', details: snippet });
        }
    });
};
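
For completeness, here’s roughly how these two handlers might be wired up in an Express 3-style app.js. The module paths and names (routes, animal) are assumptions based on the exports shown above.

//in app.js: map URLs to the route handlers shown above
var express = require('express');
var routes = require('./routes');        //exports.index - the listing page
var animal = require('./routes/animal'); //exports.list - the details page

var app = express();

//point Express at the Jade templates
app.set('views', __dirname + '/views');
app.set('view engine', 'jade');

app.get('/', routes.index);
app.get('/animal/:id', animal.list); //:id becomes req.params.id

app.listen(3000);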

Finally, I built the Jade template that shows the selected animal. Here I used Jade’s unescaped interpolation (!{}) so that the tags in the HTML file (held in the “details” variable) were actually interpreted rather than rendered as encoded text. This is safe only because the snippets come from a source I control; never render unescaped HTML from untrusted input.

block content
    div.content
        h1 Seroter Farms - Animal Marketplace
        h2= title
        p Good choice! Here are the details for the selected animal.
        | !{details}

That’s all there was! Let’s test it out.

Testing the Solution

After starting up my Node.js project, I visited the site’s root URL.

The index page lists each object in the S3 bucket and shows the (friendly) name of each one. Clicking the hyperlink for a given object sends me to the details page, which renders the HTML stored within the S3 object.

Sure enough, it rendered the exact HTML that was included in the snippet. If my source system adds or changes HTML snippets in S3, the consuming application(s) will pick up the update the next time they read from the bucket. This “database” can easily be consumed by Node.js applications or by any application that can talk to the Amazon S3 web API.

Summary

While it definitely makes sense in some cases to provide shared access to the source repository, the pattern shown here is a nice fit for loosely coupled scenarios where we don’t want – or need – consuming systems to bang on our source data systems.

What do you think? Have you used this sort of pattern before? Do you have cases where providing pre-formatted content might be better than asking consumers to query and merge the data themselves?

Want to see more about this pattern and others? Check out my Pluralsight course called Patterns of Cloud Integration.

Author: Richard Seroter

Richard Seroter is currently the Chief Evangelist at Google Cloud and leads the Developer Relations program. He’s also an instructor at Pluralsight, a frequent public speaker, the author of multiple books on software design and development, and a former InfoQ.com editor plus former 12-time Microsoft MVP for cloud. As Chief Evangelist at Google Cloud, Richard leads the team of developer advocates, developer engineers, outbound product managers, and technical writers who ensure that people find, use, and enjoy Google Cloud. Richard maintains a regularly updated blog on topics of architecture and solution design and can be found on Twitter as @rseroter.
