Creating a “Flat File” Shared Database with Amazon S3 and Node.js

In my latest Pluralsight video training course – Patterns of Cloud Integration – I addressed application and data integration scenarios that involve cloud endpoints. In the “shared database” module of the course, I discussed integration options where parties rely on a common (cloud) data repository. One of my solutions was inspired by Amazon CTO Werner Vogels, who briefly discussed this scenario during his keynote at last Fall’s AWS re:Invent conference. Vogels talked about the tight coupling that initially existed between Amazon.com and IMDB (the Internet Movie Database). Amazon.com pulls data from IMDB to supplement various pages, but the teams saw that they were forcing IMDB to scale whenever Amazon.com had a traffic burst. Their solution was to decouple Amazon.com and IMDB by injecting a shared database between them. What was that database? HTML snippets produced by IMDB and stored in the hyper-scalable Amazon S3 object storage. In this way, the source system (IMDB) could make scheduled or real-time updates to its HTML snippet library, and Amazon.com (and others) could pummel S3 as much as they wanted without impacting IMDB. You can also read a great Hacker News thread on this “flat file database” pattern. In this blog post, I’m going to show you how I created a flat file database in S3 and pulled the data into a Node.js application.

Creating HTML Snippets

This pattern relies on a process that takes data from a source and converts it into ready-to-consume HTML. That source – whether a (relational) database or line-of-business system – may have data organized in a different way than what’s needed by the consumer. In this case, imagine combining data from multiple database tables into a single HTML representation. This particular demo addresses farm animals, so assume that I pulled data (pictures, record details) into one HTML file for each animal.


In my demo, I simply built these HTML files by hand, but in real-life, you’d use a scheduled service or trigger action to produce these HTML files. If the HTML files need to be closely in sync with the data source, then you’d probably look to establish an HTML build engine that ran whenever the source data changed. If you’re dealing with relatively static information, then a scheduled job is fine.
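As a sketch of what that build step might look like, here’s a hypothetical function that flattens one animal record into an HTML snippet. The field names (`name`, `breed`, `imageUrl`, `price`) and the markup layout are illustrative, not from my actual demo:

```javascript
//hypothetical build step: flatten one animal record into an HTML snippet
//(field names are illustrative, not from the demo)
function buildSnippet(animal) {
    return '<div class="animal">' +
           '<h3>' + animal.name + ' (' + animal.breed + ')</h3>' +
           '<img src="' + animal.imageUrl + '" alt="' + animal.name + '"/>' +
           '<p>Price: $' + animal.price + '</p>' +
           '</div>';
}

//a scheduled job or database trigger would call this for each record
var html = buildSnippet({ name: 'Bessie', breed: 'Holstein', imageUrl: 'bessie.jpg', price: 1200 });
```

A build engine would then write each returned string out as its own `.html` file.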

Adding HTML Snippets to Amazon S3

Amazon S3 has a useful portal and robust API. For my demonstration I loaded these snippets into a “bucket” via the AWS portal. In real life, you’d probably publish these objects to S3 via the API as the final stage of an HTML build pipeline.
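That final publish stage might look something like the following sketch. The `snippetParams` and `publishSnippet` helpers are my own invention, and I’m assuming the same SDK preview (with its `svc.client` style) used in the reading code later in this post:

```javascript
//hypothetical publish step for the end of an HTML build pipeline;
//svc would be an instantiated aws.S3 client

//build the putObject parameters for one snippet
function snippetParams(bucket, key, html) {
    return {
        Bucket: bucket,
        Key: key,
        Body: html,
        ContentType: 'text/html'
    };
}

//push a generated snippet into the bucket
function publishSnippet(svc, key, html, done) {
    svc.client.putObject(snippetParams('FarmSnippets', key, html), done);
}
```

With something like this in place, regenerating and re-uploading a snippet is a single call per file.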

In this case, I created a bucket called “FarmSnippets” and uploaded four different HTML files.


My goal was to be able to list all the items in a bucket and see meaningful descriptions of each animal (and not the meaningless name of an HTML file). So, I renamed each object to something that described the animal. The S3 API (exposed through the Node.js module) doesn’t give you access to much metadata, so this was one way to share information about what was in each file.
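In code terms, a bucket listing really only hands the consumer each object’s `Key`, so the object name has to carry the display text. A small hypothetical helper (mine, not part of the demo – the example keys are made up too) shows the idea:

```javascript
//hypothetical helper: turn listObjects results into display entries,
//using each object's Key as the human-readable label
function toDisplayList(contents) {
    return contents.map(function (obj) {
        return {
            key: obj.Key,                          //used to build the detail link
            label: obj.Key.replace(/\.html$/, '')  //strip the extension for display
        };
    });
}

//listObjects returns entries shaped roughly like this
var entries = toDisplayList([
    { Key: 'Brown cow, 2 years old.html' },
    { Key: 'White sheep, 1 year old.html' }
]);
```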


At this point, I had a set of HTML files in an Amazon S3 bucket that other applications could access.

Reading those HTML Snippets from a Node.js Application

Next, I created a Node.js application that consumed the new AWS SDK for Node.js. Note that AWS also ships SDKs for Ruby, Python, .NET, Java and more, so this demo can work for most any development stack. In this case, I used JetBrains WebStorm, the Express framework, and the Jade template engine to quickly crank out an application that listed everything in my S3 bucket and showed individual items.

In the Node.js router (controller) handling the default page of the web site, I loaded up the AWS SDK and issued a simple listObjects command.

//reference the AWS SDK
var aws = require('aws-sdk');

exports.index = function(req, res){

    //load AWS credentials (point this at your own credentials file)
    aws.config.loadFromPath('./credentials.json');

    //instantiate S3 manager
    var svc = new aws.S3();

    //set bucket query parameter
    var params = {
        Bucket: "FarmSnippets"
    };

    //list all the objects in the bucket
    svc.client.listObjects(params, function(err, data){
        if (err) {
            console.log(err);
        } else {
            //yank out the contents
            var results = data.Contents;
            //send parameters to the page for rendering
            res.render('index', { title: 'Product List', objs: results });
        }
    });
};
Next, I built out the Jade template page that renders these results. Here I looped through each object in the collection and used the “Key” value to create a hyperlink and show the HTML file’s name.

block content
  h1 Seroter Farms - Animal Marketplace
  h2= title
  p Browse for animals that you'd like to purchase from our farm.
  b Cows
  table
    tr
      td.header Animal Details
    each obj in objs
      tr
        td
          a(href='/animal/#{obj.Key}') #{obj.Key}

When the user clicks the hyperlink on this page, it should take them to a “details” page. The route (controller) for this page takes the object ID from the request URL and retrieves the individual HTML snippet from S3. It then reads the content of the HTML file and makes it available for the rendered page.

//reference the AWS SDK
var aws = require('aws-sdk');

exports.list = function(req, res){

    //get the animal ID from the route parameter (route is /animal/:id)
    var animalid = req.params.id;

    //load up AWS credentials (point this at your own credentials file)
    aws.config.loadFromPath('./credentials.json');

    //instantiate S3 manager
    var svc = new aws.S3();

    //get object parameters
    var params = {
        Bucket: "FarmSnippets",
        Key: animalid
    };

    //get an individual object and return the string of HTML within it
    svc.client.getObject(params, function(err, data){
        if (err) {
            console.log(err);
        } else {
            var snippet = data.Body.toString();
            res.render('animal', { title: 'Animal Details', details: snippet });
        }
    });
};
Finally, I built the Jade template that shows our selected animal. In this case, I used a Jade technique to emit unescaped HTML so that the tags in the HTML file (held in the “details” variable) were actually interpreted.

block content
  h1 Seroter Farms - Animal Marketplace
  h2= title
  p Good choice! Here are the details for the selected animal.
  | !{details}

That’s all there was! Let’s test it out.

Testing the Solution

After starting up my Node.js project, I visited the URL.


You can see that it lists each object in the S3 bucket and shows the (friendly) name of the object. Clicking the hyperlink for a given object sends me to the details page which renders the HTML within the S3 object.


Sure enough, it rendered the exact HTML that was included in the snippet. If my source system changes and updates S3 with new or changed HTML snippets, the consuming application(s) will instantly see it. This “database” can easily be consumed by Node.js applications or any application that can talk to the Amazon S3 web API.
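For instance, a consumer outside the AWS SDKs could fetch a snippet with a plain HTTP GET, assuming the objects were made publicly readable. This URL-building helper and the path-style address are my own sketch, not part of the demo:

```javascript
//hypothetical: any HTTP-capable stack can read a public snippet directly,
//assuming the bucket objects were made publicly readable
function snippetUrl(bucket, key) {
    //path-style S3 address: https://s3.amazonaws.com/<bucket>/<encoded key>
    return 'https://s3.amazonaws.com/' + bucket + '/' + encodeURIComponent(key);
}

var url = snippetUrl('FarmSnippets', 'Brown cow, 2 years old.html');
```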


While it definitely makes sense in some cases to provide shared access to the source repository, the pattern shown here is a nice fit for loosely coupled scenarios where we don’t want – or need – consuming systems to bang on our source data systems.

What do you think? Have you used this sort of pattern before? Do you have cases where providing pre-formatted content might be better than asking consumers to query and merge the data themselves?

Want to see more about this pattern and others? Check out my Pluralsight course called Patterns of Cloud Integration.


Categories: AWS, Cloud, General Architecture, Node.js, SOA

