In my latest Pluralsight video training course – Patterns of Cloud Integration – I addressed application and data integration scenarios that involve cloud endpoints. In the “shared database” module of the course, I discussed integration options where parties rely on a common (cloud) data repository. One of my solutions was inspired by Amazon CTO Werner Vogels, who briefly discussed this scenario during his keynote at last fall’s AWS re:Invent conference. Vogels talked about the tight coupling that initially existed between Amazon.com and IMDB (the Internet Movie Database). Amazon.com pulls data from IMDB to supplement various pages, but Amazon saw that it was forcing IMDB to scale whenever Amazon.com had a traffic burst. Their solution was to decouple Amazon.com and IMDB by injecting a shared database between them. What was that database? HTML snippets produced by IMDB and stored in the hyper-scalable Amazon S3 object storage. In this way, the source system (IMDB) could make scheduled or real-time updates to its HTML snippet library, and Amazon.com (and others) could pummel S3 as much as they wanted without impacting IMDB. You can also read a great Hacker News thread on this “flat file database” pattern. In this blog post, I’m going to show you how I created a flat file database in S3 and pulled the data into a Node.js application.
Creating HTML Snippets
This pattern relies on a process that takes data from a source and converts it into ready-to-consume HTML. That source – whether a (relational) database or a line-of-business system – may have data organized in a different way than what’s needed by the consumer. In this case, imagine combining data from multiple database tables into a single HTML representation. This particular demo addresses farm animals, so assume that I pulled data (pictures, record details) into one HTML file for each animal.
In my demo, I simply built these HTML files by hand, but in real life, you’d use a scheduled service or trigger action to produce them. If the HTML files need to stay closely in sync with the data source, then you’d probably establish an HTML build engine that runs whenever the source data changes. If you’re dealing with relatively static information, then a scheduled job is fine.
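As a rough sketch of what that build step might look like, here’s a minimal Node.js function that flattens a source record into a single HTML snippet. The record shape (name, breed, price, imageUrl) is purely illustrative – your source system will have its own fields.

```javascript
// Hypothetical build step: flatten one source record into an HTML snippet.
// The record fields used here are assumptions for illustration only.
function buildSnippet(animal) {
  return [
    '<div class="animal">',
    '  <h3>' + animal.name + ' (' + animal.breed + ')</h3>',
    '  <img src="' + animal.imageUrl + '" alt="' + animal.name + '" />',
    '  <p>Price: $' + animal.price + '</p>',
    '</div>'
  ].join('\n');
}

var snippet = buildSnippet({
  name: 'Bessie',
  breed: 'Holstein',
  price: 1200,
  imageUrl: 'http://example.com/bessie.jpg'
});
console.log(snippet);
```

A scheduled job (or a trigger on the source data) would loop over the records, call something like this for each one, and hand the results to the upload step.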
Adding HTML Snippets to Amazon S3
Amazon S3 has a useful portal and robust API. For my demonstration I loaded these snippets into a “bucket” via the AWS portal. In real life, you’d probably publish these objects to S3 via the API as the final stage of an HTML build pipeline.
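If you did publish via the API instead of the portal, the request might be assembled along these lines. The helper function and the sample key are assumptions for illustration; the commented-out call mirrors the early aws-sdk client shape used later in this post.

```javascript
// Hypothetical helper that assembles the putObject parameters for one
// snippet. The bucket name matches the demo; the key is made up.
function buildPutParams(bucket, key, html) {
  return {
    Bucket: bucket,
    Key: key,                  // friendly, descriptive object name
    Body: html,
    ContentType: 'text/html'
  };
}

var params = buildPutParams('FarmSnippets',
  'Brown Cow - 2 years old',
  '<div class="animal"><h3>Brown Cow</h3></div>');
console.log(params.Key);

// With the SDK installed and credentials configured, the publish step
// would look something like:
//   var aws = require('aws-sdk');
//   aws.config.loadFromPath('./credentials.json');
//   new aws.S3().client.putObject(params, function (err, data) { /* ... */ });
```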
In this case, I created a bucket called “FarmSnippets” and uploaded four different HTML files.
My goal was to be able to list all the items in a bucket and see meaningful descriptions of each animal (and not the meaningless name of an HTML file). So, I renamed each object to something that described the animal. The S3 API (exposed through the Node.js module) doesn’t give you access to much metadata, so this was one way to share information about what was in each file.
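Since listObjects returns little beyond the key, one lightweight way to carry display metadata is to encode it into the key itself, along the lines of this hypothetical helper (the field names are assumptions):

```javascript
// Encode display metadata into the S3 object key so that a plain
// listObjects call yields human-readable entries. Fields are illustrative.
function descriptiveKey(animal) {
  return animal.breed + ' - ' + animal.ageYears + ' years old';
}

console.log(descriptiveKey({ breed: 'Jersey Cow', ageYears: 3 }));
// → Jersey Cow - 3 years old
```

The tradeoff is that the key now does double duty as both identifier and label, so renaming the label means re-keying the object.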
At this point, I had a set of HTML files in an Amazon S3 bucket that other applications could access.
Reading those HTML Snippets from a Node.js Application
Next, I created a Node.js application that consumed the new AWS SDK for Node.js. Note that AWS also ships SDKs for Ruby, Python, .NET, Java and more, so this demo can work for almost any development stack. In this case, I used JetBrains WebStorm with the Express framework and Jade template engine to quickly crank out an application that listed everything in my S3 bucket and showed individual items.
In the Node.js router (controller) handling the default page of the web site, I loaded up the AWS SDK and issued a simple listObjects command.
//reference the AWS SDK
var aws = require('aws-sdk');

exports.index = function(req, res){
    //load AWS credentials
    aws.config.loadFromPath('./credentials.json');
    //instantiate S3 manager
    var svc = new aws.S3();
    //set bucket query parameter
    var params = { Bucket: "FarmSnippets" };
    //list all the objects in a bucket
    svc.client.listObjects(params, function(err, data){
        if(err){
            console.log(err);
        } else {
            console.log(data);
            //yank out the contents
            var results = data.Contents;
            //send parameters to the page for rendering
            res.render('index', { title: 'Product List', objs: results });
        }
    });
};
Next, I built out the Jade template page that renders these results. Here I looped through each object in the collection and used the “Key” value to create a hyperlink and show the HTML file’s name.
block content
  div.content
    h1 Seroter Farms - Animal Marketplace
    h2= title
    p Browse for animals that you'd like to purchase from our farm.
    b Cows
    p
    table.producttable
      tr
        td.header Animal Details
      each obj in objs
        tr
          td.cell
            a(href='/animal/#{obj.Key}') #{obj.Key}
When the user clicks the hyperlink on this page, it should take them to a “details” page. The route (controller) for this page takes the object key from the URL and retrieves the individual HTML snippet from S3. It then reads the content of the HTML file and makes it available for the rendered page.
//reference the AWS SDK
var aws = require('aws-sdk');

exports.list = function(req, res){
    //get the animal ID from the route parameter
    var animalid = req.params.id;
    //load up AWS credentials
    aws.config.loadFromPath('./credentials.json');
    //instantiate S3 manager
    var svc = new aws.S3();
    //set object parameters
    var params = { Bucket: "FarmSnippets", Key: animalid };
    //get an individual object and return the string of HTML within it
    svc.client.getObject(params, function(err, data){
        if(err){
            console.log(err);
        } else {
            console.log(data.Body.toString());
            var snippet = data.Body.toString();
            res.render('animal', { title: 'Animal Details', details: snippet });
        }
    });
};
Finally, I built the Jade template that shows the selected animal. In this case, I used a Jade technique for emitting unescaped HTML so that the tags in the HTML file (held in the “details” variable) were actually interpreted.
block content
  div.content
    h1 Seroter Farms - Animal Marketplace
    h2= title
    p Good choice! Here are the details for the selected animal.
    | !{details}
That’s all there was! Let’s test it out.
Testing the Solution
After starting up my Node.js project, I visited the URL.
You can see that it lists each object in the S3 bucket and shows the (friendly) name of the object. Clicking the hyperlink for a given object sends me to the details page which renders the HTML within the S3 object.
Sure enough, it rendered the exact HTML that was included in the snippet. If my source system updates S3 with new or changed HTML snippets, the consuming application(s) will see those changes immediately. This “database” can easily be consumed by Node.js applications, or by any application that can talk to the Amazon S3 web API.
Summary
While it definitely makes sense in some cases to provide shared access to the source repository, the pattern shown here is a nice fit for loosely coupled scenarios where we don’t want – or need – consuming systems to bang on our source data systems.
What do you think? Have you used this sort of pattern before? Do you have cases where providing pre-formatted content might be better than asking consumers to query and merge the data themselves?
Want to see more about this pattern and others? Check out my Pluralsight course called Patterns of Cloud Integration.