BizTalk BAM Data Archiving Explained

I’ll be honest. I can’t say that I’ve ever fully understood all the nuances of the BizTalk BAM infrastructure layer. Sure, I have the basics down, but I often found myself turned around when talking about some of the movement between the BAM databases (specifically, archiving).

Something in Darren’s Professional BizTalk Server 2006 book got me thinking, so I did a quick test to truly see how the BizTalk BAM process archived and partitioned data. The BAMPrimaryImport database has a table named bam_[ActivityName]_Activity_Completed which stores completed records. According to the documentation, once a given amount of time has passed, the records are moved from the bam_[ActivityName]_Activity_Completed table to a newly created partition named bam_[ActivityName]_Activity_[GUID}.

One of the views (named bam_[ActivityName]_Activity_AllInstances) in the BAMPrimaryImport database aggregates the bam_[ActivityName]_Activity_Completed and all the various partitions. This view is used by the BAM Portal. So if you count up the records in the bam_[ActivityName]_Activity_AllInstances view, it should:

  • equal the number of rows in your “Activity Search” from the BAM Portal
  • equal the number of rows in the bam_[ActivityName]_Activity_Completed table and all subsequent partitions

Now, you may ask, what creates these partitions, and how the heck do I get rid of them over time?

There is a table named BAMArchive created during the BAM Configuration. By default, this table is empty. The SSIS/DTS jobs that get created when deploying your BAM infrastructure do pretty much all of the archiving work for you. Until recently, my understanding of the BAM_DM_[ActivityName] SSIS job was that it “cleaned stuff up”. Let’s look closer. When the BAM_DM_[ActivityName] job runs, it creates new partitions, and also purges old ones. So when you run this job, you’ll often see new partitions show up the BAMPrimaryImport database. This job ALSO rebuilds the view, so that the new partition is included in queries to the bam_[ActivityName]_Activity_AllInstances view. Neato.

How does this BAM_DM_[ActivityName] archive stuff? It uses the Metadata_Activities table in the BAMPrimaryImport database to determine how long before data should be archived. As you can see below, the default for an activity is 6 months.

You could set this OnlineWindowTimeLength to 30 minutes, or 10 days or 18 months. Whatever you want. You can either change this directly in the database table, or more appropriately, use the bm.exe set-activitywindow -Activity: -TimeLength: -TimeUnit:Month|Day|Hour|Minute command. In my case, I set this to a short range in order to prove that data is archived. I then executed the BAM_DM_[ActivityName] job to see what happened.

As hoped for, the BAMPrimaryImport now had fewer partitions as the ones containing old data were removed. Where did the data go? If I check out my BAMArchive database, I now see new tables stamped with the time the data was archived.

If I go to the BAM Portal (or check out the bam_[ActivityName]_Activity_AllInstances view directly) my result set is now much smaller. The BAMArchive data does NOT show up in any BAM query, and is only accessible through direct access to the database via custom queries. BAMArchive is purely an archive, not a readily accessible query store.

There you go. A peek into BAM archiving and bit of detail in what that darn BAM_DM_[ActivityName] job does. It’s also important to ask consumers of BAM data what they expect the “active window” to be. Maybe the default of 6 months is fine, but, you better ask that up front or else face the wrath of users who can’t access the BAM data so easily anymore!

Technorati Tags:

Author: Richard Seroter

Richard Seroter is currently the Chief Evangelist at Google Cloud and leads the Developer Relations program. He’s also an instructor at Pluralsight, a frequent public speaker, the author of multiple books on software design and development, and a former InfoQ.com editor plus former 12-time Microsoft MVP for cloud. As Chief Evangelist at Google Cloud, Richard leads the team of developer advocates, developer engineers, outbound product managers, and technical writers who ensure that people find, use, and enjoy Google Cloud. Richard maintains a regularly updated blog on topics of architecture and solution design and can be found on Twitter as @rseroter.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.