I’ve been spending a fair amount of free time recently looking much deeper into Microsoft StreamInsight, the complex event processing engine included in SQL Server 2008 R2. I figured that I’d share a few thoughts on it.
First off, as expected with such a new product, there is a dearth of available information. There’s some material out there, but plenty of topics deserve significantly more depth. You’ve got the standard spots to read up on it:
- Microsoft product page for StreamInsight
- MSDN documentation
- Microsoft Forums
- PDC 2009 sessions
- Latest CTP available for download
The provided documentation isn’t bad, and the samples are useful for trying to figure things out, but man, you still really have to commit a good amount of time to grasping how it all works.
The low-latency/high-volume aspect is touted heavily in these types of platforms, but I actually see a lot of benefit in just having the standing queries. As one writer on StreamInsight put it, unlike database-driven applications where you throw queries at data, in CEP solutions, you throw data at queries. Even if you don’t have 100,000 transactions per second to process, you could benefit by passing moderate volumes of data through strategic queries in order to find useful correlations or activities that you wish to immediately act upon.
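To make the "throw data at queries" idea concrete, here is a toy sketch in plain Python. It is not StreamInsight code and does not use any StreamInsight API; the `StandingQueryEngine` class, the `checkout_errors` query, and the event fields are all made up for illustration. The point is only the shape: queries are registered once and stay resident, and each arriving event is pushed through all of them.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]


class StandingQueryEngine:
    """Hypothetical toy engine: queries stay registered, events flow past them."""

    def __init__(self) -> None:
        self._queries: List[Tuple[str, Callable[[Event], bool],
                                  Callable[[str, Event], None]]] = []

    def register(self, name: str,
                 predicate: Callable[[Event], bool],
                 on_match: Callable[[str, Event], None]) -> None:
        # The query is stored once and evaluated against every later event.
        self._queries.append((name, predicate, on_match))

    def push(self, event: Event) -> None:
        # Data is thrown at the queries, not the other way around.
        for name, predicate, on_match in self._queries:
            if predicate(event):
                on_match(name, event)


alerts: List[Tuple[str, str]] = []
engine = StandingQueryEngine()
engine.register(
    "checkout_errors",
    lambda e: e["page"] == "/checkout" and e["status"] >= 500,
    lambda name, e: alerts.append((name, e["user"])),
)

# Made-up sample events, pushed one at a time as they "arrive".
for event in [
    {"user": "a", "page": "/home", "status": 200},
    {"user": "b", "page": "/checkout", "status": 503},
    {"user": "c", "page": "/checkout", "status": 200},
]:
    engine.push(event)

print(alerts)
```

Even at a trickle of events, the match fires the moment the interesting event shows up, which is the benefit that doesn't depend on raw throughput.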
Using LINQ for queries is nice, but for me, I had to keep remembering that I was dealing with a stream of data and not a static data set. You must establish a “window” if you want to execute aggregations or joins against a particular snapshot of data. It makes total sense given that you’re dealing with streams of data, but for some reason, it took me a few cycles to retain that. Despite the fact that you’re using LINQ on the streams, you have to think of StreamInsight more like BizTalk (transient data flying through a bus) instead of a standard application where LINQ would be used to query at-rest data.
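For anyone else who needs a few cycles on the window concept, here is a minimal sketch in plain Python (again, not StreamInsight or LINQ code). It assumes a fixed-size, non-overlapping "tumbling" window, events that arrive in timestamp order, and a hypothetical `tumbling_counts` helper; a real engine supports more window types and handles out-of-order events. The aggregate only becomes answerable once the window closes, which is exactly why a never-ending stream needs one.

```python
from typing import Dict, Iterable, Iterator, Tuple


def tumbling_counts(
    events: Iterable[Tuple[float, str]], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts_per_key) each time a window closes.

    Assumes events are (timestamp, key) pairs in non-decreasing timestamp
    order. Windows that contain no events are simply not emitted.
    """
    current_index = None
    counts: Dict[str, int] = {}
    for timestamp, key in events:
        index = int(timestamp // window_seconds)
        if current_index is not None and index != current_index:
            # The stream moved past the current window, so emit its aggregate.
            yield current_index * window_seconds, counts
            counts = {}
        current_index = index
        counts[key] = counts.get(key, 0) + 1
    if current_index is not None:
        # Flush the last, still-open window when the input ends.
        yield current_index * window_seconds, counts


# Made-up click events: (seconds since start, page).
clicks = [(0.5, "/home"), (2.0, "/home"), (4.9, "/cart"),
          (5.0, "/home"), (9.9, "/cart"), (10.1, "/cart")]

windows = list(tumbling_counts(clicks, window_seconds=5))
for start, per_page in windows:
    print(start, per_page)
```

Note that the function is a generator: it never needs the whole data set in hand, only the events of the window currently open.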
The samples provided in StreamInsight are ok, and the PDC examples provide a good set of complementary bits. However, I was disappointed that there were no “push” adapter scenarios demonstrated. That is, virtually every demonstration I’ve seen shows how a document is sucked into StreamInsight and the events are processed. Some examples show a poller, but I haven’t seen any cases of a device/website/application pushing data directly into the StreamInsight engine. So, I built an MSMQ adapter to try it out. In the scenario I built, I generate web-click and event log data and populate a set of MSMQ queues. My StreamInsight MSMQ adapter then responds to data hitting the queue and runs it through the engine. Works pretty well.
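The push-versus-poll distinction can be sketched in a few lines of plain Python. This is not the MSMQ adapter described above and it does not use the StreamInsight adapter API: an in-process `queue.Queue` stands in for the MSMQ queue, and the `PushAdapter` class and the `received` list (standing in for the engine) are hypothetical. What it shows is an adapter that blocks until a message arrives and hands it on immediately, rather than waking up on a timer to check for work.

```python
import queue
import threading
from typing import Any, Callable, List

_STOP = object()  # sentinel used to shut the adapter down cleanly


class PushAdapter:
    """Hypothetical adapter: reacts to each queued message as it arrives."""

    def __init__(self, source: "queue.Queue[Any]",
                 on_event: Callable[[Any], None]) -> None:
        self._source = source
        self._on_event = on_event
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        while True:
            # Blocks until a message is available; there is no polling interval.
            item = self._source.get()
            if item is _STOP:
                break
            self._on_event(item)

    def stop(self) -> None:
        # The sentinel is queued behind any pending messages, so they are
        # all delivered before the worker thread exits.
        self._source.put(_STOP)
        self._thread.join()


received: List[Any] = []          # stands in for the event-processing engine
messages: "queue.Queue[Any]" = queue.Queue()  # stands in for an MSMQ queue

adapter = PushAdapter(messages, received.append)
adapter.start()

# A website and an event-log watcher "push" made-up data into the queue.
messages.put({"type": "click", "url": "/home"})
messages.put({"type": "eventlog", "level": "Error"})

adapter.stop()
print(received)
```

A production-grade adapter would also have to deal with back-pressure, poison messages, and ordering across multiple queues, none of which this sketch attempts.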
It’s not too tough to build an adapter, BUT, I bet it’s hard to build a good one. I am positive that mine is fine for demos but would elicit laughter from the StreamInsight team. Either way, I hope that the final release of StreamInsight contains more demonstrations of the types of scenarios that they heavily tout as key use cases.
Lastly, I’ll look forward to seeing what tooling pops up around StreamInsight. While it consists of an “engine”, the whole thing feels much more like a toolkit than a product. You have to write a lot of plumbing code for adapters, and I’d love to see more visual tooling for administering servers and adding new queries to running servers.
Lots of rambling thoughts, but I find complex event processing to be a fascinating area and something that very well may be a significant topic in IT departments this year and next. There are some great, mature tools already in the CEP marketplace, but you have to assume that when Microsoft gets involved, the hype around a technology goes up a notch. If you’re a BizTalk person, the concepts behind StreamInsight aren’t too difficult to grasp, and you would do well to add this to your technology repertoire.