6 Things to Know About Microsoft StreamInsight

Posted on June 22, 2010


Microsoft StreamInsight is a new product included with SQL Server 2008 R2.  It is Microsoft’s first foray into event stream processing and complex event processing, a market that already has its share of mature products and thought leaders.  I’ve spent a reasonable amount of time with the product over the past 8 months and thought I’d try to give you a quick look at the things you should know about it.

  1. Event processing is about continuous intelligence.  An event can be all sorts of things, ranging from a customer’s change of address to a reading from an electrical meter.  In an event-driven architecture, data is communicated asynchronously, as it happens, to consumers who can choose how to act upon it.  The term “complex event processing” refers to distilling multiple (simple) business events into smaller sets of summary events.  I can join data from multiple streams and detect event patterns that might not have been visible without the collective intelligence.  Unlike traditional database-driven applications, where you constantly submit queries against a standing set of data, an event processing solution deploys a set of compiled queries that the event data passes through.  This is a paradigm shift for many, and can be tricky to get your head around, but it’s a compelling way to complement an enterprise business intelligence strategy and improve the availability of information to those who need it.
  2. Queries are written using LINQ.  The StreamInsight team chose LINQ as their mechanism for authoring declarative queries.  As you would hope, you can write a fairly wide set of queries that filter content, join distinct streams, perform calculations and much more.  What if I wanted to have my customer call center send out a quick event whenever a particular product was named in a customer complaint?  My query can filter out all the other products that get mentioned and amplify events about the target product:
    var filterQuery =
          from e in callCenterInputStream
          where e.Product == "Seroterum"
          select e;

    A major aspect of StreamInsight queries is aggregation.  Calculating and filtering individual events is cool, but what if we want to know what is happening over a period of time?  This is where windows come into play.  If I want to perform a count, average, or summation of events, I need to specify the time window that I’m interested in.  For instance, let’s say that I wanted to know the most popular pages on a website over the past fifteen minutes, and wanted to recalculate that total every minute.  In other words, every minute the query calculates the count of hits per page over the past fifteen minutes.  This is called a Hopping Window.

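    // 15-minute window that hops every minute; the output policy argument
    // is an assumption based on the StreamInsight 1.0 API
    var activeSessions = from w in websiteInputStream
                         group w by w.PageName into pageGroup
                         from x in pageGroup.HoppingWindow(
                             TimeSpan.FromMinutes(15), TimeSpan.FromMinutes(1),
                             HoppingWindowOutputPolicy.ClipToWindowEnd)
                         select new PageSummarySummary
                         {
                             PageName = pageGroup.Key,
                             TotalRequests = x.Count()
                         };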

    I’ll have more on this topic in a subsequent blog post but for now, know that there are additional windows available in StreamInsight and I HIGHLY recommend reading this great new paper on the topic from the StreamInsight team.

  3. Queries can be reused and chained.  A very nice aspect of an event processing solution is the ability to link queries together.  Consider a scenario where the first query takes thousands of events per second, filters out the noise, and leaves me with only the subset of events that I care about.  I can use the output of that query in another query which performs additional calculations or aggregation against this more targeted event stream.  Or, consider a “pub/sub” scenario where I receive a stream of events from one source but have multiple output targets.  I can take the results from one stream and leverage them in many others.
  4. StreamInsight uses an adapter model for the input and output of data.  When you build up a StreamInsight solution, you end up creating or leveraging adapters.  The product doesn’t come with any production-level adapters yet, but fortunately there are a decent number of best-practice samples available.  In my upcoming book I show you how to build an MSMQ adapter which takes data from a queue and feeds it into the StreamInsight engine.  Adapters can be written in a generic, untyped fashion and therefore support easy reuse, or they can be written to expect a particular event payload.  As you’d expect, it’s easier to write a specific adapter, but there are obviously long-term benefits to building reusable, generic adapters.
  5. There are multiple hosting options.  If you choose, you can create an in-process StreamInsight server which hosts queries and uses adapters to connect to data publishers and consumers.  This is probably the easiest option to build, and you get the most control over the engine.  There is also an option to use a central StreamInsight server which installs as a Windows Service on a machine.  Whereas the first option creates the engine with a “Server.Create()” operation, the latter attaches to an engine that is already running by calling “Server.Connect()”.  I’m writing a follow-up post shortly on how to leverage the remote server option, so stay tuned.  For now, just know that you have choices for hosting.
  6. Debugging in StreamInsight is good, but overall administration is immature.  The product ships with a fairly interesting debugging tool which also acts as the only graphical UI for doing rudimentary management of a server.  For instance, when you connect to a server (in-process or hosted) you can see the “applications” and queries you’ve deployed.
    When a query is running, you can choose to record the activities, and then play back the stream.  This is great for seeing how your query was processed across the various LINQ operations (e.g. joins, counts). 
    Also baked into the Debugger are some nice root cause analysis capabilities and tracing of an event through the query steps.  You also get a fair amount of server-wide diagnostics about the engine and queries.  However, there are no other graphical tools for administering the server.  You’ll find yourself writing code or using PowerShell to perform other administrative tasks.  I expect this to be an area where you see a mix of community tools and product group samples fill the void until future releases produce a more robust administration interface.
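To make points 2 and 3 a bit more concrete, here is a minimal sketch that chains a filter into a hopping-window count.  It deliberately uses plain LINQ to Objects over an in-memory list rather than the StreamInsight engine, so it runs anywhere but only imitates the behavior.  The PageHit and PageSummary types, the IsBot flag, and the method names are all made up for illustration, and windows here are identified by their start time, which is a simplification of how the real engine reports them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical payload for this sketch; not a StreamInsight type.
public class PageHit
{
    public DateTime Time { get; set; }
    public string PageName { get; set; }
    public bool IsBot { get; set; }
}

// Hypothetical summary event produced once per page per window.
public class PageSummary
{
    public DateTime WindowStart { get; set; }
    public string PageName { get; set; }
    public int TotalRequests { get; set; }
}

public static class ChainedQuerySketch
{
    // First query: drop the noise so later queries see a smaller stream.
    public static IEnumerable<PageHit> FilterNoise(IEnumerable<PageHit> hits)
    {
        return from h in hits where !h.IsBot select h;
    }

    // Second query: one count per page per hopping window.
    // Window i covers [origin + i*hop, origin + i*hop + size).
    public static List<PageSummary> CountPerWindow(
        IEnumerable<PageHit> hits, DateTime origin,
        TimeSpan size, TimeSpan hop, int windowCount)
    {
        var buffered = hits.ToList();
        var query =
            from i in Enumerable.Range(0, windowCount)
            let start = origin.AddTicks(hop.Ticks * i)
            from g in buffered
                .Where(h => h.Time >= start && h.Time < start + size)
                .GroupBy(h => h.PageName)
            orderby start, g.Key
            select new PageSummary
            {
                WindowStart = start,
                PageName = g.Key,
                TotalRequests = g.Count()
            };
        return query.ToList();
    }

    public static void Main()
    {
        var origin = new DateTime(2010, 6, 22, 9, 0, 0);
        var hits = new List<PageHit>
        {
            new PageHit { Time = origin.AddMinutes(0), PageName = "/home" },
            new PageHit { Time = origin.AddMinutes(5), PageName = "/home" },
            new PageHit { Time = origin.AddMinutes(5), PageName = "/home", IsBot = true },
            new PageHit { Time = origin.AddMinutes(16), PageName = "/pricing" },
        };

        // Chain the two queries: the output of the filter feeds the aggregation.
        var summaries = CountPerWindow(
            FilterNoise(hits), origin,
            TimeSpan.FromMinutes(15), TimeSpan.FromMinutes(1), 3);

        foreach (var s in summaries)
            Console.WriteLine("{0:HH:mm} {1} {2}", s.WindowStart, s.PageName, s.TotalRequests);
    }
}
```

Because the windows overlap, a single hit is counted in every window that covers it: the 9:05 hit on /home shows up in all three windows, while the bot hit at the same time never reaches the aggregation at all.  In the real product the engine does this incrementally as events arrive; the sketch buffers everything first, which is fine for understanding the semantics but not how you would process a live stream.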

That’s StreamInsight in a nutshell.  If you want to learn more, I’ve written a chapter about StreamInsight in my upcoming book, and also maintain a StreamInsight Resources page on the book’s website.
