TechEd 2009: Day 2 Session Notes (CEP First Look!)

Missed the first session since Los Angeles traffic is comical and I thought “side streets” was a better strategy than sitting still on the freeway.  I was wrong.

Attended a few sessions today, with the highlight for me being the new complex event processing engine that’s part of SQL Server 2008 R2.  Find my notes below from today’s sessions.

BizTalk Goes Mobile : Collecting Physical World Events from Mobile Devices

I have admittedly spent virtually no time looking at the BizTalk RFID bits, but working for a pharma company, there are plenty of opportunities to introduce supply chain optimizations that both increase efficiency and better ensure patient safety.

  • You have the “systems world” where things are described (how many items exist, attributes), but there is the “real world” where physical things actually exist
    • Can’t find products even though you know they are in the store somewhere
    • Retailers having to close their stores to “do inventory” because they don’t know what they actually have
  • Trends
    • 10 percent of patients given wrong medication
    • 13 percent of US orders have wrong item or quantity
  • RFID
    • Provide real time visibility into physical world assets
    • Put unique identifier on every object
      • E.g. tag on device in box that syncs with receipt so can know if object returned in a box actually matches the product ordered (prevent fraud)
    • Real time observation system for physical world
    • Everything that moves can be tracked
  • BizTalk RFID Server
    • Collects edge events
    • Mobile piece runs on mobile devices and feeds the server
    • Manage and monitor devices
    • Out of the box event handlers for SQL, BRE, web services
    • Direct integration with BizTalk to leverage adapters, orchestration, etc
    • Extensible driver model for developers
    • Clients support “store and forward” model
  • Supply Chain Demonstration
    • Connected RFID reader to WinMo phone
      • Doesn’t have to couple code to a given device; device agnostic
    • Scan part and see all details
    • Instead of starting with paperwork and trying to find parts, started with parts themselves
    • Execute checklist process with questions that I can answer and even take pictures and attach
  • RFID Mobile
    • Lightweight application platform for mobile devices
    • Enables rapid hardware agnostic RFID and Barcode mobile application development
    • Enables generation of software events from mobile devices (events do NOT have to be RFID events)
  • Questions:
    • How do you receive and process events?
      • Create “DeviceConnection” object and pass in module name indicating what the source type is
      • Register your handler on the NotificationEvent
      • Open the connection
      • Process the event in the handler
    • How do you send them through BizTalk?
      • Intermittent connectivity scenario supported
      • Create RfidServerConnector object
      • Initialize it
      • Call post operation with the array of events
    • How do you get events from a new source?
      • Inherit DeviceProvider interface and extend the PhysicalDeviceProxy class

Low Latency Data and Event Processing with Microsoft SQL Server

I eagerly anticipated this session to see how much forethought Microsoft put into their first CEP offering.  This was a fairly sparsely attended session, which surprised me a bit.  That, and the folks who ended up leaving early, apparently means that most people here are unaware of this problem/solution space, and don’t immediately grasp the value.  Key Takeaway: This stuff has a fairly rich set of capabilities so far and looks well thought out from a “guts” perspective.  There’s definitely a lot of work left to do, and some things will probably have to change, but I was pretty impressed.  We’ll see if Charles agrees, based on my hodge podge of notes 😉

  • Call CEP the continuous and incremental processing of event streams from multiple sources based on declarative query and pattern specifications with near-zero latency.
  • Unlike database apps, which run ad hoc queries with latency ranging from seconds to hours or days at hundreds of events per second, event driven apps run continuous standing queries with latency measured in milliseconds (or less) at up to tens of thousands of events per second (or more).
  • As latency requirements become stricter, or data rates reach a certain point, then most cost effective solution is not standard database application
    • This is their sweet spot for CEP scenarios
  • Example CEP scenarios …
    • Manufacturing (sensor on plant floor, react through device controllers, aggregate data, 10,000 events per second); act on patterns detected by sensors such as product quality
    • Web analytics, instrument server to capture click-stream data and determine online customer behavior
    • Financial services listening to data feeds like news or stocks and use that data to run queries looking for interesting patterns that find opps to buy or sell stock; need super low latency to respond and 100,000 events per second
    • Power orgs catch energy consumption and watch for outages and try to apply smart grids for energy allocation
    • How do these scenarios work?
      • Instrument the assets for data acquisitions and load the data into an operational data store
      • Also feed the event processing engine where threshold queries, event correlation and pattern queries are run over the data stream
      • Enrich data from data streams for more static repositories
    • With all that in place, can do visualization of trends with KPI monitoring, do automated anomaly detection, real-time customer segmentation, algorithmic trading and proactive condition-based maintenance (e.g. can tell BEFORE a piece of equipment actually fails)
  • Cycle: monitor, manage, mine
    • General industry trends (data acquisition costs are negligible, storage cost is cheap, processing cost is non-negligible, data loading costs can be significant)
    • CEP advantages (process data incrementally while in flight, avoid loading while still doing the processing you want, seamless querying for monitoring, managing and mining)
  • The Microsoft Solution
    • Has a circular process where data is captured, evaluated against rules, and allows for process improvement in those rules
  • Deployment alternatives
    • Deploy at multiple places on different scale
    • Can deploy close to data sources (edges)
    • In mid tier where consolidate data sources
    • At data center where historical archive, mining and large scale correlation happens
  • CEP Platform from Microsoft
    • Series of input adapters accept events from devices, web servers, event stores and databases; standing queries live in the CEP engine and can also access static reference data; output adapters feed event targets such as pagers and monitoring devices, KPI dashboards, SharePoint UIs, event stores and databases
    • VS 2008 is where event driven apps are written
    • So from source, through CEP engine, into event targets
    • Can use SDK to write additional input or output adapters
      • Capture in domain format of source and transform to canonical format that the engine understands
    • All queries receive data stream as input, and generate data stream as output
    • Queries can be written in LINQ
  • Events
    • Events have different temporal characteristics; may be point in time events, interval events with fixed duration or interval events with initially unknown duration
    • Rich payloads capture all properties of an event
  • Event types
    • Use the .NET type system
    • Events are structured and can have multiple fields
    • Each field is strongly typed using .NET framework type
    • CEP engine adds metadata to capture temporal characteristics
    • Event SOURCES populate time stamp fields
  • Event streams
    • Stream is a possibly infinite series of events
      • Inserting new events
      • Changes to event durations
    • Stream characteristics
      • Event/data arrival patterns
        • Steady rate with end of stream indication (e.g. files, tables)
        • Intermittent, random or burst (e.g. retail scanners, web)
      • Out of order events
        • CEP engine does the heavy lifting when dealing with out-of-order events
  • Event stream adapters
    • Design time spec of adapter
      • For event type and source/sink
      • Methods to handle event and stream behavior
      • Properties to indicate adapter features to engine
        • Types of events, stream properties, payload spec
  • Core CEP query engine
    • Hosts “standing queries”
      • Queries are composable
      • Query results are computed incrementally
    • Query instance management (submit, start, stop, runtime stats)
  • Typical CEP queries
    • Complex type describes event properties
    • Grouping, calculation, aggregation
    • Multiple sources monitored by same query
    • Check for absence of data
  • CEP query features …
    • Calculations
    • Correlation of streams (JOIN)
    • Check for absence (EXISTS)
    • Selection of events from stream (FILTER)
    • Aggregation (SUM, COUNT)
    • Ranking (TOP-K)
    • Hopping or sliding windows
    • Can add NEW domain-specific operators
    • Can do replay of historical data
  • LINQ examples shown (JOIN, FILTER)

from e1 in MyStream1
join e2 in MyStream2 on e1.ID equals e2.ID
where e1.f2 == "foo"
select new { e1.f1, e2.f4 }
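To get a feel for the shape of that JOIN/FILTER query without the CEP bits, here is a minimal sketch that runs the same kind of query with plain LINQ to Objects over in-memory arrays. This is NOT the CEP engine's API: the `Event1` and `Event2` types, the `JoinFilterSketch` class and the sample values are all my own assumptions, and a real standing query would run continuously over a stream instead of once over a finite list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event payloads; field names mirror the f1/f2/f4 placeholders in the notes.
public class Event1 { public int ID; public string f1; public string f2; }
public class Event2 { public int ID; public string f4; }

public static class JoinFilterSketch
{
    // Correlates two in-memory "streams" on ID and keeps only matches whose f2 equals "foo".
    public static List<string> Run(IEnumerable<Event1> myStream1, IEnumerable<Event2> myStream2)
    {
        var query =
            from e1 in myStream1
            join e2 in myStream2 on e1.ID equals e2.ID
            where e1.f2 == "foo"
            select new { e1.f1, e2.f4 };

        // Flatten the anonymous results into strings so callers can inspect them.
        return query.Select(r => r.f1 + "/" + r.f4).ToList();
    }

    public static void Main()
    {
        var s1 = new[]
        {
            new Event1 { ID = 1, f1 = "a", f2 = "foo" },
            new Event1 { ID = 2, f1 = "b", f2 = "bar" },
        };
        var s2 = new[]
        {
            new Event2 { ID = 1, f4 = "x" },
            new Event2 { ID = 2, f4 = "y" },
        };

        // Only ID 1 passes the filter, so this prints "a/x".
        foreach (var line in Run(s1, s2)) Console.WriteLine(line);
    }
}
```

The query expression itself is the interesting part; everything around it is scaffolding so the example runs on its own.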

  • Extensibility
    • Domain specific operators, functions, aggregates
    • Code written in .NET and deployed as assembly
    • Query operations and LINQ queries can refer to user defined things
  • Dev Experience
    • VS.NET as IDE
    • Apps written in C#
    • Queries in LINQ
  • Demos
    • Listening on power consumption events from laptop with lots of samples per second
    • Think he said that this client app was hosting the CEP engine in process (vs. using a server instance)
    • Uses Microsoft.ComplexEventProcessing namespace (assembly?)
    • Shows taking initial stream of just getting all events, and instead refining (through Intellisense!) query to set a HoppingWindow attribute of 1 second. He then aggregates on top of that to get average of the stream every second.
      • This all done (end to end) with 5 total statements of code
    • Now took that code, and replaced other aggregation with new one that does grouping by ID and then can aggregate by each group separately
    • Showed tool with visualized query and you can step through the execution of that query as it previously ran; can set a breakpoint with a condition (event payload value) and run tool until that scenario reached
      • Can filter each operator and only see results that match that query filter
      • Can right click and do “root cause analysis” to see only events that potentially contributed to the anomaly result
  • Same query can be bound to different data sources as long as they deliver the required event type
    • If new version of upstream device became available, could deploy new adapter version and bind it to new equipment
  • Query calls out what data type it requires
  • No changes to query necessary for reuse if all data sources of same type
  • Query binding is a configuration step (no VS.NET)
  • Recap: Event driven apps are fundamentally different from traditional database apps because queries are continuous, consume and produce streams and compute results incrementally
  • Deployment scenarios
    • Custom CEP app dev that uses instance of engine to put app on top of it
    • Embed CEP in app for ISVs to deliver to customers
    • CEP engine is part of appliance embedded in device
    • Put CEP engine into pipeline that populates data warehouse
  • Demo from OSIsoft
    • Power consumption data goes through CEP query to scrub data and reduce rate before feeding their PI System, where another CEP query then runs to do complex aggregation/correlation before data is visualized in a UI
      • Have their own input adapters that take data from servers, run through queries, and use own output adapters to feed PI System
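The per-second average and the group-by-ID refinement from the power consumption demo can be approximated over an in-memory list with plain LINQ to Objects. This is only a sketch under my own assumptions: `PowerReading`, `WindowSketch` and the sample numbers are hypothetical, the windows are non-overlapping one-second buckets (the case where the hop size equals the window size), and the results are computed in one batch rather than incrementally as the engine would do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical power-reading event: device ID, timestamp in milliseconds, measured watts.
public class PowerReading { public int ID; public long TimestampMs; public double Watts; }

public static class WindowSketch
{
    // Buckets readings into one-second windows (keyed by window start in ms)
    // and averages the watts in each window. Assumes non-negative timestamps.
    public static SortedDictionary<long, double> AveragePerSecond(IEnumerable<PowerReading> readings)
    {
        var query =
            from r in readings
            group r by (r.TimestampMs / 1000) * 1000 into w
            select new { WindowStart = w.Key, Avg = w.Average(x => x.Watts) };

        var result = new SortedDictionary<long, double>();
        foreach (var w in query) result[w.WindowStart] = w.Avg;
        return result;
    }

    // Same windows, but aggregated separately per device ID.
    // Keys have the form "ID@windowStartMs".
    public static Dictionary<string, double> AveragePerSecondById(IEnumerable<PowerReading> readings)
    {
        var query =
            from r in readings
            group r by new { r.ID, WindowStart = (r.TimestampMs / 1000) * 1000 } into g
            select new { Key = g.Key.ID + "@" + g.Key.WindowStart, Avg = g.Average(x => x.Watts) };

        return query.ToDictionary(x => x.Key, x => x.Avg);
    }

    public static void Main()
    {
        var readings = new[]
        {
            new PowerReading { ID = 1, TimestampMs = 0,    Watts = 10 },
            new PowerReading { ID = 1, TimestampMs = 500,  Watts = 30 },
            new PowerReading { ID = 2, TimestampMs = 900,  Watts = 50 },
            new PowerReading { ID = 1, TimestampMs = 1200, Watts = 30 },
        };

        foreach (var kv in AveragePerSecond(readings))
            Console.WriteLine(kv.Key + " ms: " + kv.Value);
        foreach (var kv in AveragePerSecondById(readings))
            Console.WriteLine(kv.Key + ": " + kv.Value);
    }
}
```

Swapping the first aggregation for the grouped one only changes the grouping key, which matches how small the change looked in the demo.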

I have lots of questions after this session.  I’m not fully grasping the role of the database (if any).  They didn’t show much specifically around the full lifecycle (rules, results, knowledge, rule improvement), so I’d like to see what my tooling is for this.  Doesn’t look like much business tooling is part of the current solution plan, which might hinder doing any business driven process improvement.  Liked the LINQ way of querying, and I could see someone writing a business friendly DSL on top.

All in all, this will be fun to play with once it’s available.  When is that?  SQL team tells us that we’ll have a TAP in July 2009 with product availability targeted for 1H 2010.

Author: Richard Seroter

Richard Seroter is currently the Chief Evangelist at Google Cloud and leads the Developer Relations program. He’s also an instructor at Pluralsight, a frequent public speaker, the author of multiple books on software design and development, and a former InfoQ.com editor plus former 12-time Microsoft MVP for cloud. As Chief Evangelist at Google Cloud, Richard leads the team of developer advocates, developer engineers, outbound product managers, and technical writers who ensure that people find, use, and enjoy Google Cloud. Richard maintains a regularly updated blog on topics of architecture and solution design and can be found on Twitter as @rseroter.
