Julian Hyde, chief architect of SQLstream and lead developer of Mondrian, has a great article in Communications of the ACM titled “Data in Flight.” The article provides an overview of streaming query engines, demonstrates simple queries using a clickstream example, compares streaming query engines with relational database technology, discusses the advantages of streaming, and concludes with additional streaming applications, including complex event processing (CEP).
On streaming query engines vs. relational database technology:
“The streaming query engine is a new technology that excels in processing rapidly flowing data and producing results with low latency. It arose out of the database research community and therefore shares some of the characteristics that make relational databases popular, but it is most definitely not a database. In a database, the data arrives first and is stored on disk; then users apply queries to the stored data. In a streaming query engine, the queries arrive before the data. The data flows through a number of continuously executing queries, and the transformed data flows out to applications. One might say that a relational database processes data at rest, whereas a streaming query engine processes data in flight.”
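To make the "at rest" versus "in flight" distinction concrete, here is a minimal Python sketch of my own (it is not from Hyde's article, and it is not how SQLstream is implemented). The `Click` record, its fields, and both function names are hypothetical. The first function queries data that is already stored; the second is set up before any data arrives and emits an updated count as each click flows through, keeping only the rows inside a trailing time window. It assumes clicks arrive in timestamp order.

```python
from collections import deque
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Click(NamedTuple):
    ts: float   # event time in seconds (hypothetical field)
    page: str   # requested page (hypothetical field)


def query_at_rest(stored: List[Click], page: str) -> int:
    """Database style: the data is stored first, then the query runs once."""
    return sum(1 for c in stored if c.page == page)


def query_in_flight(
    clicks: Iterable[Click], page: str, window_s: float
) -> Iterator[Tuple[float, int]]:
    """Streaming style: the query exists first and emits a result per matching row.

    Only timestamps inside the trailing window are retained (the working set).
    """
    window = deque()  # timestamps of matching clicks still inside the window
    for c in clicks:
        if c.page != page:
            continue
        window.append(c.ts)
        # Evict rows that have fallen out of the trailing time window.
        while window and window[0] <= c.ts - window_s:
            window.popleft()
        yield c.ts, len(window)


clicks = [Click(0, "/home"), Click(10, "/cart"), Click(20, "/home"), Click(70, "/home")]
for ts, count in query_in_flight(clicks, "/home", window_s=60):
    print(ts, count)
```

The streaming version never holds more than the last 60 seconds of matching clicks, which is the "working set" idea Hyde returns to in his conclusion.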
On CEP and streaming, Hyde states:
“Application areas include complex event processing (CEP), monitoring, populating data warehouses, and middleware. A CEP query looks for sequences of events on a single stream or on multiple streams that, together, match a pattern and create a "complex event" of interest to the business. Applications of CEP include fraud detection and electronic trading.
CEP has been used within the industry as a blanket term to describe the entire field of streaming query systems. This is regrettable because it has resulted in a religious war between SQL-based and non-SQL-based vendors and, in overly focusing on financial services applications, has caused other application areas to be neglected.”
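As a rough illustration of what a CEP query does, the Python sketch below watches a single stream for a two-step sequence on the same account (a password change followed shortly by a large withdrawal) and emits one "complex event" when the pattern matches. The scenario, the `Event` and `Alert` types, the function name, and the thresholds are all my own assumptions chosen to echo the fraud-detection use case; they do not come from Hyde's article or from any CEP product. The sketch assumes events arrive in timestamp order.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    ts: float            # event time in seconds (hypothetical field)
    account: str
    kind: str            # "password_change" or "withdrawal" in this toy example
    amount: float = 0.0


class Alert(NamedTuple):
    """The 'complex event' produced when the sequence matches."""
    account: str
    changed_at: float
    withdrawn_at: float
    amount: float


def detect_suspicious(
    events: Iterable[Event], within_s: float = 300.0, min_amount: float = 1000.0
) -> Iterator[Alert]:
    last_change: Dict[str, float] = {}  # most recent password change per account
    for e in events:
        if e.kind == "password_change":
            last_change[e.account] = e.ts
        elif e.kind == "withdrawal" and e.amount >= min_amount:
            changed_at = last_change.get(e.account)
            # Match only if the withdrawal closely follows a password change.
            if changed_at is not None and e.ts - changed_at <= within_s:
                yield Alert(e.account, changed_at, e.ts, e.amount)


stream = [
    Event(0, "a", "password_change"),
    Event(60, "a", "withdrawal", 5000),
    Event(100, "b", "withdrawal", 5000),
    Event(1000, "a", "withdrawal", 5000),
]
for alert in detect_suspicious(stream):
    print(alert)
```

Only the second event produces an alert: account "b" never changed its password, and the last withdrawal on account "a" falls outside the five-minute window.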
Hyde concludes his article as follows:
“Streaming query engines are based on the same technology as relational databases but are designed to process data in flight. Streaming query engines can solve some common problems much more efficiently than databases because they match the time-based nature of the problems, they retain only the working set of data needed to solve the problem, and they process data asynchronously and continuously.
Because of their shared SQL language, streaming query engines and relational databases can collaborate to solve problems in monitoring and realtime business intelligence. SQL makes them accessible to a large pool of people with SQL expertise.
Just as databases can be applied to a wide range of problems, from transaction processing to data warehousing, streaming query systems can support patterns such as enterprise messaging, complex event processing, continuous data integration, and new application areas that are still being discovered.”
If you are even remotely interested in event processing, active information strategies, or stream processing, I highly recommend reading Hyde’s article.