Sunday, December 16, 2007

On Event Stream Processing






This is in part a response to my friend and colleague Claudi for his recent post in the CEP Interest Group

There are many types of streams in the universe - the Gulf stream that affects the weather, a water stream who provide pastoral nature sight, and an audio stream, to name just a few.
In the event processing area the name "stream" appears first in the database research community, as a research project in Stanford. Interestingly the name "event" is never mentioned, and the term "data stream" is the central concept. The first one who made a blend of the "stream" concept and "event processing" concept is my friend Mark Palmer from Progress who did not like the "complex" word and thought that the term "event stream processing" will be more accepted, Mark certainly did not mean to talk about data streams in the academic sense. In the discussion session of the term event stream processing in Wikipedia
Mark writes:
ESP SHOULD GO AWAY AND I HELPED CREATE THE PROBLEM!!!
You are completely correct in my opinion; these should be merged. And I say this from the perpsective of the software vendor that popularized and caused the confusion in the first place. I'm the general manager of the Progress Apama software division and we coined the term "event stream processing" in April of 2005 when we acquired Apama for $30M - we didn't like the term "complex event processing" and decided to make up another term. Yes, stream processing, and data stream processing have been used as terms in academia, but we made up the term ESP as a synonym for CEP. Some on this list will argue that there are subtle, technical differences, but, being in the center of this quagmire of a debate, I think they should be merged, and that ESP should basically go away!
- Mark Palmer, General Manager, Progress Apama, mpalmer@PROGRESS.COM

Another indication of the blurring between ESP and CEP is that the vendor descendants of the academic projects - Streambase and Coral8 now positioned themselves as "complex event processing" vendors. Both have "complex event processing" all over their homepages, Streambase labels its product as - "complex event processing platforms" (well -- we'll discuss platforms in another posting); Coral8 has a portal which is offers self-service CEP. Aleri which also provides SQL oriented API, also uses the term CEP, although they are also using the term "Aleri streaming platform" as the way to do CEP. Thus, while the term "stream processing" is very much alive in the academic database community - see the VLDB 2007 program, for example, it seems that the market has already voted on the unification of these two terms, behind the CEP term.
Why did it happen ? - in the beginning we have seen some 2 x 2 matrices, showing that CEP is - complex and l0w-performance, while ESP is simple and high-performance. It does not seem that any vendor thought it is positioned in one of the extremes, since most applications are somewhat in the middle, and confusing the customers with two names from vendors who have competed on roughly the same applications and customers did not help any of the vendors, thus, the market wisely moved to one name (BTW - this name could have also been "event stream processing" as Progress suggested, but for some reason the term CEP has caught, and some potential customers are still nervous about the word "complex", but it got the traction nevertheless).
This has been until now discussion on branding, and did not answer the questions - whether there are real differences between ESP and CEP ? in some cases, people indicated theoretical differences, the most notable is: stream processing is ordered, while CEP is partially-ordered.
It may be true, though, I was never convinced that "total order" is an inherent property of stream, it is just the way it was happened to be defined in the academic projects, but I think that the more important difference is - whether we start from set-oriented thinking (stream processing) or from individual-event-oriented thinking (event processing), and there are pros and cons of thinking in each of them, but the bottom line is that real applications may be mixed, they may have ordered events from the same type (e.g. when we are looking at trends in time-series), or it can have unordered events of the same type (e.g. when we are looking at information from various sensors whose original timestamps may not be synchronized), in fact, it can have both in the same application. It is true that the space of CEP applications is not monolithic, but there are other classifications that are more useful then the classification of partial vs. ordered set, thus, for practical purposes, let's assume that "stream processing" as defined by those who are looking for the theoretical differences indeed covers a subset of the space of functionality, however - this subset is not important enough to have separate products covering it, or even to mention it as a sub-class.
Last but not least -- an answer to Claudi on his claim that there is not really a CEP engine, since none of the current products know how to obtain general relations among events and calculate transitive closures.
My answer is that event relationship definitions do exist, but this is not the main point, the point is that one may claim that "there is not really a CEP engine that contains all the possible language features that one can think of", and this is true, the EP discipline is young, and I am sure that we just scratched the surface, and EP products will include many features that we event did not think of them today (otherwise it is an indication that the area has failed!), however, without talking about a certain feature, CEP engines do exist today, none is perfect, but probably sufficient for big majority of the existing applications today, so theoretical perfection may not be the criterion to call something "CEP engine", we'll have to settle in "sufficient conditions"
I'll relate to relations among events, including transitive closure in another postings - but the way they exist or don't exist does not really matter for the question. Long posting today - so this is all for now.








No comments: