Friday, October 5, 2007

Events from unstructured sources

This is a holiday in Israel ("Succot"), and I have travelled with my family to the national park in Ramat-Gan (about 100 km away from home; not that far, but Israel is a small country, so you can't go far anywhere) to see the exhibition of "speaking chairs". Many artists have painted and sculpted chairs, where all the basic chairs look like this, and the chairs are spread across the park (here is one sample in the picture)...

Today's topic is getting events from unstructured sources. An event has been defined as something that happens, and one of the questions is: how do we know that it happened? Some events are pushed (e.g. stock quotes), but many of the events in the universe have to be obtained by analyzing various sources of input. Video streams are one such source: e.g. understanding from a video stream that somebody is climbing a security fence, that a stolen car has just entered the highway, or that a person whose driving licence has been revoked is driving. Other interesting events can be obtained from news streams, from emails, and from other types of unstructured text.

I believe that the event processing of the future will deal not only with events that are easy to get, because somebody pushes them, but in many cases also with events that are hard to get, and whose very occurrence is hard to recognize, since detecting them involves analyzing different media streams. We can see some early implementations in this area, and I believe it has big potential, and offers good topics for theses as well.... have fun.

Wednesday, October 3, 2007

Route to markets for event processing technology

I am back today to a more macro-oriented blog (I will try to alternate between macro- and micro-oriented issues, since it seems that each has its own audience). One of the interesting questions is: in what form will event processing technologies materialize in the market?

In his presentation on BAM at the Gartner EPS, Bill Gassman talks about the BAM market and partitions it into:
  • 65% - embedded inside vertical solutions
  • 20% - embedded inside BPM (+ ESB) middleware/products
  • 10% - embedded inside BI products
  • 3% - embedded inside IT operations (e.g. BSM)
  • 2% - general purpose BAM products

Gassman's interpretation of BAM is quite wide (I have a somewhat narrower interpretation) and covers most of the EP types of applications, so let's take it as a starting point. Without committing to the exact numbers, the order is consistent with my observations of this market. While the early adopters used a stand-alone engine and built applications on top of it, this will become a relatively small route to market. The segment that will grow most is that of EP technology embedded inside vertical solutions; we see signs of this in multiple industries now. The second largest segment is EP technology embedded inside middleware, and we see that the big players in this area are taking this approach. The rationale behind it is twofold. From the middleware point of view, EP capabilities are now becoming a must, due to competitive pressures. From the point of view of ROI to customers, EP applications are typically not isolated, and the biggest investment is in connecting them with the producers and consumers of events, so application integration middleware with adapters to multiple systems may help. There will always be a market for stand-alone event processing technologies, and this market can be segmented into "general purpose" engines and engines optimized for special purposes. I am not sure that what I am writing now reflects the current reality, but it certainly reflects the trend.

More on the role of general and specific frameworks - later.

Monday, October 1, 2007

What is an edge in an event processing network? - a terminology wondering

Today I'll leave the macro questions about SOA and the positioning of the event processing discipline, and deal with a micro issue, an issue of terminology that came up when I looked at the current draft of the event processing glossary. One of the main abstractions of event processing is the event processing network. The network is in essence a directed (possibly cyclic) graph, and like any graph it consists of nodes and edges. The definition in the glossary is: Event Processing Network (EPN): A set of event processing agents (EPAs) and a set of event channels connecting them. Clearly, an EPA is a node, and one can also say that producers and consumers are types of nodes (to show the entire picture). The question is what an edge in this network is. From the definition one may assume that a channel is an edge; however, unlike an edge in a graph, which connects exactly two nodes, a channel may have multiple sources and multiple sinks. It is therefore more comfortable to look at the channel as another type of node (and indeed a channel also has some functions associated with it). So the channel is not the edge; what we can say is that an edge is the "pipe" through which events flow from one node (producer, channel, agent) to another node (channel, agent, consumer). Now we need to decide what to call it. My suggestion is to call the collection of events flowing through such a pipe an "event stream"; however, the glossary defines it as:
Event stream: a linearly ordered sequence of events. Here we have two issues. The first is to distinguish between the "pipe" and the collection of events flowing through this "pipe", which is the same type of ambiguity we have when talking about "event" vs. "event message"; we can tolerate such ambiguity. The second problem is that, in general, the collection of such events may not be totally ordered, and thus it does not conform to the definition of a stream.
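
To make the distinction concrete, here is a minimal sketch in Python of a network in which channels are nodes and each edge ("pipe") connects exactly two nodes. The names `Node`, `EPN`, `connect`, `sources_of` and `sinks_of` are my own illustrative assumptions; they are not taken from the glossary.

```python
from dataclasses import dataclass, field

# Node kinds discussed above; treating the channel as a node is the assumption.
KINDS = {"producer", "agent", "channel", "consumer"}


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # one of KINDS


@dataclass
class EPN:
    """Hypothetical event processing network: nodes plus two-ended pipes."""
    nodes: dict = field(default_factory=dict)
    pipes: list = field(default_factory=list)  # (source_name, target_name)

    def add_node(self, name, kind):
        if kind not in KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = Node(name, kind)

    def connect(self, source, target):
        # A pipe runs from (producer, channel, agent) to (channel, agent, consumer).
        if self.nodes[source].kind == "consumer":
            raise ValueError("a consumer cannot be the source of a pipe")
        if self.nodes[target].kind == "producer":
            raise ValueError("a producer cannot be the target of a pipe")
        self.pipes.append((source, target))

    def sources_of(self, name):
        return [s for s, t in self.pipes if t == name]

    def sinks_of(self, name):
        return [t for s, t in self.pipes if s == name]


# Example: one channel with two sources and two sinks, built from four pipes.
epn = EPN()
for name, kind in [("p1", "producer"), ("p2", "producer"),
                   ("quotes", "channel"),
                   ("a1", "agent"), ("a2", "agent"),
                   ("c1", "consumer")]:
    epn.add_node(name, kind)
for source, target in [("p1", "quotes"), ("p2", "quotes"),
                       ("quotes", "a1"), ("quotes", "a2"),
                       ("a1", "c1"), ("a2", "c1")]:
    epn.connect(source, target)

print(epn.sources_of("quotes"))  # the channel's multiple sources
print(epn.sinks_of("quotes"))    # the channel's multiple sinks
```

In this sketch the channel keeps its many-to-many nature, while every pipe still has exactly one source and one target, as an edge in a graph should.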

There are two possible solutions. One is to modify the definition of stream to be more general (I am not sure that the word stream inherently means sequence; it just happened to be the way it was implemented in a certain academic project). The other possibility is to invent a new name for the edge, like event pipe (or some other creative name). What do you think?

I am also putting it as a comment on the glossary website.

I am taking this opportunity to suggest that anybody who wishes to make a terminology comment do so soon, since we would like to close a first version in the next month or so.

More terminology issues - later.

Sunday, September 30, 2007

Event Processing - a footnote to databases ?

More in the spirit of the VLDB conference I attended last week: there is a view in the database community that event processing is really part of database technology, and that the functionality of event processing can be obtained with regular databases, by inserting the events into the database and running "continuous queries" against it. According to this outlook, the only reason that customers want engines outside the database engine is that some performance properties, typically throughput and latency, cannot be satisfied by database engines, and even this can be handled by some tricks, like in-memory databases.


This reminds me of the first conference I ever organized, NGITS 1993 (we did not have conference webpages in those days). There was a discussion about the relations between Artificial Intelligence and Databases that followed the keynote address of John Mylopoulos, whom I have always considered one of the most visionary people I've ever met. John said something like this: "the difference between the AI and database disciplines is that AI is a scientific discipline and database is an engineering discipline, which deals with efficiency issues". He, of course, made the database people who were present quite angry. However, now that I am looking from the outside (at that time I looked from the inside) at the way that database people think, I realize that he was, as usual, right.


While high performance is one of the reasons that customers turn to COTS in this area, it is only the secondary reason. The main reason that event processing software is being used is the level of abstraction it provides, and consequently the improvement in ROI. It also seems that the main competition between different products will be more on the ROI (ease of use) front than on the performance front.


Event processing differs from database processing in the functionality it requires. The fact that database processing processes a state ("snapshot"), while event processing processes a set of transitions ("event cloud"), imposes different thinking, and hence different abstractions. Trying to introduce event pattern detection as an extension to database processing (as we have seen in the EPTS meeting, in the proposal being prepared now) has several attributes, but simplicity is not one of them. It thus totally misses the point of "ease of use", only to satisfy the assertion that event processing should be done within database processing. While these are nice academic attempts, and researchers will probably be able to write a lot of papers about pattern extensions to SQL, I don't believe that they will catch on in reality.
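
A small Python sketch may illustrate the difference between the two views. It is only an illustration under my own assumptions (the event layout, the function names and the "three rises in a row" pattern are all made up for the example): the snapshot keeps just the latest value per key, while the pattern can only be answered by looking at the transitions.

```python
def apply_to_snapshot(snapshot, event):
    """State view: keep only the latest price per symbol."""
    updated = dict(snapshot)
    updated[event["symbol"]] = event["price"]
    return updated


def detect_consecutive_rises(events, symbol, run_length):
    """Transition view: did the price of `symbol` rise `run_length` times in a row?"""
    rises = 0
    last_price = None
    for event in events:
        if event["symbol"] != symbol:
            continue
        if last_price is not None and event["price"] > last_price:
            rises += 1
            if rises >= run_length:
                return True
        else:
            rises = 0  # the run is broken (or this is the first quote)
        last_price = event["price"]
    return False


# Hypothetical stock quote events, in order of occurrence.
quotes = [
    {"symbol": "IBM", "price": 100.0},
    {"symbol": "IBM", "price": 101.0},
    {"symbol": "IBM", "price": 102.0},
    {"symbol": "IBM", "price": 103.0},
]

snapshot = {}
for quote in quotes:
    snapshot = apply_to_snapshot(snapshot, quote)

print(snapshot)                                     # only the last price survives
print(detect_consecutive_rises(quotes, "IBM", 3))   # needs the whole sequence
```

The snapshot alone cannot tell whether the price got to its current value by rising steadily or by jumping around; that information lives only in the events.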


However - databases do have several roles in event processing, here are a few of them:

(1). Databases will be used to store events that should be used for retrospective processing. These databases need to support temporal (or even spatio-temporal) characteristics; database products don't yet provide good support in this area, and this deserves a separate blog.

(2). Databases (or in-memory databases) will be used to store intermediate states for recoverability.

(3). Databases will be used to enrich events for processing (mainly reference data, but sometimes transaction data).

(4). Data warehouses will be used for embedded analytics.
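
As a rough illustration of roles (1) and (3), here is a minimal Python sketch using the standard `sqlite3` module with an in-memory database. The table names, columns and helper functions (`enrich`, `store`, `events_between`) are hypothetical, chosen only for the example, and timestamps are assumed to be ISO-8601 strings so that they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reference data used for enrichment (role 3).
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, tier TEXT)")
# Event log kept for retrospective processing (role 1).
conn.execute(
    "CREATE TABLE event_log (occurred_at TEXT, customer_id TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)", [("c1", "gold"), ("c2", "basic")]
)


def enrich(event):
    """Return a copy of the event with the customer's tier attached."""
    row = conn.execute(
        "SELECT tier FROM customer WHERE id = ?", (event["customer_id"],)
    ).fetchone()
    enriched = dict(event)
    enriched["tier"] = row[0] if row else None  # unknown customers get None
    return enriched


def store(event):
    """Append the raw event to the log."""
    conn.execute(
        "INSERT INTO event_log VALUES (?, ?, ?)",
        (event["occurred_at"], event["customer_id"], event["amount"]),
    )


def events_between(start, end):
    """Retrospective query: events with start <= occurred_at < end."""
    return conn.execute(
        "SELECT occurred_at, customer_id, amount FROM event_log "
        "WHERE occurred_at >= ? AND occurred_at < ? ORDER BY occurred_at",
        (start, end),
    ).fetchall()


raw = {"occurred_at": "2007-09-30T10:00:00", "customer_id": "c1", "amount": 250.0}
store(raw)
enriched = enrich(raw)
print(enriched["tier"])
print(events_between("2007-09-30T00:00:00", "2007-10-01T00:00:00"))
```

A plain text timestamp column is of course far from real temporal support; the sketch only shows where the database sits relative to the event flow.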


I think that the database community should concentrate on enhancing database technology to support these functions in event processing (e.g. temporal database support, both in the level of abstraction and in efficiency of implementation) instead of insisting on extending SQL in an unnatural way.
I still need to discuss several topics in more depth, such as temporal databases, retrospective processing, and alternative approaches to SQL patterns, but I will leave that for later.