Saturday, March 13, 2010

On events versus data

The word "data" always reminds me of the android from Star Trek: The Next Generation, whose name was Data. In computing, the word "data" is very general and refers to anything that is represented on digital media; the picture of Data above is also a piece of data, like many other things. The word "event" is also a broad term, meaning something that happened.

Recently Paul Vincent wondered in his blog about the difference between events and data, as some people think that events are footnotes to data. By the definitions above, events and data are clearly not the same, so I'll talk about the touch points between them, since those are the source of the misconceptions.

There are various touch points between events and data:

  1. Event representation contains data. An event is represented in the computing domain by an "event object" or "event message", which is usually also called an "event" for short. This representation includes information about the event: what type it is, where it happened, when it happened, what happened, who the players were, and so on. Example: the event is "entering the building"; the event's payload contains information that answers questions such as: which building? who entered? when? and maybe more. The payload of the event is data; it may be stored (see event store) or just pass through the system.
  2. A data store can store historical events. Event representations can be accumulated and stored in a data store for further use. There are large data stores that collect weather events. Note that in order to navigate historical events, these events may be stored in a temporal database, an area that I've dealt with in the past; if the events are also spatial, they sometimes have to be stored in a spatiotemporal database.
  3. A database can be an event producer. In active databases the events were database operations: insert, modify, delete and retrieve. In this case the fact that some data element has been updated or accessed is the "something that happens" (which may or may not reflect something that happens in reality), and the database acts as an event producer and emits events for processing by an event processing network. Note that every event producer actually contains some data that is turned into events; one example is transaction instrumentation, like what IBM has done in CICS as an event producer.
  4. Derived events as database updates. An event processing application takes events from somewhere as input, does something, creates derived events, and sends them somewhere; that is all of event processing in one sentence. A derived event created in this process may go to an event consumer; the event consumer may be a DBMS, or another type of consumer whose action is to update some data store.
  5. Event enrichment by data during event processing. During event processing, enrichment of events is sometimes required. Let's return to the event of a person entering a building: the event processing application deals with security access control and needs to know the person's security clearance. This information is not provided with the event, which carries only the person's identification, so there needs to be an enrichment process in which an enrichment event processing agent accesses some global store (in this case reference data), extracts the clearance value, and puts it inside the event for further processing.
Thus the main issue is not the "versus" issue but the various relationships between the two terms.
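
To make touch points 1 and 5 more concrete, here is a minimal Python sketch of an event object carrying a payload, and of an enrichment step that looks up reference data. All names in it (Event, CLEARANCE_BY_PERSON, enrich_with_clearance, the person and building identifiers) are hypothetical and chosen only for illustration; a real event processing platform would have its own event model and its own way of reaching the global store.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime
from typing import Any, Dict


@dataclass(frozen=True)
class Event:
    """Hypothetical event object: a type, an occurrence time and a payload."""
    event_type: str
    occurrence_time: datetime
    payload: Dict[str, Any] = field(default_factory=dict)


# Hypothetical reference data; a plain dict stands in for the global store.
CLEARANCE_BY_PERSON = {"p-17": "secret", "p-42": "public"}


def enrich_with_clearance(event: Event, reference_data: Dict[str, str]) -> Event:
    """Enrichment agent: return a copy of the event with the clearance added."""
    person_id = event.payload["person_id"]
    clearance = reference_data.get(person_id, "unknown")
    # The original event is left untouched; the enriched one is a new object.
    return replace(event, payload={**event.payload, "clearance": clearance})


# The building entry event carries only the identification of the person.
entry = Event(
    event_type="building_entry",
    occurrence_time=datetime(2010, 3, 13, 9, 30),
    payload={"person_id": "p-17", "building": "HQ"},
)
enriched = enrich_with_clearance(entry, CLEARANCE_BY_PERSON)
```

The enriched event can then be passed on to whatever agent makes the access control decision, while the original payload stays available for storage as it arrived.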

Thursday, March 11, 2010

On automatic translation

A well-known urban legend about automatic translation is that a translation program was given the phrase "the spirit is willing but the flesh is weak", translated it from English to Russian and then back to English, and the end result was "the vodka is good but the steak is lousy"; there are some translation pearls collected all over the Web. I use automatic translation from time to time, mainly because my good friend Rainer von Ammon has a habit of forwarding me emails and documents in German. The automatic translation programs I can find on the Web are not that good, but I can understand more or less what is written. However, last night I had my moment of loud laughter. While searching the Web for something using the almighty Google search, I came across a webpage written in Hebrew. I realize that most of the readers of this blog don't read Hebrew, so I'll summarize the reading experience. First, it looked like a collection of words in the wrong order and syntax that did not make any sense. Second, looking closer, I realized that I had actually written it. It is not that I forgot how to write in Hebrew; on the contrary, my Hebrew is still much better than my English. Rather, it seems to be a Hebrew translation of a blog posting I wrote in English in January 2009. Trying to get to the bottom of it, I found that there is a site called the "Unix and Linux form" which copied some of my blog postings (I am not sure in what context) using a crawler called "Linux Bot"; it seems that it did not just copy them, but also translated them into Hebrew. Since Hebrew is not the most popular language in the universe, I wonder how many other languages they were translated into, and whether anybody is doing any quality control. Funny.

Wednesday, March 10, 2010

Revisiting race condition with FFD example

In the past I have written about race conditions, and this triggered some responses. We recently realized that in the example we created for the EPIA book (the Fast Flower Delivery example, which already has around ten different implementations; six of them can be viewed on the book's webpage, and more will be added) there is a case that, if not handled carefully, may yield wrong results due to race conditions. Here is the case:

There is an aggregate EPA per driver and day that collects assignment events for a driver and at the end of the day creates a derived event that counts the number of assignments for that driver. There is a second EPA per day that collects all the driver counts for that day and calculates the mean and standard deviation of the number of assignments per active driver on that day. There is a third EPA, again per driver and day, which gets the derived events from the first two EPAs and calculates for each driver the deviation from the mean, in standard deviation units. These three EPAs are all aggregation-type EPAs with an order among them; so far, no problem. The issue is that all these calculations occur at the end of the day and have causal dependencies. If we are not careful, the first EPA calculates the count per driver at the end of the day, but by the time it finishes the calculation the time is, say, 12:01 AM, so the result is classified as belonging to the next day. It is required for the statistics of the day that has just ended, and if it gets into the statistics of the next day instead, we get an inconsistency in the system. Obviously a naive implementation will get wrong results here.
There are various ways to handle this and ensure correctness. The main issue, however, is whether the developer needs to be aware of it while designing the application, or whether the compiler that takes the definitions of these EPAs and creates the actual implementation should be the one to do the job. My opinion is that if the developer has to take care of such things by hard coding, life will be quite difficult, as this is only one case of a race condition, and it is better for it to be transparent to the developer. This way we can eat the cake and have it too: use a high-level tool that makes programming easier and lowers the total cost of ownership, and still fine-tune the semantics in a way that typically requires dedicated, and even complicated, programming.
More about other aspects of semantic fine tuning - later.
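
One of the possible ways to handle the end-of-day case described above can be sketched in Python: each derived count carries the day of the temporal window it summarises, and the statistics are grouped by that day rather than by the time at which the derived event happened to be emitted. This is only an illustration under assumptions, not the way any of the Fast Flower Delivery implementations does it; the names DriverCount and deviations, the drivers and the numbers are all made up, and the population standard deviation is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import date, datetime
from statistics import mean, pstdev


@dataclass(frozen=True)
class DriverCount:
    """Hypothetical derived event of the first EPA."""
    driver: str
    day: date             # the temporal window this count summarises
    count: int
    emitted_at: datetime  # when the first EPA finished computing it


def deviations(counts, day, key):
    """Second and third EPA: deviation from the mean, in standard deviation
    units, per driver. `key` decides which day a derived event belongs to."""
    todays = [c for c in counts if key(c) == day]
    if not todays:
        return {}
    values = [c.count for c in todays]
    mu, sigma = mean(values), pstdev(values)  # population std dev (assumption)
    return {c.driver: (c.count - mu) / sigma if sigma else 0.0 for c in todays}


march_10 = date(2010, 3, 10)
counts = [
    DriverCount("ann", march_10, 4, datetime(2010, 3, 10, 23, 59, 59)),
    # This count belongs to March 10 but was finished just after midnight.
    DriverCount("bob", march_10, 8, datetime(2010, 3, 11, 0, 1)),
]

# Naive: classify by emission time, so bob's count leaks into the next day.
naive = deviations(counts, march_10, key=lambda c: c.emitted_at.date())
# Window-based: classify by the day the count was computed for.
by_window = deviations(counts, march_10, key=lambda c: c.day)
```

With the naive key only one driver is left in the statistics of March 10, so the mean and the standard deviation are computed over the wrong population; with the window key both drivers are included. In the spirit of the argument above, the choice of key is exactly the kind of detail that a compiler or runtime, rather than the developer, could take care of.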

Tuesday, March 9, 2010

On new event processing course and some ways to explain what event processing is

This is the Computer Science building at the Technion; yesterday I started teaching an event processing course there (in the previous semester I taught in the Information Systems Engineering program, which has been my academic home for many years). The course is given as an "advanced topics in computer science" course, and the students who showed up were mostly graduate students from CS as well as EE (there are software people in EE too). Unlike the previous semester, in which the student projects revolved around validating the solutions that different vendors posted on the EPIA book's website, this time I would like the projects to concentrate on the view of those building an event processing platform (a slightly different view). The first class is always an introduction; since the students don't have a clue about what event processing is, I start with some examples of what it is used for.
There is always somebody who asks the question: what is new, haven't such applications been done in the past as well? My answer, consistent with the answer I have been giving to this question for years, is that event processing does not bring any new functionality to the table. It takes functions that people have implemented in one way or another using regular programming and makes them easier to use, since it introduces high-level abstractions (analogous to the abstraction of a database query compared with the way we did database programming in the distant past, among other abstractions), and it provides platforms that can execute these abstractions using optimizations specific to them.
I also talk about event-driven, decoupled programming vs. the more traditional request/response.

One thing I make sure the students understand is that (like databases) event processing can be used for many different purposes and is not bound to a single type of application or a single industry. I do it by starting with 10 quite distinct examples (taken from chapter 1 of the EPIA book), and conclude by trying to generalize some types of applications, using the picture below (taken from an IBM Academy study a few years ago that examined what event processing applications are); this is just one of the possible classifications.
This is an important thing, since there are some misconceptions around.
As I have written in a previous posting, event processing does not have a single purpose.
Creating event aggregations to check key performance indicators is certainly an application, but not the only one. Likewise, "detecting threats and opportunities in the event cloud" is a phrase going around (I heard it first from Roy Schulte, but I don't know if he holds the copyright), and indeed this is an application of event processing, but many event processing applications detect neither threats nor opportunities, and instead serve for diagnosis or operational decisions. The message: event processing is a discipline with a set of concepts, and various ways to implement these concepts, that serves many purposes that need event-based computing. I shall write more about the course as it develops.