Thursday, April 18, 2013

Progress Apama announces a version which compiles to native machine code

Progress Software announced today the release of a new version that compiles the Apama EPL into native machine code, claiming a 2000% performance improvement over the previous version.   They don't mention what they actually measured.   The big data era is renewing investment in scalable event processing solutions, with various kinds of optimization.   We may start to see specialized event processing hardware. 
I think it will be useful to establish a set of benchmarks, since some studies have shown huge differences in performance between types of event processing applications - for example: those doing mainly filtering, those doing mainly aggregation, and those doing pattern matching.  It would be good to have a set of benchmarks that fits the different types of applications, and a method to map application characteristics to a specific benchmark, to avoid the phenomenon of vendors citing numbers that cannot be compared.  More - later. 
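To make the distinction concrete, here is a rough sketch in plain Python (not any vendor's EPL; the event fields, types, and thresholds are all made up for illustration) of what the three workload shapes look like. A benchmark dominated by each shape stresses an engine quite differently: filtering is mostly per-event predicate evaluation, aggregation adds sliding-window state, and pattern matching adds cross-event state management.

```python
# Illustrative sketch only; event fields and thresholds are invented.
from collections import deque

def filtering(events):
    """Filtering workload: pass through only events matching a predicate."""
    return (e for e in events if e["type"] == "trade" and e["price"] > 100)

def aggregation(events, window_size=1000):
    """Aggregation workload: a statistic over a sliding window of events."""
    window = deque(maxlen=window_size)
    for e in events:
        window.append(e["price"])
        yield sum(window) / len(window)   # moving average over the window

def pattern_matching(events):
    """Pattern-matching workload: detect a 'bid' followed by a 'withdraw'
    from the same customer, which requires keeping state across events."""
    pending_bids = set()
    for e in events:
        if e["type"] == "bid":
            pending_bids.add(e["customer"])
        elif e["type"] == "withdraw" and e["customer"] in pending_bids:
            pending_bids.discard(e["customer"])
            yield e  # a matched pattern instance
```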

Wednesday, April 17, 2013

DEBS 2013 - keynotes and tutorials were published


DEBS 2013 will take place on the campus of the University of Texas at Arlington, June 29 - July 3.
Today the keynotes and tutorials were published on the conference's website:

The first keynote speaker will be Roger Barga from Microsoft, one of the first people in Microsoft Research to work on event processing, who crossed the line to the product organization and now deals with product management.  Roger will talk about "the rise of the velocity pipeline in enterprise computing", focusing on the velocity part of big data that makes batch solutions like Hadoop inadequate.
The second keynote speaker is David Wollman, who manages the smart grid standards activities at NIST; he will talk about smart grids.

There will be four tutorials - all of them by well-known people active in the area.
The first one will be given by my IBM colleagues from the System S team, who will talk about "stream processing optimizations".  
The second one will be given by Christoph Emmersberger and Florian Springer (both of whom I know from their past association with Rainer von Ammon), who will talk about the event processing capabilities of Apache Camel.
The third one will also be given in a German accent, by Boris Koldehofe and Frank Dürr from the University of Stuttgart. They will talk about "software defined networks".
Last -- keeping the tradition, I'll be giving a tutorial this year as well, this time together with Jeff Adkins, on a topic we are both dealing with - "why is event driven thinking different from regular thinking about computing". I'll write about this tutorial at a later phase (well, we have to prepare it first); meanwhile you can read the short abstract on the site.  Hope to meet old friends and colleagues in Arlington.   
More - later.

Tuesday, April 16, 2013

On the right technology for decisions

I came across a (not new) discussion by my IBM colleague Jean Francois Puget, published on IBM developerWorks, entitled "What is the difference between SPSS and ILOG".  Actually, while Jean Francois colors it in blue and discusses it in terms of specific IBM products, he says that he is more interested in discussing the generic question of what the right decision technology is, as there are various technologies today that are labelled as decision technologies, decision management, and other kinds of decision-oriented names. 
Jean Francois makes the distinction between "a single decision at a time" and "making a group of decisions together", and asserts that for a single decision at a time BRMS and/or predictive analytics is the right kind of technology, while for a group of decisions made together optimization techniques are appropriate.  
There is some truth to this, but I am not sure that it is the ultimate differentiation between the two types, so let's look at the issue.   When there is a need to make a decision, there are several approaches:
  1. Give a person all the relevant data and let this person make the decision
  2. Make an automated decision (or recommendation) - when the way to make the decision can be codified as decision trees/decision tables/rules
  3. Make an automated decision (or recommendation) - when the decision needs to find the best alternative according to quantified criteria. 

For each of these cases, the data obtained can be deterministic or stochastic, existing or predicted, and there are various ways to obtain such data, but that is true regardless of which of the three cases applies.    It seems that approach 1 does not require any decision technology - although some people also call a data request that uses some kind of inference technique a decision technology, I think that takes the term decision to a non-intuitive place.  Approach 2 requires some kind of rule technology, and approach 3 requires some kind of optimization technology. 
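As a rough illustration of the difference between approaches 2 and 3, here is a minimal sketch in Python; the domain (a loan request, a set of investment options) and all field names and numbers are invented for the example, and a brute-force search stands in for a real optimization engine.

```python
# Minimal sketch; the domain, fields, and thresholds are invented.

def rule_based_decision(applicant):
    """Approach 2: the decision logic is codified as explicit rules."""
    if applicant["credit_score"] < 600:
        return "reject"
    if applicant["income"] >= 3 * applicant["requested_amount"]:
        return "approve"
    return "refer to human"

def optimization_decision(options, budget):
    """Approach 3: find the best alternative according to a quantified
    criterion (maximize expected return within a budget constraint).
    Brute force stands in for a real optimization engine."""
    best, best_value = None, float("-inf")
    for option in options:
        if option["cost"] <= budget and option["expected_return"] > best_value:
            best, best_value = option, option["expected_return"]
    return best
```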

Now, there are cases in which a single decision requires optimization. For example, a person wins the lottery and needs to decide where to invest the money. This is a single decision with many alternatives; it also requires predictions about these alternatives, and the person has some objective function and constraints on the types of investments.   There are also cases in which multiple decisions have to be made at the same time -- for example: deciding who should receive a bonus, where the number of bonus recipients is fixed and the criteria are very simple, so no optimization is needed; a set of rules is applied to all candidates to rank each of them, and the candidates are then sorted by the ranking.  So there is selection between alternatives, but optimization is not really required.   While there is some correlation between the criteria specified by Jean Francois and what I have written here, it seems to me that the main distinction is the kind of decision and the way alternatives are compared...   More on this - later.
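A small sketch of the bonus example, assuming made-up scoring rules and candidate fields: the rules rank the candidates, and the fixed quota is taken from the top of the ranking, with no optimization involved.

```python
# Sketch of the bonus example; scoring rules and field names are invented.

def score(candidate):
    """Simple rules that rank a candidate; no optimization involved."""
    points = 0
    if candidate["performance_rating"] >= 4:
        points += 10
    if candidate["years_of_service"] > 5:
        points += 3
    if candidate["had_disciplinary_action"]:
        points -= 20
    return points

def select_bonus_recipients(candidates, num_recipients):
    """Rank all candidates by the rule-based score and take the fixed quota."""
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:num_recipients]
```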