Wednesday, May 11, 2011

On big events and big data



The "Big Data" phenomenon has gained a lot of traction, interest, and related work in recent years. The Internet and the move to put everything in digital form have resulted in amounts of data beyond past imagination, and the rate of growth is amazing. Mark Palmer, in his blog posting, drew an analogy between data and sand,



saying that  "If every grain of sand in the bucket was 1 byte of data, then:
  • The entire work of Shakespeare fills just one bucket of sand (about 5MB)
  • A fast financial market data feed (OPRA) fills a beach of sand in 24 hours (about 5TB) 
  • Google processes all the sand in the world every week (about 100PB)
  • We generate 60% more sand every year" 
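To get a feeling for the ratios behind these numbers, here is a small back-of-the-envelope sketch. It assumes decimal units (1 MB = 10^6 bytes and so on) and takes the quoted figures at face value, although they are rough estimates. The variable names are my own, chosen only for illustration.

```python
import math

# Decimal units are an assumption; the quoted figures are approximate anyway.
MB = 10**6
TB = 10**12
PB = 10**15

BUCKET = 5 * MB            # the entire work of Shakespeare
BEACH = 5 * TB             # one day of the OPRA feed
WORLD_PER_WEEK = 100 * PB  # what Google processes in a week
GROWTH = 0.60              # yearly growth of generated data

# How many "Shakespeare buckets" fit in one "OPRA beach"?
buckets_per_beach = BEACH // BUCKET

# How many "OPRA beaches" fit in one week of Google processing?
beaches_per_week = WORLD_PER_WEEK // BEACH

# At 60% growth per year, how long until the yearly amount doubles?
doubling_years = math.log(2) / math.log(1 + GROWTH)

print(buckets_per_beach, beaches_per_week, round(doubling_years, 2))
```

Under these assumptions a beach holds a million buckets, a week of Google processing is twenty thousand beaches, and the amount of new sand per year doubles roughly every year and a half.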


Using this analogy: if all the data in the world is sand, much of the sand describes facts. BTW, the fact that a fact appears as data in the big data universe does not mean that this fact is actually true.

Events generate some of this data, but in many cases an event is the moment at which a fact becomes true or false, and this transition is not really kept in the data.
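The distinction between a stored fact and the event of that fact changing can be illustrated with a minimal sketch. It assumes a fact is observed as a sequence of true/false values. The names `FactChanged` and `derive_events` are hypothetical, invented for this illustration and not taken from any event processing product.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class FactChanged:
    """Hypothetical event type: a fact became true or false."""
    fact: str
    became: bool
    at: int  # index of the observation at which the change was seen


def derive_events(fact: str, observations: Iterable[bool]) -> Iterator[FactChanged]:
    """Emit an event each time the observed value differs from the previous one."""
    previous: Optional[bool] = None
    for index, value in enumerate(observations):
        if previous is not None and value != previous:
            yield FactChanged(fact, value, index)
        previous = value


# Five observations of the same fact contain only two events.
events = list(derive_events("door_open", [False, False, True, True, False]))
for event in events:
    print(event)
```

The data holds five values, but only two things actually happened. If only the latest value is stored, which is common, even those two transitions are lost.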

The "Dagstuhl grand challenge", which is part of the event processing manifesto, talks about an "event fabric", which will be the equivalent of the Internet for events instead of data. I guess that the quantities will be of the same order of magnitude, so it will face the same scalability challenge. The main difference is the type of processing: event processing instead of queries/information retrieval. Getting to an "event fabric" indeed poses many challenges. In DEBS 2011 there will be a tutorial about this grand challenge. I'll write more about this challenge in the future.


(and this is of course Schloss Dagstuhl) 
