Friday, May 10, 2013

Event processing - small data vs. big data and the Sorites Paradox.

This picture is taken from a blog post from the "Big Data Journal" by Jim Kaskade entitled "Real-time Big Data or Small Data".  

Kaskade attempts to define quantitative metrics for what is "small data" vs. what is "big data".
In terms of throughput, big data is defined as >> 1K events per second, while small data is << 1K events per second; I guess that around 1K events per second is defined as medium data...
On variety, big data is defined as at least 6 sources of structured events and at least 6 sources of unstructured events. There are other dimensions as well: small data relates to one function in the organization, while big data relates to several lines of business.
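
As a rough illustration of the throughput rule, here is a minimal Python sketch. The function name `classify_throughput` and the factor of 10 used to interpret ">>" and "<<" are assumptions made for this example only; they do not come from Kaskade's post.

```python
def classify_throughput(events_per_second, pivot=1000, margin=10):
    """Label an event stream by its throughput.

    pivot is the 1K events per second figure quoted above.
    margin is an ASSUMED factor standing in for ">>" and "<<".
    """
    if events_per_second >= pivot * margin:
        return "big data"
    if events_per_second <= pivot / margin:
        return "small data"
    # Everything in between falls into the undefined band around 1K.
    return "medium data"


for rate in (50, 1000, 50000):
    print(rate, classify_throughput(rate))
```

Changing `margin` moves the borders, which is exactly why the labels are hard to pin down.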

The attempt to define where "big data" starts is interesting. The main issue, however, is under what conditions the implementation of systems should become different, and here the borders are not that clear, since there are currently systems that can scale both up and down.

Interestingly, "big" and "small" are fuzzy terms, which reminds me of one of the variations of the Sorites Paradox that I came across during my Philosophy studies many years ago. It goes roughly like this:

Claim:  Every heap of stones is a small heap.
Proof by mathematical induction.
Base:  A heap of 1 stone is a small heap.
Inductive step:  Take a small heap of K stones and add 1 stone; surely it will stay a small heap.
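
One way to see where the argument bites is to replace the fuzzy word "small" with a crisp cutoff and check where the inductive step fails. The sketch below assumes a hypothetical cutoff of 1000 stones; the names `is_small` and `inductive_step_failures` are invented for this illustration.

```python
def is_small(stones, cutoff=1000):
    # cutoff is a hypothetical crisp border between "small" and "not small"
    return stones <= cutoff


def inductive_step_failures(max_stones, cutoff=1000):
    """Return every K below max_stones where a small heap of K stones
    stops being small after adding one stone."""
    return [
        k
        for k in range(1, max_stones)
        if is_small(k, cutoff) and not is_small(k + 1, cutoff)
    ]


print(inductive_step_failures(10_000))
```

With any crisp cutoff, the step fails at exactly one K, the cutoff itself. The paradox comes from the fact that no single stone seems to deserve that role.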


