Having built two products that use the Big Data “idea” and “tools”, I feel qualified to add to the already existing chaos. So what triggered me to write this? Over the weekend, I was trying to explain one of those products to a couple of brilliant architects and thinkers. That’s when I experienced what it would be like to be caught up in the belly of a tsunami wave. Every word and every picture was ripped, rolled and mauled for the next two hours. I could quickly infer that the reason for the mayhem was that we did not share a common reference architecture for a typical Big Data solution. We even needed a totally new vocabulary that would not ring any old, conflicting bells.
During the discussion, my brain was madly racing to bring some “order” to those areas. I will present it all here. You tell me if it brings more order or even more chaos. Basics first! Why do we need a reference architecture? Fundamentally, to communicate a complex concept to others in the least conflicting manner. It also stacks up the technical capabilities in such a way that things don’t collapse along the way. Then again, presenting a reference architecture is itself an art. If you are well exposed and experienced, you know that you need many perspectives to represent an architecture. A wise architect can be measured by their discipline in not mixing up multiple perspectives in a single artefact, unless it’s really essential.
Here is a “processing capabilities” perspective on any Big Data oriented solution. Is it earth shattering? No! But the point is, understanding my view will enable you to attack my assumptions effectively. Lack of this will result in shattered egos :)
The vocabulary I used above is self-explanatory except, probably, for the “Reduce” part. That is the one aspect that differentiates a big data solution from traditional information management solutions. You will read multiple definitions of Big Data (the “4Vs definition” being the most common). What is rare, and in my opinion what hits the nail on the head, is this: ‘Big Data is all about effective Data Reduction’. An implementation that does not result in cost-effective and rapid data reduction is not a proper big data implementation. And needless to say, the primary purpose of data reduction is to extract and keep the “insights”.
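To make the “data reduction” idea concrete, here is a minimal sketch in Python. The event records, field names and the “converted users” rule are my own invented example, not from any particular product: the point is only that a large pile of raw records collapses into a small summary, and just the insight survives.

```python
from collections import defaultdict

# Hypothetical raw events: high-volume, low-value individual records.
raw_events = [
    {"user": "alice", "action": "view",     "ms_on_page": 1200},
    {"user": "alice", "action": "purchase", "ms_on_page": 5400},
    {"user": "bob",   "action": "view",     "ms_on_page": 300},
    {"user": "bob",   "action": "view",     "ms_on_page": 250},
]

def reduce_to_insights(events):
    """Collapse many raw events into one compact summary per user."""
    summary = defaultdict(lambda: {"events": 0, "purchases": 0})
    for e in events:
        s = summary[e["user"]]
        s["events"] += 1
        if e["action"] == "purchase":
            s["purchases"] += 1
    # The "insight": keep only the users who actually converted.
    return {u: s for u, s in summary.items() if s["purchases"] > 0}

insights = reduce_to_insights(raw_events)
print(insights)  # only 'alice' survives the reduction
```

Four records in, one compact record out; scale the input by a few billion and the economics of doing this reduction cheaply and quickly become the whole game.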
The above can also be presented as a layered cake, as below. Raw data gets ingested at the bottom. As it moves up the layers, it gets converted into insights and actions. Sounds cliché? :) Now, the “elapsed time” between the acquisition stage and the action stage is a key factor. How rapidly do you want to take a business action on newly arriving data? In real time? Near real time? Or when all the cows have come home? The technical capabilities that you need will be vastly different depending on the answer.
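The elapsed-time point can be sketched as a tiny pipeline with a per-record time budget. The stage names and the budget value here are illustrative assumptions of mine, not a prescription: the layers stay the same whether the acquisition-to-action deadline is milliseconds or hours.

```python
import time

# Illustrative layers of the cake, bottom to top. Each stage is a plain
# function here; in a real system each would be a separate subsystem.
def acquire(record):  return {"raw": record}
def organize(data):   return {**data, "clean": data["raw"].strip()}
def analyze(data):    return {**data, "insight": len(data["clean"])}
def act(data):        return f"acted on insight={data['insight']}"

PIPELINE = [acquire, organize, analyze, act]

def run(record, deadline_s=0.1):
    """Push one record up the layers; report whether we met the budget.

    deadline_s is the business's acquisition-to-action budget: a
    near-real-time system might demand milliseconds, a batch one hours.
    """
    start = time.monotonic()
    data = record
    for stage in PIPELINE:
        data = stage(data)
    elapsed = time.monotonic() - start
    return data, elapsed <= deadline_s

result, in_time = run("  hello big data  ")
```

The capability question the post raises is exactly which machinery sits behind each stage: a tight `deadline_s` forces streaming infrastructure at every layer, while a generous one lets you get away with overnight batch jobs.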