
An ounce more to the Big Data chaos!

Having built two products that use the Big Data “idea” and “tools”, I am qualified to add to the already existing chaos. Now, what triggered me to do this? Over the weekend, I was trying to explain one of those products to a couple of brilliant architects and thinkers. That’s when I experienced how it would feel to be caught in the belly of a tsunami wave. Every word and every picture was ripped, rolled and mauled for the next two hours. I quickly inferred that the reason for the mayhem was that we did not have a common reference architecture of a typical Big Data solution in our minds. We even needed a totally new vocabulary that would not ring any old, conflicting bells.

During the discussion, my brain was madly racing to bring some “order” to those areas. I will present it all here; you tell me whether it brings more order or even more chaos. Basics first! Why do we need a reference architecture? It is basically to communicate a complex concept to others in the least conflicting manner. It is also to stack up the technical capabilities in such a way that things don’t collapse along the way. Then again, presenting a reference architecture is itself an art. If you are well exposed and experienced, you know that you need many perspectives to represent an architecture. A wise architect can be measured by his discipline in not mixing up multiple perspectives in a single artefact, unless it’s really essential.

Here is a “processing capabilities” perspective on any Big Data oriented solution. Is it earth-shattering? No! But the point is, understanding my view will enable you to attack my assumptions effectively. Lacking it will only result in shattered egos :)
The vocabulary I used above is self-explanatory except, probably, for the “Reduce” part. That is the only aspect that differentiates a big data solution from traditional information management solutions. You will read multiple definitions of Big Data (the “4Vs” definition being the most common). What is rare, and what in my opinion hits the nail on the head, is this: ‘Big Data is all about effective Data Reduction’. An implementation that does not result in cost-effective and rapid data reduction is not a proper big data implementation. And needless to say, the primary purpose of data reduction is to extract and keep the “insights”.
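To make that concrete, here is a minimal sketch of data reduction, assuming a hypothetical clickstream: millions of bulky raw event rows collapse into one compact insight row per user. The event fields and the chosen insight (visit counts) are my own illustrative assumptions, not any particular product’s design.

```python
from collections import defaultdict

# Hypothetical raw event stream; in a real system this would be
# millions of rows arriving continuously.
raw_events = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/cart"},
    {"user": "u1", "page": "/cart"},
]

def reduce_to_insights(events):
    """Collapse bulky raw events into a compact per-user aggregate."""
    visits = defaultdict(int)
    for event in events:
        visits[event["user"]] += 1  # keep the insight, discard the bulk
    return dict(visits)

print(reduce_to_insights(raw_events))  # {'u1': 2, 'u2': 1}
```

The point being: once reduced, only the small aggregate needs to travel up the expensive, fast-access layers of the stack; the raw bulk can be archived or dropped cheaply.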

The above can also be presented as a layered cake, as below. Raw data gets ingested at the bottom. As it moves up the layers, it gets converted into insights and actions. Sounds cliché? :) Now, the “elapsed time” between the acquisition stage and the action stage is a key factor. How rapidly do you want to take a business action on newly arriving data? In real time? Near real time? Or when all the cows have come home? The technical capabilities you need will be vastly different depending on that.
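As a rough illustration of the layered cake and that acquisition-to-action “elapsed time”, here is a minimal sketch; the layer functions and the cart-reminder action are hypothetical placeholders, not a real design.

```python
import time

def acquire():
    # bottom layer: raw data in
    return [{"user": "u1", "page": "/cart"}]

def reduce_(events):
    # middle layer: shrink raw events into per-user insights
    insights = {}
    for e in events:
        if e["page"] == "/cart":
            insights.setdefault(e["user"], {"cart_visits": 0})
            insights[e["user"]]["cart_visits"] += 1
    return insights

def decide(insights):
    # top layer: turn insights into business actions
    return [f"send_cart_reminder:{u}" for u, i in insights.items()
            if i["cart_visits"] > 0]

start = time.time()
actions = decide(reduce_(acquire()))
elapsed = time.time() - start  # the acquisition-to-action latency

# Whether `elapsed` must stay in milliseconds (real time), seconds
# (near real time) or hours (batch) dictates which technical
# capabilities you need at every layer.
print(actions, f"in {elapsed:.6f}s")
```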


