Between classic business transactions and social interactions and machine-generated observations, the digital data tap has been turned on and it will never be turned off. The flow of data is everlasting. Which is why you see a lot of things in the loop around real time frameworks and streaming frameworks. – Mike Hoskins, CTO Actian
From Mike Hoskins to Mike Richards (yes we can do that kind of leap in logic, it’s the weekend)…
Oh, Joel Miller, you just found the marble in the oatmeal! You’re a lucky, lucky, lucky little boy – because you know why? You get to drink from… the firehose! Okay, ready? Open wide! – Stanley Spadowski, UHF
Firehose of Terror
I think you get the picture – a potentially frightening picture for those unprepared to handle the torrent of data that is coming down the pipe. Unfortunately, for those who are unprepared, the disaster will not merely overwhelm them. Quite the contrary – I believe they will be consumed by irrelevancy.
If you’re still with me, let me explain.
I agree that the tap has been turned on, maybe not at the full blast of power or under maximum control, yet the data is coming and is already well beyond a trickle. The helter-skelter implementations of big data solutions out there have, perhaps, created more of a turbulent, blasting firehose than a meandering stream of flowing data. And this is only the beginning.
Success is Still Newsworthy
We are still at the stage where designing a (successful) enterprise system built around streaming data, for example, is big news. Why is it news? Because building is harder than merely planning, especially when new open source projects continue to push us beyond the bleeding edge. New tools are helping us see how we can handle more data and find more value, but they are also making the pool of data so much larger that the tools themselves are often irrelevant by the time they are adopted.
For example, MapReduce was awesome, until it began to be so widely adopted that its limitations became apparent. It’s like finding the marble in a sandbox filled with oatmeal – not easy, but when you find it, you’re a winner! Oh, the prize is a fierce typhoon of even more data coming your way. Congratulations! (Sorry you didn’t prepare for that.)
So where does this leave organisations that have no ability to handle more than a trickle of data?
It’s a win-or-lose scenario – either you can do something about it or you can’t. As software developers or data managers, we won’t be judged along some smooth gradation of skills and capabilities.
We’ll be judged against a checklist:
Yes or no.
Pass or fail.
Win or lose.
Firehose or … an icky pail to hide in the closet.
Data’s Need for Speed
Why is it a pass/fail scenario? Consider your car – is it successful when it mostly starts in the morning? Never. Anything short of fully starting is a complete failure, because starting is what it is designed to do.
I argue that today’s data streams are being designed to handle data at maximum velocity. Sure, many services aren’t producing millions of records per second, but as we gear up with the latest toolsets, we make the tools themselves hunger and thirst to get more and more data into their greedy little hands. Feed the beast – or ignore it at your peril.
Systems today are designed to run at full throttle – 100% – all out – maximum overdrive. None of us would like to pay for premium broadband and find that we only use 10% of our bandwidth. Likewise, our systems are waiting for us to crank up the volume to see what we’ll do next.
Our data economy inherently wants to run at maximum but much of the old plumbing needs upgrades to function at that rate. You’d be irate if the fire department ran their hoses at 10% power when trying to douse your burning home. However, if they told you later that max pressure would burst the old hoses and you’d have no water, then you might be a little more appreciative.
Patch up those hoses so they are tested and ready. Buckle up the survival suit. Start digging through the oatmeal and open wide. Be forewarned – if you are not searching for the marble, you can never find it. If you cannot find it, you’ll be sent home empty-handed.
p.s. If you don’t win, I highly doubt you’ll even receive a lousy copy of the home edition.