The Hadoop Summit this week was a bang-up success by any simple measure (like attendance or enthusiasm). The cornerstone announcement was the formation of Hortonworks by Yahoo and Benchmark Capital, spinning out key Yahoo Hadoop resources. As with all Hadoop commercial efforts, there are some delightful twists (Cloudera reasonably suggested some clarifications, which are reflected below):
- Like Cloudera, Hortonworks will offer Hadoop support services.
- Unlike Cloudera, Hortonworks will contribute all of its developed code back into the Apache Hadoop distribution project (Cloudera has its own distribution; Cloudera contributes all Apache Hadoop changes back to Apache and develops proprietary management modules as well).
- Very unlike Cloudera, Hortonworks will specifically invest in the management capabilities that are the "secret sauce" by which Cloudera differentiates the not-free Cloudera Enterprise Edition, and will contribute these additions back to the Apache project.
- For Cloudera, Hortonworks represents incremental validation of the market (good news). But Hortonworks sure smells like a serious competitor to me.
For me the critical business question is how to make money with Hadoop (clearly, as a free tool, Apache Hadoop provides great value). Since most of the Hadoop dialog is enthusiasm, I've always felt like a cranky voice and a wet blanket. Yet how to make money is something everyone in the community should be concerned about, because ultimately it is that revenue stream that will define what Hadoop becomes. So I was delighted by the keynote from IBM's Anant Jhingran (CTO, Information Management), because Anant spoke directly to exactly these issues. I strongly encourage everyone who wasn't there to watch when the video goes up.
Anant painted what to me is a remarkably insightful view of Hadoop and how it fits into the existing world of analytics, and from that perspective how to better understand the business issues. IBM clearly has an important analytics business already, so it's not surprising that Anant doesn't say that SQL is dead and that Hadoop and nonSQL will take over the world. So you could cynically say it's just legacy company FUD. I think not: I think it's the wisdom of someone with a lot of experience in the business.
Some of his key points (to me) include these:
- Don't get hung up on "big". Hadoop was key to IBM's Watson solution, but after the set-up the core database was just 6 TB, which isn't that big ("volume" isn't everything).
- Remember that most analytic solutions use a diversity of data ("variety is important").
- Remember that for some solutions the ability to ingest rapid streams of data is an important element ("velocity"). He showed a slide in which IBM had decomposed various important analytics solutions by the degree to which Variety, Volume and Velocity are important. Anant didn't take the next step and describe the role Hadoop should and can play, but it certainly seems like a valuable framework for considering that problem.
- He colorfully described the Hadoop community as consisting of "birthers" (those on the leading edge, pushing the boundary with Hadoop) and "adopters" (the larger BI community that would like to integrate Hadoop into their existing and valuable solutions). He reasoned that these quite different groups needed to find a way of working together and working to the broad common good (rather than splintering into factions).
- Anant also pointed out the obvious but hidden truth (the ignored elephant in the tent, if you'll forgive the egregious pun): it's really hard to make money from Open Source.
As many said in one way or another, 2011 is clearly a watershed time for Hadoop and Big Data. A lot of the complexity in the paths to be taken and the decisions to be made surfaced at the Hadoop Summit. Now to just consult my crystal ball...
