Why SQL Matters, the Limits of Open Source, and Other Lessons of EMC Greenplum's Pivotal HD
On the eve of another Strata week, EMC Greenplum announced Pivotal HD, a Hadoop distribution that is tightly integrated with an MPP database engine called HAWQ. What all this alphabet soup means is the following: EMC has essentially performed a heart transplant, taking from Greenplum everything needed to run scalable SQL queries and planting it on top of the Hadoop Distributed File System (HDFS). This means that you can now store massive amounts of data in Hadoop, massage it into structured form with MapReduce, and then access it using fully compliant SQL, with screaming performance. In practical terms, Pivotal HD means that one Hadoop distribution can provide an infrastructure for both structured and unstructured information, and for both batch and real-time queries.
This announcement also shows that EMC Greenplum's focus is on the needs of enterprise buyers, not on Silicon Valley fashion. It is likely that this approach will win two sorts of deals in the short term.
First, companies that have experimented with Hadoop and have had early success but are weary of the bottleneck that MapReduce programming creates when they try to exploit their data. These customers will say, “I love the fact that I can get insights out of Hadoop, but we’ve spent twenty years figuring out how to gain value from structured information using SQL. I don’t want to have to give that up to use Hadoop.” Pivotal HD is a good fit there and will replace many pilots done with other distributions. MapReduce programming will still be required, of course, but SQL support makes the results rapidly accessible to many more people.
Second, companies that find their current data warehouse creaking under increased load. Their current vendor will no doubt suggest a path forward, but one that probably doesn’t tightly integrate with Hadoop. If, and this is a big if, the ETL processes are not tightly bound to the current data warehouse, then Pivotal HD will allow those customers to handle larger data warehouse workloads much faster than their current technology does.
Because Pivotal HD supports SQL completely, this sort of replacement is possible. The company can then start experimenting with Hadoop, knowing that its investment in SQL will be preserved.
But there are other lessons in this announcement that CIOs and CTOs seeking to make Hadoop work in their businesses would do well to heed.