What if you could turn on a massively parallel business intelligence database cluster with a few lines of code? What if you could leverage in-house and outsourced resources for computation and storage as needed? What if you could expand your analysis, data mining and text-search effort one node at a time, transparently, instantly?
There’s been a flurry of discussion around Hadoop and the HBase project, which aims to bring Google’s Bigtable-style storage to Hadoop.
Now Amazon wants to talk about how to use Hadoop with EC2 and S3, their computing and storage clusters.
Can I search large volumes of data on the cheap? Yes, but my algorithms must fit within the MapReduce framework.
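To make "fit within the MapReduce framework" concrete, here is a minimal single-process sketch of the pattern in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample log lines are made up for illustration. The point is only the shape of the computation: a map step that emits key/value pairs from each record independently, a grouping step, and a reduce step that combines the values for each key.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in-process (hypothetical driver, no cluster)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(run_job(logs))
```

Because each call to `map_phase` looks at one record and each call to `reduce_phase` looks at one key, the work can in principle be spread over many nodes. An algorithm that needs to see all the data at once does not fit this mold without being restructured.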
Does someone have a MapReduce-enabled data query language? Well, there’s Pig from Yahoo and Sawzall from Google. Here is a discussion comparing those two from Greg Linden. There’s also Abacus from the Hadoop project, and apparently Microsoft has DryadLINQ.
We are on the exponential curve as it swoops upward dramatically. Thanks to the power and flexibility of open source, anyone can use Google’s secret sauce on Amazon’s computers for 18 cents per gigabyte and 10 cents per computing hour.
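For a feel of what those two rates add up to, here is a back-of-the-envelope sketch. It simply multiplies the quoted numbers; the job size (100 GB, 10 nodes, 5 hours each) is a made-up example, and since the post does not say what period or kind of charge the per-gigabyte rate covers, the sketch treats it as a flat per-gigabyte figure. Real bills would depend on the provider's actual pricing terms.

```python
# Rates quoted in the post, kept in whole cents to avoid float rounding.
STORAGE_CENTS_PER_GB = 18
COMPUTE_CENTS_PER_HOUR = 10


def estimate_cents(gigabytes, nodes, hours_per_node):
    """Rough cost in cents: per-gigabyte charge plus per-node-hour charge."""
    storage = gigabytes * STORAGE_CENTS_PER_GB
    compute = nodes * hours_per_node * COMPUTE_CENTS_PER_HOUR
    return storage + compute


if __name__ == "__main__":
    # Hypothetical job: 100 GB of data, 10 nodes running 5 hours each.
    cents = estimate_cents(gigabytes=100, nodes=10, hours_per_node=5)
    print(f"${cents / 100:.2f}")
```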
