
Low-Cost Data Analysis & Visualization: It’s Getting Better All The Time

Over the weekend I revisited Tableau, enjoyed some success with MonetDB, tried to turn MySQL into a hundred-million-row data warehouse, was underwhelmed by Firebird, installed Greenplum and spent many frustrating hours with Talend Open Studio, Pentaho Kettle and Jitterbit.

Of course, I could just buy QlikView, but what can be done for less money? Unfortunately, data warehouses and BI front-ends are not sexy problems in the open-source community. Graphs and charts get a little more attention, but you'll need to write your own code to glue them to your application.

In summary, what can I say about our options?

First, write your own ETL. Why do open-source ETL tools like Talend and Kettle work so hard to rebuild Informatica? It reminds me of Linux in the 1990s, when the community wanted to beat Windows, kept working to look like Windows and wondered when victory would arrive. Informatica, like OLAP and mainframes, comes from an era when memory was scarce and languages were low-level, slow to compile and run, abstracted little and were not at all portable. On top of that, ODBC drivers were tightly controlled and costly.

But now we can pick from many great scripting languages. Today's languages abstract the hard parts, are easy to read, can be edited while executing and can talk to any system, database, web service or application. I think the next direction for ETL will be a simple (but extensible) transformation language built on an ORM wrapper: Rails on ETL. Until that arrives, you can achieve everything you need with PHP, Perl, Ruby and others.
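As a rough illustration of "write your own ETL", here is a minimal extract-transform-load sketch in Python using only the standard library. The CSV sample, its column names and the `fact_orders` table are hypothetical, and an in-memory SQLite database stands in for whatever warehouse you actually load (MySQL, MonetDB, Greenplum and so on). It is a sketch of the approach, not a replacement for a full ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: in practice this would be a file, an API
# response or a query against an operational database.
SOURCE_CSV = """order_id,customer,amount,order_date
1001, Alice ,19.99,2008-03-01
1002,BOB,5.00,2008-03-02
1003,carol,,2008-03-02
"""


def extract(text):
    """Extract: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: trim and normalize names, cast types, drop rows with no amount."""
    for row in rows:
        if not row["amount"].strip():
            continue  # reject incomplete records instead of loading them
        yield (
            int(row["order_id"]),
            row["customer"].strip().title(),
            round(float(row["amount"]), 2),
            row["order_date"],
        )


def load(conn, records):
    """Load: bulk-insert transformed records into a fact table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fact_orders (
               order_id INTEGER PRIMARY KEY,
               customer TEXT,
               amount REAL,
               order_date TEXT)"""
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", records
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(SOURCE_CSV)))
    for row in conn.execute("SELECT * FROM fact_orders ORDER BY order_id"):
        print(row)
```

Because each stage is a plain generator, swapping the CSV reader for a database cursor or a web-service call only touches `extract`, and the transformation rules stay readable in one place.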

Best option for a low-cost data warehouse?


ETech 2008

Two talks stood out today. One was a brief talk by Saul Griffith of Makani Power. What I took away from it was the impracticality of many renewable energy sources, such as tidal and wind power. A combination of all renewable options, with an emphasis on solar and a strong conservation element, will be required simply to maintain our current consumption, let alone meet the growing demand from China and other developing countries.

Another standout talk, which also happened to focus on energy consumption, was by Stan Williams of HP Labs. His lab is researching several pieces of the exabyte and zettabyte computing puzzle. To paraphrase Stan: "We can't use spinning disks in zettabyte computing because we'll start torquing the earth," and "using a zettabyte computer to model the earth's climate will have to include the computer itself, as it will be the single largest actor on the earth's climate." The tie-in with Saul's talk is that a zettabyte computer made of modern components would require 3 terawatts of electricity, which I immediately recognized as the total power that could be harnessed from the tides before bringing the oceans to a standstill. Of course, we would also have to include that in the climate model.
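To put the 3 TW figure in perspective, the arithmetic below simply restates it per byte and per year. It assumes the SI definition of a zettabyte (10^21 bytes) and continuous operation over a 365-day year (8,760 hours); the inputs are only the numbers quoted above.

```latex
\begin{aligned}
\frac{3\ \text{TW}}{1\ \text{ZB}} &= \frac{3\times10^{12}\ \text{W}}{10^{21}\ \text{bytes}}
  = 3\times10^{-9}\ \text{W per byte} \quad (3\ \text{nW per byte}) \\
E_{\text{year}} &= 3\ \text{TW} \times 8760\ \text{h} = 26\,280\ \text{TWh} \approx 26\ \text{PWh}
\end{aligned}
```

A few nanowatts per byte sounds tiny, but multiplied across a zettabyte it adds up to a continuous draw on the scale of the entire tidal resource.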

The Landscape of Parallel Computing Research

I highly recommend browsing the wiki that UC Berkeley's EECS department has put up to complement its original white paper (also excellent). It's hard to predict the future applications of massively parallel systems, but we know they will be a giant evolutionary leap. The authors define multi-core as 2 to 32 processors and many-core as hundreds or thousands of cores. Intel is already experimenting with an 80-core system that it has pledged will be shippable within five years, and a 64-bit quad-core processor is already available for about $300. I'm still amazed at that much power for so little cash.