Over the weekend I revisited Tableau, enjoyed some success with MonetDB, tried to turn MySQL into a hundred-million-row data warehouse, was underwhelmed by Firebird, installed Greenplum and spent many frustrating hours with Talend Open Studio, Pentaho Kettle and Jitterbit.
Of course, I could just buy QlikView, but what can be done for less money? Unfortunately, data warehouses and BI front-ends are not sexy problems in the open-source community. Graphs and charts get a little more attention, but you’ll need to write your own code to glue them to your application.
In summary, what can I say about our options?
First, write your own ETL. Why do open-source ETL tools like Talend and Kettle work so hard to rebuild Informatica? It reminds me of Linux in the 1990s, when the community wanted to beat Windows, kept working to look like Windows, and wondered when victory would arrive. Informatica, like OLAP and mainframes, is from an era when memory was scarce and languages were low-level, slow to compile and run, offered little abstraction and were not at all portable. On top of that, ODBC drivers were tightly controlled and costly.
But now we can pick from many great scripting languages. Today’s languages abstract the hard parts, are easy to read, can be edited while executing and talk to any system, database, web service or application. I think the next direction for ETL will be a simple (but extensible) transformation language using an ORM wrapper… Rails on ETL. Until that arrives, you can achieve everything you need with PHP, Perl, Ruby and others.
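To give a feel for how little code a hand-written ETL job needs, here is a minimal sketch in Python (any of the scripting languages above would do just as well). Everything in it is an assumption made for illustration: the `sales` table, the `customer`, `country` and `amount` columns, the sample data and the function names are all hypothetical, and SQLite stands in for whatever warehouse you actually load into.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Yield one dict per CSV row (the 'E' step)."""
    yield from csv.DictReader(io.StringIO(csv_text))


def transform(row):
    """Clean one row (the 'T' step): trim names, upper-case country codes, parse amounts."""
    return (
        row["customer"].strip(),
        row["country"].strip().upper(),
        float(row["amount"]),
    )


def load(conn, records):
    """Insert the cleaned records into a hypothetical 'sales' table (the 'L' step)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(csv_text, conn):
    """Run extract -> transform -> load and return the row count in 'sales'."""
    load(conn, (transform(row) for row in extract(csv_text)))
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


# Made-up sample input; a real job would read from a file, database or web service.
SAMPLE = "customer,country,amount\n Alice ,nz,10.50\nBob,au,3.25\n"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
    print(run_etl(SAMPLE, conn), "rows loaded")
```

Because the rows stream through generators, the same shape should cope with inputs far larger than memory; swapping SQLite for another database is mostly a matter of changing the connection and the SQL dialect.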
Best option for a low-cost data warehouse?

