Tag Archives: Tableau

Low-Cost Data Analysis & Visualization: It’s Getting Better All The Time

Over the weekend I revisited Tableau, enjoyed some success with MonetDB, tried to turn MySQL into a hundred-million-row data warehouse, was underwhelmed by Firebird, installed Greenplum and spent many frustrating hours with Talend Open Studio, Pentaho Kettle and Jitterbit.

Of course, I could just buy QlikView, but what can be done for less money? Unfortunately, data warehouses and BI front-ends are not sexy problems in the open-source community. Graphs and charts get a little more attention, but you’ll need to write your own code to glue them to your application.

In summary, what can I say about our options?

First, write your own ETL. Why do open-source ETL tools like Talend and Kettle work so hard to rebuild Informatica? It reminds me of Linux in the 1990s, when the community wanted to beat Windows, kept working to look like Windows, and kept wondering when victory would arrive. Informatica, like OLAP and mainframes, is from an era when memory was scarce and languages were low-level, slow to compile and run, abstracted little and were not at all portable. On top of that, ODBC drivers were tightly controlled and costly.

But now we can pick from many great scripting languages. Today’s languages abstract the hard parts, are easy to read, can be edited while executing and talk to any system, database, web service or application. I think the next direction for ETL will be a simple (but extensible) transformation language using an ORM wrapper… Rails on ETL. Until that arrives, you can achieve everything you need with PHP, Perl, Ruby and others.
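To make the "write your own ETL" idea concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the sample CSV, the `sales` table and the function names `extract`, `transform`, `load` and `run_etl` are hypothetical, and SQLite stands in for whatever warehouse you actually use.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read a file, an API or a database.
RAW_CSV = """order_id,product,amount,order_date
1, Coffee ,3.50,2008-01-05
2,tea,2.25,2008-01-05
3,Bagel,,2008-01-06
"""


def extract(text):
    """Extract: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: normalize product names and skip rows with no amount."""
    for row in rows:
        if not row["amount"].strip():
            continue  # incomplete record, leave it out of the warehouse
        yield (
            int(row["order_id"]),
            row["product"].strip().lower(),
            float(row["amount"]),
            row["order_date"],
        )


def load(conn, records):
    """Load: insert the cleaned records into a (hypothetical) sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "order_id INTEGER PRIMARY KEY, product TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(conn, text=RAW_CSV):
    """Run the three steps and return the number of rows now in the table."""
    load(conn, transform(extract(text)))
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_etl(conn), "rows loaded")
```

Because each step is a plain generator or function, swapping the source or the target means replacing one function rather than reconfiguring a tool.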

Best option for low-cost data warehouse?


Interactive Information Visualization

Enrico Bertini at Visuale asks how important interactivity is in information visualization. As a proponent of QlikView, Spotfire, Tableau and others, I think it’s extremely important. Interactivity is the future; it’s “make or break.”

I’ve been implementing speed-of-thought interactive BI tools for 6 years and I don’t want to do it any other way. When I watched my first seasoned executive lose restraint and laugh uncontrollably as he got instant answers to his hardest questions, I knew this was the only way to go. When my end-user training sessions end late because everyone is so excited about what they can do, it’s clear that people NEED interactivity.

Response to the Tableau 3.0 Webinar

I finally got around to watching the Tableau 3.0 webinar. I agree with their very excited presenter that Tableau 3.0 is a leap forward. The support of ad-hoc grouping of dimension elements is excellent as is the enhanced support of ad-hoc sets. The annotations look good and act sensibly. Generally, the new features are focused on ease of use, better statistical analysis, and report clarity. All good things. Here are 3.0 examples.

Annotations should be required in every BI tool. The ability to mark reference lines and data points on graphs and tables is critical to clear communication. Placing an annotation on a point in space does not require a data point to exist there, another nice feature. The smart BI vendors are focusing on collaboration and communication among users.

“Groups” stole their name from the “groups” of 2.x, which are now the “sets” of 3.0. They can be used like so: similar dimension values such as coffee and tea, which may need to be represented in the database as separate product lines, can now be combined on the fly within Tableau by an end user under the simple heading “drinks”. This makes it easy to answer a question about food vs. drink sales without exporting to Excel and spending more time adding up the drink categories. In short, “groups” bring dimension values together and “sets” separate special values from the rest of a dimension’s values, and both can be done by the end user. Pretty nice.
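The difference between the two ideas can be sketched in a few lines of plain Python. This is not Tableau's API, only an illustration of the concept; the sales figures, the `groups` mapping and the function names `total_by_group` and `split_by_set` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical sales by product line, as they might sit in the database.
sales = [("coffee", 120.0), ("tea", 80.0), ("bagels", 50.0), ("muffins", 70.0)]

# A "group" maps several dimension values to one shared heading.
groups = {"coffee": "drinks", "tea": "drinks", "bagels": "food", "muffins": "food"}


def total_by_group(rows, mapping):
    """Sum amounts under each group heading; ungrouped values keep their own name."""
    totals = defaultdict(float)
    for value, amount in rows:
        totals[mapping.get(value, value)] += amount
    return dict(totals)


def split_by_set(rows, members):
    """A "set" separates chosen values from the rest of the dimension."""
    inside = sum(amount for value, amount in rows if value in members)
    outside = sum(amount for value, amount in rows if value not in members)
    return {"in set": inside, "rest": outside}


if __name__ == "__main__":
    print(total_by_group(sales, groups))
    print(split_by_set(sales, {"coffee"}))
```

Here the group answers the food vs. drink question directly, while the set compares one singled-out value (coffee) against everything else.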

I think the strongest competitor for visualization is Spotfire. However, Tableau’s use of live database interaction will become an advantage as data warehouse implementations shift to high-performance in-memory read-optimized databases. Was that over-hyphenated? Spotfire’s initial data loads are inflexible and I wouldn’t recommend it if you need to update a large dataset frequently.

Unlike QlikView, all of Tableau’s data needs to be in a single database. With good design, this is not a performance issue. The problem is that the extra expense of hardware and software to store a separate data warehouse and run ETL processing may push Tableau’s final price tag far above QlikView, which can easily pull from multiple sources and uses its own high-speed database.