… and points beyond

mostly about data


Over the weekend I revisited Tableau, enjoyed some success with MonetDB, tried to turn MySQL into a hundred-million-row data warehouse, was underwhelmed by Firebird, installed Greenplum and spent many frustrating hours with Talend Open Studio, Pentaho Kettle and Jitterbit.

Of course, I could just buy QlikView, but what can be done for less money? Unfortunately, data warehouses and BI front-ends are not sexy problems in the open-source community. Graphs and charts get a little more attention, but you’ll need to write your own code to glue them to your application.

In summary, what can I say about our options?

First, write your own ETL. Why do open-source ETL tools like Talend and Kettle work so hard to rebuild Informatica? It reminds me of Linux in the 1990s, when the community wanted to beat Windows, kept working to look like Windows and kept wondering when victory would arrive. Informatica, like OLAP and mainframes, is from an era when memory was scarce and languages were low-level, slow to compile and run, offered little abstraction and were not at all portable. On top of that, ODBC drivers were tightly controlled and costly.

But now we can pick from many great scripting languages. Today’s languages abstract away the hard parts, are easy to read, can be edited while executing and can talk to any system, database, web service or application. I think the next direction for ETL will be a simple (but extensible) transformation language built on an ORM wrapper… Rails on ETL. Until that arrives, you can achieve everything you need with PHP, Perl, Ruby and others.
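To show how little code a hand-written ETL job can need, here is a minimal sketch in Python (standing in for PHP, Perl or Ruby). It extracts rows from CSV text, cleans and type-casts them, and loads them into a SQLite fact table. The sample data, the `sales_fact` table and the cleaning rules are all hypothetical; a real job would read from your own sources and load into your warehouse through its own database driver.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: in practice this might be a file, an API
# response or a query against an operational database.
RAW_CSV = """order_id,order_date,product,quantity,unit_price
1,2007-05-01, Coffee ,3,2.50
2,2007-05-01,tea,1,1.75
3,2007-05-02,Muffin,2,3.00
4,2007-05-02,,5,1.00
"""


def extract(text):
    """Extract: yield one dict per source row."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: clean names, cast types, derive amount, drop bad rows."""
    for row in rows:
        product = row["product"].strip().title()
        if not product:
            continue  # reject incomplete rows instead of loading them
        quantity = int(row["quantity"])
        unit_price = float(row["unit_price"])
        yield (
            int(row["order_id"]),
            row["order_date"],
            product,
            quantity,
            round(quantity * unit_price, 2),  # derived measure
        )


def load(conn, records):
    """Load: create the fact table if needed and bulk-insert the records."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sales_fact (
               order_id INTEGER PRIMARY KEY,
               order_date TEXT,
               product TEXT,
               quantity INTEGER,
               amount REAL)"""
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()


def run_etl(conn, text):
    load(conn, transform(extract(text)))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn, RAW_CSV)
    query = "SELECT product, SUM(amount) FROM sales_fact GROUP BY product ORDER BY product"
    for row in conn.execute(query):
        print(row)
```

Because each stage is a plain generator, adding a lookup, a slowly changing dimension or another source is just another function in the chain.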

Best option for low-cost data warehouse?


What settings do you use for the gauge and bar charts? Watch the video!

Stephen Few, who spoke at the QlikView conference in April, devised the bullet graph a few years ago. A QlikView customer used bullet graphs and sparklines in an application and generously allowed QlikTech to post a working demo of it. I’m going to rebuild the bullet graphs from that app. You can download and dissect a QVW copy of the app from the QlikView demo website.

Bullet Graph Demo App Example

The bullet graph in QlikView is a bar chart overlaid on a gauge chart. The demo app aligns the targets on all the graphs to 100% of current year budget. The formula for the black line is current year actuals divided by current year budget, and the darker gauge section shows prior year actuals divided by current year budget.

Bullet Graph Diagram

This technique has several advantages:

  • Because PY and CY actuals are both divided by CY budget, they remain directly comparable. You can see at a glance that current year sales are significantly below last year’s sales.
  • Dividing actuals by budget puts many measures with wildly different scales onto one common scale, making chart maintenance easier without hurting accuracy.
  • Without this technique, you would need to write separate expressions for the gauge chart value and for the maximum values of the bar and gauge charts. With this technique, they are simply the constants 1 and 1.5.
  • The chart adds context for answering the question, “If we repeated last year’s performance, would we beat our budget, and by how much?”
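The normalization behind these advantages can be sketched in a few lines. This is plain Python computing the values each bullet graph would plot, not QlikView expression syntax; the measure names, figures and the `bullet_values` helper are hypothetical. The constants 1 (target) and 1.5 (axis maximum) follow the demo app’s approach described above.

```python
from dataclasses import dataclass

TARGET = 1.0    # every target sits at 100% of current-year budget
AXIS_MAX = 1.5  # one shared maximum for the bar and gauge charts


@dataclass
class Measure:
    name: str
    cy_actual: float  # current-year actuals
    py_actual: float  # prior-year actuals
    cy_budget: float  # current-year budget


# Hypothetical measures on wildly different scales.
MEASURES = [
    Measure("Sales ($)", 8_400_000, 10_500_000, 12_000_000),
    Measure("Orders", 1_150, 980, 1_000),
    Measure("New customers", 45, 30, 60),
]


def bullet_values(m: Measure) -> dict:
    """Express one measure on the shared budget-relative scale.

    Assumes a non-zero current-year budget and prior-year actual.
    """
    bar = m.cy_actual / m.cy_budget   # black bar: CY actual / CY budget
    band = m.py_actual / m.cy_budget  # darker band: PY actual / CY budget
    return {
        "name": m.name,
        "bar": min(bar, AXIS_MAX),    # clip overruns to the axis maximum
        "band": min(band, AXIS_MAX),
        "target": TARGET,
        # Same denominator, so bar / band equals CY actual / PY actual.
        "cy_vs_py": bar / band,
        # Gap to budget if we merely repeated last year's performance.
        "py_pace_vs_budget": band - TARGET,
    }


if __name__ == "__main__":
    for m in MEASURES:
        v = bullet_values(m)
        print(f"{v['name']:<14} bar={v['bar']:.2f} band={v['band']:.2f} "
              f"CY/PY={v['cy_vs_py']:.2f} "
              f"PY pace vs budget={v['py_pace_vs_budget']:+.0%}")
```

Dollars, order counts and customer counts all land on the same 0 to 1.5 axis, so one chart definition can be reused for every measure.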

I hope you’ve enjoyed this tour through bullet graphs. Take a look at the demo app for sparklines, which I plan to profile as well.


Stephen Few recently highlighted bullet graphs on his blog.

Bullet Graph

I wanted to point out a QlikView demo application that uses this technique alongside Tufte’s sparklines. You can download and play with this application or examine the AJAX version.

Bullet Graphs in QlikView

Every day, the Economist site adds an original chart.

I just can’t get enough of the Gapminder software, officially called Trendalyzer. The easy interface is essentially an enhanced scatter plot, but it does the job perfectly. I think a lot of people, like me, were amazed and excited by Hans Rosling’s talk at TED 2006. He demonstrated that analytics offers real insight and improves efficiency. Now there are more “Gapcasts” to enjoy.

Enrico Bertini at Visuale asks how important interactivity is in information visualization. As a proponent of QlikView, Spotfire, Tableau and others, I think it’s extremely important. Interactivity is the future; it’s “make or break.”

I’ve been implementing speed-of-thought interactive BI tools for 6 years and I don’t want to do it any other way. When I watched my first seasoned executive lose restraint and laugh uncontrollably as he got instant answers to his hardest questions, I knew this was the only way to go. When my end-user training sessions end late because everyone is so excited about what they can do, it’s clear that people NEED interactivity.

I finally got around to watching the Tableau 3.0 webinar. I agree with their very excited presenter that Tableau 3.0 is a leap forward. Support for ad-hoc grouping of dimension elements is excellent, as is the enhanced support for ad-hoc sets. The annotations look good and behave sensibly. Generally, the new features focus on ease of use, better statistical analysis and report clarity. All good things. Here are some 3.0 examples.

Annotations should be required in every BI tool. The ability to mark reference lines and data points on graphs and tables is critical to clear communication. Another nice feature: an annotation can be placed at any point in the chart, even where no data point exists. The smart BI vendors are focusing on collaboration and communication among users.

The “groups” of 3.0 took their name from the “groups” of 2.x, which are now the “sets” of 3.0. Groups can be used like this: similar dimension values such as coffee and tea, which may need to be stored in the database as separate product lines, can now be combined on the fly by an end user under the simple heading “drinks”. This makes it easy to answer a question about food vs. drink sales without exporting to Excel and spending more time adding up the drink categories by hand. In short, “groups” bring dimension values together and “sets” separate special values from the rest of a dimension’s values, and the end user can create both. Pretty nice.
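The difference between the two ideas can be sketched outside Tableau. This minimal Python illustration is not Tableau’s implementation; the product lines, amounts, the group mapping and the “In”/“Out” labels are all made up for the example.

```python
from collections import defaultdict

# Hypothetical sales rows: (product_line, amount).
SALES = [
    ("Coffee", 120.0),
    ("Tea", 45.0),
    ("Sandwiches", 200.0),
    ("Pastries", 80.0),
    ("Juice", 30.0),
]

# A user-defined "group" mapping: several dimension values under one heading.
GROUPS = {
    "Coffee": "Drinks", "Tea": "Drinks", "Juice": "Drinks",
    "Sandwiches": "Food", "Pastries": "Food",
}


def total_by_group(rows, groups):
    """Group: sum amounts per heading; unmapped values keep their own name."""
    totals = defaultdict(float)
    for product, amount in rows:
        totals[groups.get(product, product)] += amount
    return dict(totals)


def split_by_set(rows, members):
    """Set: separate the chosen values ("In") from everything else ("Out")."""
    totals = {"In": 0.0, "Out": 0.0}
    for product, amount in rows:
        totals["In" if product in members else "Out"] += amount
    return totals


if __name__ == "__main__":
    print(total_by_group(SALES, GROUPS))        # food vs. drink sales
    print(split_by_set(SALES, {"Sandwiches"}))  # one product vs. the rest
```

Neither function changes the underlying rows, which mirrors the point above: the end user reshapes the dimension on the fly instead of asking for a database change or exporting to Excel.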

I think the strongest competitor for visualization is Spotfire. However, Tableau’s use of live database interaction will become an advantage as data warehouse implementations shift to high-performance in-memory read-optimized databases. Was that over-hyphenated? Spotfire’s initial data loads are inflexible and I wouldn’t recommend it if you need to update a large dataset frequently.

Unlike QlikView, Tableau needs all of its data to be in a single database. With good design, this is not a performance issue. The problem is that the extra hardware and software expense of a separate data warehouse and ETL processing may push Tableau’s final price tag far above QlikView’s, since QlikView can easily pull from multiple sources and uses its own high-speed database.

From the Björk shows comes the reactable. The demo is wonderful. Via Boing Boing.


What an outstanding site! Found on Kevin Kelly’s Cool Tools blog.


The Inner Life of the Cell is a mind-expanding animation. Watch the incessant, precise, complex and beautiful activity inside a single white blood cell. Why can’t more educational materials be like this? Here’s a neat writeup about it in Wired. Created by John Liebler. Here’s another version with music instead of narration.
