Can a humble Chart object get some love?

If there was just one question I could ask at this year’s Qonnections 2011, it would be this…

When are we going to see improvements to the most basic QlikView task: displaying data?

Look at the following examples from competitors…

Above is a Spotfire chart that cleanly displays a 2-level hierarchy of dimension values on the x-axis. Increase to 3 levels and the labels stay organized and readable.

Below is a chart from Tableau.

The axis labels are only shown at the left and the bottom of the entire trellis. QlikView shows axis labels on each square, adding unnecessary clutter that is not easy to remove. Two dimension values are coded in the size of the dots and their color. Tableau also uses color gradients easily and effectively.

Tableau and Spotfire put a lot of energy into making displays clean and readable. Tableau makes excellent guesses at how to display your data.

QlikView’s charts have felt clunky for years. The Chart building dialog is huge, confusing and too often doesn’t work as expected. Charts don’t adapt well to being small. Axis labels cram into each other, don’t split lines, and don’t respect chart settings. Legends use excessive real estate, have limited positioning with no intelligence and don’t split text. Expression cycles are confusing for end-users. Fonts and colors are buried 3 levels deep. “Themes” exclude certain chart elements, requiring the developer to dive deep into menus to make targeted changes. Scatter plots quickly become a messy jumble of points and labels. Removing scatter plot data point labels makes identifying a data point a painful task of color matching.

There doesn’t seem to be any point in discussing geospatial data, for which QlikView has no native abilities. QlikTech has been frustratingly quiet on this. Want to include Google Maps? You’re welcome to search for code in the community, or pay more for third-party tools. Meanwhile, the competitors’ native support is easy and attractive.

QlikView is still the best tool out there for “getting things done”. Graphical display is one of a few areas where QlikView is lagging. But QlikView is too far behind at this point. Charts have not been overhauled since at most version 7. It’s time for a major leap forward.

Trigger QlikView Publisher EDX Task From Windows Powershell

Here’s a script to trigger an EDX task from Powershell. As it is, you will need to change QVSERVER to match your server name. The script can then be run from the command line by passing the task name and EDX password as parameters.

Download the QlikView EDX Trigger in Powershell. Do not copy and paste the code below.

param($taskName,$taskEDXPassword)

function QVPOST([string]$updateurl, [string]$text)
{
     $result = $null
     [System.Net.HttpWebRequest] $request = [System.Net.HttpWebRequest] [System.Net.WebRequest]::Create($updateurl)
     $request.UseDefaultCredentials = $true
     $request.Method = "POST"
     $request.ContentType = "application/x-www-form-urlencoded"
     $request.ContentLength = $text.Length

     [System.IO.StreamWriter] $stOut = new-object System.IO.StreamWriter($request.GetRequestStream(), [System.Text.Encoding]::ASCII)
     $stOut.Write($text)
     $stOut.Close()

     [System.Net.HttpWebResponse] $response = [System.Net.HttpWebResponse] $request.GetResponse()
     if ($response.StatusCode -ne 200)
     {
           $result = "Error : " + $response.StatusCode + " : " + $response.StatusDescription
     }
     else
     {
           $sr = New-Object System.IO.StreamReader($response.GetResponseStream())
           $result = $sr.ReadToEnd()
     }

     return $result
}

1$response = 1(QVPOST "http://QVSERVER:4720/qtxs.asmx" "<Global method=`"GetTimeLimitedRequestKey`" />")
$requestKey = $response.GetTimeLimitedRequestKey.GetTimeLimitedRequestKeyResult

$taskEDXRequest=@"
<Global method="RequestEDX" key="$($requestKey)">
<i_TaskIDOrTaskName>$($taskName)</i_TaskIDOrTaskName>
<i_Password>$($taskEDXPassword)</i_Password>
<i_VariableName />
<i_VariableValueList />
</Global>
"@

$response = QVPOST "http://QVSERVER:4720/qtxs.asmx" $taskEDXRequest
echo $response.RequestEDX.RequestEDXResult

A Brief Look at QlikView Storage

You can learn a lot about how QlikView stores data in memory by looking at how it stores data on disk in QVDs.

QVDs are QlikView’s proprietary data storage format. QVDs are in a format optimized for reading into memory. A QVD stores one QlikView table. Each column is stored separately. Only the unique values are stored for each column; this mimics QlikView’s storage of unique values in memory. With all the column values stored the only thing missing is a data structure–one record for each row of the table–that stores a series of indexes to the unique value stored in each field of the row. In fact these indexes are stored in a highly compressed format that mimics the storage of indexes in memory.

Note: By using QVDs, the storage of the overall table is much smaller, but the unique column values are not compressed using something like ZIP or Gzip compression. Why? Because that would make the loading of QVD files slower due to the overhead of decompression! Try compressing a QVD sometime. They get a lot smaller!

Getting back to the point of this post… In the header of a QVD file is some valuable information that you can use to measure the size of data in memory. Open up your QVD using a text editor. If your QVD is very large, you will need an editor that can handle large files. Look at the <FieldNames> tags that are in your table. The structure is similar to the following:

<Fields>
<QvdFieldHeader>
<FieldName>Date</FieldName>

Within each FieldName tag, you will see <Length> and <NoOfSymbols>. Length is how much memory is needed to store the columns’ unique values in memory. Length / NoOfSymbols = bytes per symbol, which you can use to estimate your in-memory storage needs as data volumes grow.

In the real world, your in-memory storage may be better than a QVD. QlikView has optimizations that can dramatically improve storage. They are not used in QVDs because they either slow down the loading of data from QVD or because it is not possible to perform the optimization until the QlikView Script has finished execution. For example, QlikView will store a column of entirely consecutive values (11, 12, 13…100) as offsets from the base value (11) rather than using 8 bytes to store each unique value. This optimization can’t be done until the script is finished and QlikView can evaluate the column in its final form.

I hope this helps you get more information on how QlikView is handling your data. Some of the topics related to this post would be: (a) the storage of numbers as both text and numeric representations, how to identify this and how to avoid it; and (b) more definitive calculations for estimating your storage needs in-memory with QlikView Server.

UPDATE: QlikView 10 treats numbers differently in memory versus in QVD. QV 10 does not store numbers in memory with fewer than 9 bytes (8 bytes + 1 byte of overhead). However, QV 10 does store numbers in QVDs using 5 bytes (4 bytes + 1 byte of overhead) when the values meet certain criteria.

QlikView Scalability

QlikTech recently announced their Scalability Center for customers to evaluate their QlikView apps under load and using the best servers available. At the time of the announcement, just one day after Intel’s own, they say they will have systems based on the Xeon 7500 series processors. The performance improvements over the 7400 series alone are impressive. Rather than look at what the processor and chipset can do in the Intel literature, let’s look at what they are doing in offerings from Dell and IBM.

The PC3-10600 memory is twice as fast as the 5300 memory being offered with the Xeon 7400 series processors. This is due to the introduction of the QuickPath Interconnect (QPI). The memory sticks peak at 10.6 GB/s of transfer and the QPI can support more than 25GB/s.

Also included in the Nehalem cores (i5, i7 and Xeon 7500) is a feature called Turbo Boost. Even though it reminds me of the “Turbo” button on the front of my desktop PC from the 1980′s, it is actually a feature that automatically overclocks the processor if the chip’s power and temperature are within limits. It sounds perfect for QlikView, which leaves processors sitting idle and cool, then suddenly requires a burst of the highest possible performance.

Some other features:

  • 64 DIMM slots for up to 1 terabyte of memory.
  • 4, 6 or 8 cores… 8 cores x 4 sockets = 32 cores
  • 12, 18 and 24MB caches
  • 1.86 GHz – 2.26GHz (The 2.66 GHz is being produced but not included in systems as of May 5, 2010.)
  • 10 GB network connections

These are some amazing machines for running QlikView. And this ties in perfectly with recent announcements about support for large data volumes in QlikView 10, both during the load and in the server.

Below are some samples of current hardware pricing (no software, no networking, basic Dell build) as of May 5, 2010:

Systems (without memory):

  • 4 processor, 8 core Xeon X7560 @ 2.26 GHz: $33,000
  • 4 processor, 6 core Xeon E7530 @ 1.86 GHz: $19,500
  • 2 processor, 4 core Xeon E7520 @ 1.86 GHz: $17,000

Memory:

  • 128GB: $8,000
  • 256GB: $14,000
  • 512GB: $34,000
  • 1TB: $93,000

A final note: Once again, Intel is including hyperthreading in the Nehalem cores. Hyperthreading is designed to help deal with unoptimized applications and the limitations of operating system schedulers. My understanding from the last time that hyperthreading was actively marketed is that QlikView does not benefit and can actually suffer when hyperthreading is enabled. QlikView has highly optimized code and uses it’s own threading algorithms to maintain peak performance. Hopefully someone from QlikTech can confirm in the comments that hyperthreading is not advisable.

Happy QlikViewing!

The Year in QlikView

A subtle and powerful shift happened last year.

I was building a QlikView application for financials. My client and I had discussed the idea of this application a year earlier but it was impossible then. Version 8.5 had not been released. The ability to look at dozens of simultaneous selection sets is the key to making this great idea work.

Zoom forward to 2009. Versions 8.5 and then 9.0 are released with features including Set Analysis, unlimited rows, chaining of document selections, data export from the script, and Dynamic Tables. These innovations remove the architectural limitations of QlikView that had tied my hands. A year ago I could not deliver the solution that was in my head and that my client needed. Now these limits are gone and I can build exactly what my client needs.

Build exactly what my client needs? This is the first time that this thought crossed my mind. It’s true! With the release of version 9, QlikView has entered a new phase. One that no other product can match.

QlikView is the first and ONLY tool on the market in which every business analysis that I have been asked to build can be built with confidence and an expectation of success. Dream big!

QlikView is not SPSS or JMP, and it never will be, but since I have never been asked to do anything more complex than a regression, QlikView works perfectly.

QlikView is the tool to turn to. It delivers results. Real value, right now. And you can be confident that it can achieve any business analysis you can think of. To get an idea of what QlikView can do, follow this thread on LinkedIn with over 100 unique uses for QlikView.

What’s “possible” in QlikView 9…

QlikView 9 adds a powerful new feature to Set Analysis. In 8.5, you must specify the values to be selected in a field. But how can we dynamically make selections based on another field? QlikView 9 introduces “Set Modifiers with Implicit Field Value Definitions” in section 19.4 of the reference manual. Here’s an example:

sum( {$ Customer = P({1<Product={‘Shoe’}>} Customer)>} Sales )

The expression inside P(..) is saying “Start with the complete data set and select the value “Shoe” in the Product field. The list of customers after making that selection is what I want to use to calculate sum(Sales).” Ok, that’s great. It’s dynamically generated based on selections in another field. Now we don’t need a crazy concat() expression to acheive this like we did in 8.5. And by using E(..) instead of P(..), it’s now possible for the first time to select the excluded values! That wasn’t even possible in 8.5!

But the real power is the second Customer. You do not need to return a list of values from Customer in order to make a selection on Customer.

So what? My favorite application of this new ability is in using multiple calendar tables. Let’s say you want to look at sales between two different calendar periods…

  • In the pre-8.5 method, without set analysis, you were limited to sum-if logic. In order to pick two dates and calculate over two periods, both calendars needed to be unconnected from the data model. Because these tables are unconnected, other list boxes would not update as the user made date selections.
  • The 8.5 approach to multi-time-period analysis has been to use bookmarks and macros. You make one period selection and bookmark it, then make the next selection. One calculation applies the bookmark, the other does not. List boxes for other dimensions continue to update as new selections are made. The user cannot see the bookmarked date selections again without applying the bookmark. The bookmark can also contain additional, unintended selections. The way to avoid this is with macro code that makes selections, which may or may not kill the server selections cache, depending on who you ask.
  • THE NEW  QlikView 9 way is to have two calendar tables in the data model, one connected and one unconnected. Using the new implicit selections feature we select the Possible dates in the unconnected table like so: {$Date=P({$} UnconnectedDate)>}. On the unconnected table, the user may have made selections on Year or Month or Day of Week, but we don’t care. We will select whatever UnconnectedDate values remain. By displaying both date tables we maintain visibility for both sets of date selections. Because one date table is connected, all the other fields continue to update. It is best therefore to use the connected date table as the current period and the unconnected table as the past period.

Hope you got something out of this. Cheers!


QlikView 9: Export Document Layout to XML

One of the new features in QlikView 9 is “Export Document Layout”. Despite the name, the entire application without data is exported to XML: script, layout, embedded images, everything!

I exported the blank “New” document to XML. Click here to see what it looks like in XML.

Why add XML export? One of the big-picture features that was missing from QlikView is support for version control. By using XML as the storage format you can use standard version control products and produce accurate differentials.

But now that QlikView has embraced an open file format, there are so many more things we can do…

Track Metadata in Large Deployments

Some QlikView deployments have many HUNDREDS of QVW files. Someone is probably over 1000 by now. These are unique apps; not the result of Publisher distribution loops. It’s a tedious task to manage metadata when it’s locked inside a proprietary file format.

Which apps use the [Salesman] field? Which apps have expressions that fail to multiply [Revenue] by the [Exchange Rate]? Use a tool like Rob Wunderlich’s on the XML versions of all these QVW files and we can track, compare and search hundreds of thousands of expressions, tens of thousands of layout objects, and thousands of scripts as quickly as we would explore any other data in QlikView.

Generate New Applications

Kalido is exporting basic QVWs and QVDs, That feature required custom code integration. Soon any one of us can achieve the same end result: export a complete XML specification, import the XML into QlikView, and hit reload to execute the script. If QlikTech can make this happen from the command-line then the entire process can be hidden and the end user receives a freshly baked QVW (or QVD!).

Dynamically Update Applications

You could generate a full application, but it’s more likely that you will want to make small changes to one that’s already developed.

What do you do when you cannot predict the format of a data update. What if there are new columns added each day? Or what if the headers change? One of the simple things that QlikView can’t handle on its own is the addition or removal of fields from Excel sheets. I don’t know any business intelligence tool that gracefully adapts when a data field is missing.

What if you would like to change the layout every day. Maybe a new field in the Excel sheet needs to be added to a table, or to an expression, or to a drill-down. Maybe you would like to add a chart. Every part of the application is exposed in XML, so you can do anything you want.

Go With The Flow

Speaking more generally about adapting to data… One of the mantras of business intelligence and the database world is “Will the data have this format every time?” If it won’t, no deal. Well, now we can adapt to changing data. We can also deliver dynamic applications that are driven by that data.

And more…

It’s hard to say where this goes but I think it has big potential. Are there any other BI tools that we can look to for inspiration? Do you see yourself using this feature? I’d love to read your ideas!

QlikView Server 9 New Look!

I’m kicking the tires on this new version and so far it makes a really good impression. I did the default install on a vanilla Windows 2003 Server x64 SP2 installation. I’ll post notes on installation sometime soon. This is the default view of the server documents at http://localhost/qlikview/

Click to see the full-size image.

Click for a larger image.

Click for a larger image.