Open-Source QlikView Engine?

Druid is a high-speed, in-memory analytic database distributed across a cloud platform. MetaMarkets says they plan to release the code in the coming year. Data is fed in semi-aggregated form from a Hadoop backend that stores the raw data. Their cluster of 40 modest machines churns through 1 billion rows of arbitrary dimensionality in 1 second. In another post they mention that they can use 6TB of memory spread across multiple machines before speed degrades due to cloud communication overhead.

Posted in Uncategorized

Great Features In QlikView 11

QlikView 11 looks great. Three new features in particular are going to make great impressions: the improved Web interface (AJAX), Session Sharing, and Notes.

Posted in Business Intelligence, QlikView, Visualization | 3 Comments

The Myth and Mystery of Big Data

“With enough data, you can discover patterns and facts using simple counting that you can’t discover in small data using sophisticated statistical and machine learning approaches.” Link

I used to assume that big data and data mining and statistics were inseparable. But the reality–companies making a killing transforming data into value–is far from complex.

Big data is not hard. Statistics are not required. Neither are complex algorithms. Google’s Marissa Mayer attributed the company’s intelligence to the volume of data available for cross-referencing and not to clever algorithms. Google Translate leveraged massive volumes of cross-referenced text in multiple languages rather than a finely tuned understanding of grammar. Voice translation uses much the same technique, based on huge volumes of recorded, transcribed text.

Right now our two best tools are visualization and data exploration (business discovery). Both are simple, easy to demonstrate and easy to grasp. The big data revolution’s message to the masses is that simple correlation will outstrip them both as long as enough data can be crunched. And much of this can be automated, pre-calculated, and even anticipated. Imagine the analysis system analyzing itself: these people tend to ask these questions at these times!

Data can be correlated post-hoc. Correlation does not equal causation, but simple correlation is ample evidence on which to take action. Correlation is immediately perceived visually. Correlation is relative and easy to compare. Correlation can look at 2, 3, 4 or more factors at once. Correlation is business friendly. It is easily understood. Correlation is gut-instinct compatible. Kids understand it: mom gets upset when I put peanut butter on the cat. If I do it right now, she’ll probably be mad.

The business opportunity is really that so much big data is simply thrown away. The opportunity to store all this data didn’t exist, so we have an old habit of simply letting it vaporize. Every server message, every website click, every customer contact and interaction, every manufacturing activity, temperature, timeclock action, phone call received, phone call placed, security video, email sent. Every bit of data can be analyzed, and from multiple perspectives: employee, employer, customer, vendor, shipper, receiver, and on and on.

We don’t know what we’ll find. As more and more stories of big data at little(er) companies emerge, the snowball will become an avalanche.

Posted in Business Intelligence, Interactive Analysis, Uncategorized, Visualization | 1 Comment

QlikView Local File Security Easily Bypassed

QlikView’s “section access” security can be easily ignored with a hacked copy of QV.EXE. Hacking the file is a simple process that takes only a few minutes. Do not rely on “section access” to protect your data in a local QVW file.

The effectiveness of “section access” on server-hosted files is not a part of this warning.

Posted in Uncategorized

Writeback in QlikView

The ability to push data from QlikView back out to a database is beneficial for what-if analysis, financial reporting, CRM dashboards and more. Unfortunately our ability to achieve this, even using custom code, is limited. On the frontend, there are only a few objects that can support user input and very little control over how these objects display. On the backend, connecting QlikView back to a database is very difficult to do well.

Part of the problem is that QlikView server is not always in control of this communication. If you’re using Plugin, it’s done by the client. If you’re using AJAX or Mobile, it’s done by the server. If, however, you code your solution as an Extension, things shift back to the client side again.

The common way to implement writeback is through VBScript macros. Examples of code to achieve this are plentiful on QlikCommunity. Although a simple version can be mocked up in a few lines of code, deploying this solution to many users in a modestly secure environment has serious disadvantages.

  • Each client machine needs to communicate with the database. Therefore, database drivers need to be installed on each client. Credentials need to be included in the code of each QlikView document, leaving them exposed to users. Special ports need to be open in firewalls for driver communication. These are poor security practices and should be reason enough for any enterprise to abandon this approach.
  • Client-side code is difficult to monitor. Error handling is poor. A separate system would need to be in place to capture errors for analysis and resolution.
  • Managing conflicts in a distributed environment requires careful design and development.
  • Communication between the VBS execution environment and the database can be slowed for any number of reasons. This leaves the client in an uncertain state, without feedback on progress or problems. Meanwhile, the application state in QlikView can continue to change. This can easily cause inconsistencies.
  • Database driver communication uses proprietary protocols that are difficult to monitor for debugging and by network security software.

But there is a much better way to implement writeback to a database from QlikView: build a lightweight web service. What this means is to have QlikView send a structured request to a web server that can interpret the request, make the appropriate database changes, and send a useful response back to QlikView. Overall, this approach is far more flexible, reliable, compatible, configurable and maintainable.

  • The response to a web service command (HTTP POST) can itself be an extensive report on the success or failure of any updates. This data can be made visible to the user as a clear confirmation that changes were successful.
  • Server-side code is more reliable. It’s far easier to manage many users updating data at the same time. It’s easier to record and react to errors. Implementing your web service in PHP gives you a community with examples of good design.
  • Server-side code can handle any level of complexity such as triggering other systems. Client crashes need not leave complex processes in unresolved states.
  • This approach only requires a web port to be open in the firewall and therefore is more likely to work regardless of where the user is located.
  • With this approach, it is easier to handle database rollbacks, atomic transactions and other features that support the completion of a transaction.
  • Changes do not need to be sent to a database one value at a time. Instead, changes can be aggregated into a single update on the QlikView side. Aggregating changes is done faster than database communication. There is less chance of stalling the user session or allowing QlikView’s state to change in the middle of an update process.
  • Multiple tables can be updated. For example, not only can a value be updated, but a separate audit log can be updated with who made the change, when, and to what value.
  • Communication using XML over HTTP is readily captured by network security software.
  • Web services can leverage existing network infrastructure. For example, IIS & Active Directory will authenticate the user making the web request. The web service code can be passed this information reliably.
  • The database is read and written by a single set of credentials, written once in the web services code, and running on a secured server, without any access from other machines on the network. This is far more secure and a much easier sell to the IT department.
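
The points above can be sketched as a small server-side handler. This is only an illustration, not the code I deployed: it uses Python and SQLite as stand-ins for PHP and a real database, and the table names (budget, audit_log), the XML request shape, and the X-Remote-User header are all hypothetical. It shows a batch of changes applied in one transaction, an audit log written alongside, and an XML report sent back.

```python
import sqlite3
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler


def init_db(conn):
    """Create the hypothetical target table and its audit log."""
    conn.execute("CREATE TABLE IF NOT EXISTS budget (id TEXT PRIMARY KEY, amount REAL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_log "
        "(id TEXT, new_amount REAL, changed_by TEXT, changed_at TEXT)"
    )


def apply_writeback(conn, request_xml, user):
    """Apply every <change> in a single transaction and return an XML report."""
    report = ET.Element("writeback")
    try:
        changes = ET.fromstring(request_xml).findall("change")
        now = datetime.now(timezone.utc).isoformat()
        with conn:  # commits on success, rolls back if anything raises
            for change in changes:
                row_id = change.get("id")
                amount = float(change.get("value"))
                conn.execute(
                    "INSERT OR REPLACE INTO budget (id, amount) VALUES (?, ?)",
                    (row_id, amount),
                )
                conn.execute(
                    "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
                    (row_id, amount, user, now),
                )
        report.set("status", "ok")
        report.set("updated", str(len(changes)))
    except (ET.ParseError, ValueError, TypeError, sqlite3.Error) as exc:
        # Nothing was committed, so the caller can safely retry the whole batch.
        report.set("status", "error")
        report.set("updated", "0")
        report.set("message", str(exc))
    return ET.tostring(report, encoding="unicode")


def make_handler(conn):
    """Wrap apply_writeback in an HTTP POST handler bound to one connection."""

    class WritebackHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length).decode("utf-8")
            # Assumption: a front end such as IIS has already authenticated
            # the user and forwards the name in this hypothetical header.
            user = self.headers.get("X-Remote-User", "unknown")
            reply = apply_writeback(conn, body, user).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/xml; charset=utf-8")
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)

    return WritebackHandler
```

To try it, pass make_handler(conn) to http.server.HTTPServer and POST a body such as <changes><change id="A" value="10.5"/></changes>. Because the whole batch runs inside one transaction, a bad value in any row leaves the database untouched, and the report tells the user so.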

I’ve had plenty of success with this approach, combining IIS, Active Directory, PHP and QlikView VBScript macro code. I don’t think we need writeback as a QlikView feature. I would, however, like to see a few changes to QlikView to better support features like this.

  • Support the editing of Input Fields in more objects, such as when used as dimensions in a Pivot Table, or in a bar chart.
  • Support multi-line text in Input Fields.
  • Add functions to VBScript and Extensions to identify the rows of a table with Input Fields that have been changed since reload. For Extensions, something like a “next” iterator that moves to the next changed value.
  • Make it possible to share Input Fields across users–without using collaboration objects as a kludge.
  • Add Extension/AJAX functions for managing the data behind Extension objects with millions of rows.
  • Support the updating of Input Fields from AJAX.

Happy Qliking!

Posted in QlikView | 1 Comment

QlikView Google Maps Javascript v3 Integration

Here is a sample of integrating a Google Maps JavaScript v3 API map into QlikView as an extension object.

QlikView Integrate Google Maps JavaScript v3 API

Posted in QlikView

Copying and Pasting Colors In QlikView

It is possible to copy and paste color settings among all layout objects! Even two-color gradients can be easily copied and pasted. If the destination object doesn’t support gradients, the first color is kept and the second color is ignored.

Simply right-click on the color box to copy or paste.

My thanks to Joe Feyas for discovering this!

Posted in QlikView | 1 Comment

Can a humble Chart object get some love?

If there were just one question I could ask at Qonnections 2011, it would be this…

When are we going to see improvements to the most basic QlikView task: displaying data?

Look at the following examples from competitors…

Above is a Spotfire chart that cleanly displays a 2-level hierarchy of dimension values on the x-axis. Increase to 3 levels and the labels stay organized and readable.

Below is a chart from Tableau.

The axis labels are only shown at the left and the bottom of the entire trellis. QlikView shows axis labels on each square, adding unnecessary clutter that is not easy to remove. Two dimension values are coded in the size of the dots and their color. Tableau also uses color gradients easily and effectively.

Tableau and Spotfire put a lot of energy into making displays clean and readable. Tableau makes excellent guesses at how to display your data.

QlikView’s charts have felt clunky for years. The Chart building dialog is huge, confusing and too often doesn’t work as expected. Charts don’t adapt well to being small. Axis labels cram into each other, don’t split lines, and don’t respect chart settings. Legends use excessive real estate, have limited positioning with no intelligence and don’t split text. Expression cycles are confusing for end-users. Fonts and colors are buried 3 levels deep. “Themes” exclude certain chart elements, requiring the developer to dive deep into menus to make targeted changes. Scatter plots quickly become a messy jumble of points and labels. Removing scatter plot data point labels makes identifying a data point a painful task of color matching.

There doesn’t seem to be any point in discussing geospatial data, for which QlikView has no native abilities. QlikTech has been frustratingly quiet on this. Want to include Google Maps? You’re welcome to search for code in the community, or pay more for third-party tools. Meanwhile, the competitors’ native support is easy and attractive.

QlikView is still the best tool out there for “getting things done”. Graphical display is one of a few areas where QlikView is lagging, but in this area it has fallen too far behind. Charts have not been overhauled since version 7 at the latest. It’s time for a major leap forward.

Posted in QlikView, Visualization | 5 Comments

Trigger QlikView Publisher EDX Task From Windows PowerShell

Here’s a script to trigger an EDX task from PowerShell. As written, you will need to change QVSERVER to match your server name. The script can then be run from the command line by passing the task name and EDX password as parameters.

Download the QlikView EDX Trigger in Powershell. Do not copy and paste the code below.

param($taskName,$taskEDXPassword)

function QVPOST([string]$updateurl, [string]$text)
{
     $result = $null
     [System.Net.HttpWebRequest] $request = [System.Net.HttpWebRequest] [System.Net.WebRequest]::Create($updateurl)
     $request.UseDefaultCredentials = $true
     $request.Method = "POST"
     $request.ContentType = "application/x-www-form-urlencoded"
     $request.ContentLength = $text.Length

     [System.IO.StreamWriter] $stOut = new-object System.IO.StreamWriter($request.GetRequestStream(), [System.Text.Encoding]::ASCII)
     $stOut.Write($text)
     $stOut.Close()

     [System.Net.HttpWebResponse] $response = [System.Net.HttpWebResponse] $request.GetResponse()
     if ($response.StatusCode -ne 200)
     {
           $result = "Error : " + $response.StatusCode + " : " + $response.StatusDescription
     }
     else
     {
           $sr = New-Object System.IO.StreamReader($response.GetResponseStream())
           $result = $sr.ReadToEnd()
     }

     return $result
}

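# Ask the server for a time-limited request key. The [xml] cast lets the reply be read by element name.
[xml]$response = [xml](QVPOST "http://QVSERVER:4720/qtxs.asmx" "<Global method=`"GetTimeLimitedRequestKey`" />")
$requestKey = $response.GetTimeLimitedRequestKey.GetTimeLimitedRequestKeyResult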

$taskEDXRequest=@"
<Global method="RequestEDX" key="$($requestKey)">
<i_TaskIDOrTaskName>$($taskName)</i_TaskIDOrTaskName>
<i_Password>$($taskEDXPassword)</i_Password>
<i_VariableName />
<i_VariableValueList />
</Global>
"@

$response = QVPOST "http://QVSERVER:4720/qtxs.asmx" $taskEDXRequest
echo $response.RequestEDX.RequestEDXResult

Posted in QlikView | 4 Comments

A Brief Look at QlikView Storage

You can learn a lot about how QlikView stores data in memory by looking at how it stores data on disk in QVDs.

QVDs are QlikView’s proprietary data storage format. QVDs are in a format optimized for reading into memory. A QVD stores one QlikView table. Each column is stored separately. Only the unique values are stored for each column; this mimics QlikView’s storage of unique values in memory. With all the column values stored the only thing missing is a data structure–one record for each row of the table–that stores a series of indexes to the unique value stored in each field of the row. In fact these indexes are stored in a highly compressed format that mimics the storage of indexes in memory.
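
The idea of unique values plus per-row indexes can be sketched in a few lines of Python. This is a generic illustration of the technique and makes no claim about the actual QVD byte layout; the function names are made up for the example, and packing each index into the minimum number of bits is an assumption about what "highly compressed" means here.

```python
def encode_column(values):
    """Split a column into its unique symbols plus one small index per row."""
    symbols = []      # unique values, in first-seen order
    positions = {}    # value -> position in symbols
    indexes = []      # one entry per row, pointing into symbols
    for value in values:
        if value not in positions:
            positions[value] = len(symbols)
            symbols.append(value)
        indexes.append(positions[value])
    return symbols, indexes


def bits_per_index(symbol_count):
    """Bits needed to address any one symbol (never fewer than 1)."""
    return max(1, (symbol_count - 1).bit_length())


def decode_column(symbols, indexes):
    """Rebuild the original column from symbols and row indexes."""
    return [symbols[i] for i in indexes]


country = ["US", "DE", "US", "US", "FR", "DE"]
symbols, indexes = encode_column(country)
# Three distinct values, so each row needs only a 2-bit index
# instead of a full copy of the string.
```

The fewer distinct values a column has, the narrower each row's index can be, which is why low-cardinality fields cost so little per row.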

Note: By using QVDs, the storage of the overall table is much smaller, but the unique column values are not compressed using something like ZIP or Gzip compression. Why? Because that would make the loading of QVD files slower due to the overhead of decompression! Try compressing a QVD sometime. They get a lot smaller!

Getting back to the point of this post… In the header of a QVD file is some valuable information that you can use to measure the size of data in memory. Open up your QVD using a text editor. If your QVD is very large, you will need an editor that can handle large files. Look at the <QvdFieldHeader> entries, one for each field in your table. The structure is similar to the following:

<Fields>
<QvdFieldHeader>
<FieldName>Date</FieldName>

Within each QvdFieldHeader entry, alongside the FieldName tag, you will see <Length> and <NoOfSymbols>. Length is how much memory is needed to store the column’s unique values. Length / NoOfSymbols = bytes per symbol, which you can use to estimate your in-memory storage needs as data volumes grow.
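
The arithmetic can be scripted. The sketch below is a minimal example under stated assumptions: the sample header and its numbers are invented for illustration, only the three tags discussed here are included, and in a real QVD you would first need to cut the XML header off from the binary data that follows it.

```python
import xml.etree.ElementTree as ET

# Hypothetical header fragment; the values are made up for this example.
SAMPLE_HEADER = """
<QvdTableHeader>
  <Fields>
    <QvdFieldHeader>
      <FieldName>Date</FieldName>
      <NoOfSymbols>365</NoOfSymbols>
      <Length>3285</Length>
    </QvdFieldHeader>
    <QvdFieldHeader>
      <FieldName>Customer</FieldName>
      <NoOfSymbols>1000</NoOfSymbols>
      <Length>18000</Length>
    </QvdFieldHeader>
  </Fields>
</QvdTableHeader>
"""


def bytes_per_symbol(header_xml):
    """Return {field name: Length / NoOfSymbols} for every field in the header."""
    root = ET.fromstring(header_xml)
    result = {}
    for field in root.iter("QvdFieldHeader"):
        name = field.findtext("FieldName")
        length = int(field.findtext("Length"))
        symbols = int(field.findtext("NoOfSymbols"))
        result[name] = length / symbols if symbols else 0.0
    return result


def estimate_symbol_bytes(per_symbol, expected_symbols):
    """Project symbol storage for a larger number of distinct values."""
    return per_symbol * expected_symbols


per_field = bytes_per_symbol(SAMPLE_HEADER)
# With these sample numbers, Date works out to 9 bytes per symbol
# and Customer to 18 bytes per symbol.
```

If the Customer field were expected to grow to 5,000 distinct values, estimate_symbol_bytes(per_field["Customer"], 5000) would project roughly 90,000 bytes for its symbols, not counting the per-row indexes.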

In the real world, your in-memory storage may be better than a QVD. QlikView has optimizations that can dramatically improve storage. They are not used in QVDs because they either slow down the loading of data from QVD or because it is not possible to perform the optimization until the QlikView Script has finished execution. For example, QlikView will store a column of entirely consecutive values (11, 12, 13…100) as offsets from the base value (11) rather than using 8 bytes to store each unique value. This optimization can’t be done until the script is finished and QlikView can evaluate the column in its final form.
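
To make the consecutive-values case concrete, here is a rough Python sketch. It illustrates the general idea only and is not QlikView's actual implementation; the function name and the returned structure are hypothetical.

```python
def compact_consecutive(values):
    """Return a base and count if the unique integers form an unbroken run, else None."""
    ordered = sorted(values)
    if ordered and all(b - a == 1 for a, b in zip(ordered, ordered[1:])):
        # Every value can be rebuilt as base + offset, so only the base
        # and the count need to be kept.
        return {"base": ordered[0], "count": len(ordered)}
    return None


run = compact_consecutive(range(11, 101))
naive_bytes = 8 * len(range(11, 101))  # 8 bytes for each unique value
```

For the column 11 through 100 this keeps a base of 11 and a count of 90 in place of 90 separate 8-byte values. The check needs the complete column, which matches the point above that the optimization has to wait until the script has finished.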

I hope this helps you get more information on how QlikView is handling your data. Some of the topics related to this post would be: (a) the storage of numbers as both text and numeric representations, how to identify this and how to avoid it; and (b) more definitive calculations for estimating your storage needs in-memory with QlikView Server.

UPDATE: QlikView 10 treats numbers differently in memory versus in QVD. QV 10 does not store numbers in memory with fewer than 9 bytes (8 bytes + 1 byte of overhead). However, QV 10 does store numbers in QVDs using 5 bytes (4 bytes + 1 byte of overhead) when the values meet certain criteria.

Posted in QlikView