You can learn a lot about how QlikView stores data in memory by looking at how it stores data on disk in QVDs.
QVDs are QlikView’s proprietary data storage format. QVDs are in a format optimized for reading into memory. A QVD stores one QlikView table. Each column is stored separately. Only the unique values are stored for each column; this mimics QlikView’s storage of unique values in memory. With all the column values stored the only thing missing is a data structure–one record for each row of the table–that stores a series of indexes to the unique value stored in each field of the row. In fact these indexes are stored in a highly compressed format that mimics the storage of indexes in memory.
Note: By using QVDs, the storage of the overall table is much smaller, but the unique column values are not compressed using something like ZIP or Gzip compression. Why? Because that would make the loading of QVD files slower due to the overhead of decompression! Try compressing a QVD sometime. They get a lot smaller!
Getting back to the point of this post… In the header of a QVD file is some valuable information that you can use to measure the size of data in memory. Open up your QVD using a text editor. If your QVD is very large, you will need an editor that can handle large files. Look at the <FieldNames> tags that are in your table. The structure is similar to the following:
Within each FieldName tag, you will see <Length> and <NoOfSymbols>. Length is how much memory is needed to store the columns’ unique values in memory. Length / NoOfSymbols = bytes per symbol, which you can use to estimate your in-memory storage needs as data volumes grow.
In the real world, your in-memory storage may be better than a QVD. QlikView has optimizations that can dramatically improve storage. They are not used in QVDs because they either slow down the loading of data from QVD or because it is not possible to perform the optimization until the QlikView Script has finished execution. For example, QlikView will store a column of entirely consecutive values (11, 12, 13…100) as offsets from the base value (11) rather than using 8 bytes to store each unique value. This optimization can’t be done until the script is finished and QlikView can evaluate the column in its final form.
I hope this helps you get more information on how QlikView is handling your data. Some of the topics related to this post would be: (a) the storage of numbers as both text and numeric representations, how to identify this and how to avoid it; and (b) more definitive calculations for estimating your storage needs in-memory with QlikView Server.
UPDATE: QlikView 10 treats numbers differently in memory versus in QVD. QV 10 does not store numbers in memory with fewer than 9 bytes (8 bytes + 1 byte of overhead). However, QV 10 does store numbers in QVDs using 5 bytes (4 bytes + 1 byte of overhead) when the values meet certain criteria.