QlikView’s scalability material states that it scales linearly and takes advantage of all available cores. I find this misleading because, as I tell my customers, “QlikView scales linearly, but hardware does not.” At the most recent Qonnections there was a frank discussion of this problem in two specific cases.
The first is Non-Uniform Memory Access, or NUMA. This technology divides memory among the processors, creating “local” and “remote” banks, while QlikView would rather treat memory as one uniform segment. So the advice from QlikTech is to disable NUMA. That is not always possible, so there is also a QlikView configuration option to spread the data more evenly throughout memory. What these options do is enforce average-case memory access times and avoid worst-case situations for one object versus another. If you had an app that could fit entirely into a single memory area, you’d do better to leave NUMA enabled.
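To make the average-case-versus-worst-case trade-off concrete, here is a small illustrative simulation (not QlikView internals; the latency figures are assumptions, not measurements). Concentrating all data pages on one NUMA node gives one socket fast local access and every other socket slow remote access, while interleaving the pages round-robin gives every socket the same average latency:

```python
# Illustrative sketch: why interleaving data across NUMA nodes enforces
# average-case access times and avoids worst-case disparities.
LOCAL_NS = 100    # assumed latency of a local-bank access (ns)
REMOTE_NS = 300   # assumed latency of a remote-bank access (ns)
NODES = 4

def mean_latency(cpu_node, page_nodes):
    """Average access latency a thread on cpu_node sees over the given pages."""
    costs = [LOCAL_NS if p == cpu_node else REMOTE_NS for p in page_nodes]
    return sum(costs) / len(costs)

# Case 1: all eight data pages allocated on node 0.
concentrated = [0] * 8
# Case 2: pages interleaved round-robin across the four nodes.
interleaved = [p % NODES for p in range(8)]

for label, pages in (("concentrated", concentrated), ("interleaved", interleaved)):
    lats = [mean_latency(cpu, pages) for cpu in range(NODES)]
    print(label, min(lats), max(lats))
# concentrated: best thread sees 100.0 ns, worst sees 300.0 ns
# interleaved: every thread sees 250.0 ns
```

The interleaved layout is slower than the best case but predictable, which is exactly what the spreading option buys you; the single-node layout is what you want only when the whole app fits in one memory area.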
I expect that future releases of QlikView will take advantage of NUMA. One of the advantages of QlikView’s batch processing is that memory storage requirements are known ahead of time and therefore optimizable. QlikView’s current data structures are compact, but still tabular. I expect to see QlikView intelligently divide its workload across processors using a hybrid of row-based and column-based storage.
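As a rough sketch of why storage requirements are knowable ahead of time, consider dictionary-encoded columnar storage in the general style QlikView uses: each field keeps a table of distinct values plus a fixed-width pointer per row. The function below is hypothetical, not QlikView’s actual implementation, but it shows how the bits per row (and therefore total memory) fall out of the distinct count once a batch load completes:

```python
import math

def encode_column(values):
    """Dictionary-encode a column: distinct symbols + fixed-width row pointers.
    After a batch load the distinct count is known, so the exact bits needed
    per row pointer -- and the column's total size -- are known too."""
    symbols = sorted(set(values))
    index = {v: i for i, v in enumerate(symbols)}
    bits_per_row = max(1, math.ceil(math.log2(len(symbols))))
    pointers = [index[v] for v in values]
    return symbols, pointers, bits_per_row

symbols, pointers, bits = encode_column(
    ["US", "DE", "US", "FR", "DE", "US", "US", "FR"])
print(symbols)   # ['DE', 'FR', 'US']
print(pointers)  # [2, 0, 2, 1, 0, 2, 2, 1]
print(bits)      # 2 -- three distinct values fit in 2 bits per row
```

Because the sizes are known up front, a NUMA-aware release could place each column (or chunk of a column) deliberately rather than letting the allocator scatter it.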
The second issue is inter-processor communication. At Qonnections there was a specific concern about using an 8-way (eight processor sockets) design instead of 4-way, because inter-processor communication starts to consume a significant percentage of peak performance. Although each core is calculating on a subset of rows, the row details are stored across all memory areas, which means constant communication among processors. QlikView’s current bit-stuffed indexes could be split into row or column chunks so that aggregation stays local to each processor.
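The chunked-aggregation idea can be sketched as a two-phase computation (a hypothetical illustration, not QlikView’s design): each socket counts rows against its own local chunk of row pointers, and only the small partial results cross the interconnect, so communication is one transfer per chunk instead of one per row:

```python
def local_aggregate(chunk, symbols):
    """Phase 1, runs on one socket against its local chunk:
    count rows per symbol, touching only local memory."""
    partial = {}
    for ptr in chunk:
        value = symbols[ptr]
        partial[value] = partial.get(value, 0) + 1
    return partial

def merge(partials):
    """Phase 2, the only cross-socket step:
    combine the small per-chunk dictionaries."""
    total = {}
    for p in partials:
        for value, count in p.items():
            total[value] = total.get(value, 0) + count
    return total

symbols = ["DE", "FR", "US"]
pointers = [2, 0, 2, 1, 0, 2, 2, 1]     # bit-stuffed indexes, shown as ints
chunks = [pointers[:4], pointers[4:]]   # one chunk per socket
result = merge(local_aggregate(c, symbols) for c in chunks)
print(result)   # {'US': 4, 'DE': 2, 'FR': 2}
```

With per-row remote lookups eliminated, the interconnect carries only one small dictionary per socket per aggregation, which is how an 8-way box could avoid drowning in communication.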