A comment by Thomas Dinsmore at DBMS2.com
Re “beer and diapers” — I first heard this at a 1996 Data Mining conference in San Francisco, where a Teradata presenter used the story to tout the capabilities of Teradata.
A year or so later, Forbes ran a piece quoting the head of merchandising for Wal-Mart saying that even if the finding were true he wouldn’t know what to do with it.
That’s an important point. Suppose that it’s true that shoppers tend to purchase beer and diapers on Friday nights. Does this mean retailers should:
(1) Place beer and diapers next to one another in the store, for shopper convenience;
(2) Place beer and diapers far apart in the store, to maximize time in store;
(3) Issue a coupon for beer when purchased with diapers;
(4) Issue no coupon, since shoppers buy beer and diapers anyway.
There is no way to answer the question without a follow-up test-and-learn experiment. In the absence of an experimental design, observed associations have little or no value for decision-making.
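The test-and-learn step could be as simple as randomizing which shoppers get the coupon and comparing the two groups. A minimal sketch of that comparison, using a two-proportion z-test — all of the shopper counts and rates here are made up for illustration:

```python
import math
import random

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Did the coupon change the beer-with-diapers attach rate?"""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF, via math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 5,000 control shoppers, 5,000 offered a coupon
random.seed(42)
control = sum(random.random() < 0.12 for _ in range(5000))  # baseline rate
coupon  = sum(random.random() < 0.14 for _ in range(5000))  # rate with coupon
z, p = two_proportion_ztest(control, 5000, coupon, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Only a design like this — not the original association — tells you whether option (3) beats option (4).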
From a pricing presentation at Qonnections 2012:
• As of QV11 SR1 (March 2012) Real Time Server will no longer be sold as a stand-alone product
• The Dynamic Update feature will be included in all QlikView Servers
– This feature can be turned on or off by a setting in the Server Management Console
Here’s another angle on big data applied to the world surrounding the common business: graph databases. This is a more natural way to model the complete context of a transaction. It’s a big data problem and every business can benefit from analyzing it.
For example, each business transaction is a node. So is each person who touches it or is responsible for it; each customer and vendor, along with the communications with them; and each component of the transaction, such as the product or serial number. Then there are the billions of relationships among all these nodes.
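The model described above is just nodes plus named relationships. A tiny sketch in plain Python — the node labels and relationship names are illustrative, not any particular graph database's schema:

```python
from collections import defaultdict

# node -> list of (relationship, neighbor) pairs
graph = defaultdict(list)

def relate(a, rel, b):
    """Record a relationship in both directions so either end is queryable."""
    graph[a].append((rel, b))
    graph[b].append((rel + "_of", a))

txn = "invoice:1001"
relate(txn, "sold_to", "customer:Acme")
relate(txn, "entered_by", "person:jsmith")
relate(txn, "contains", "product:widget-7")
relate("product:widget-7", "supplied_by", "vendor:WidgetsInc")

# Recovering the full context of a transaction is just following edges:
for rel, node in graph[txn]:
    print(txn, rel, node)
```

A dedicated graph database adds indexing, traversal languages, and scale, but the underlying shape of the data is this simple.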
Validation from IBM. Combine BPM, social communication, and mobile interfaces. OK, now just add two-way business intelligence and some BigMetaData concepts for a revolutionary product.
Great company, great examples.
eBay makes quick decisions with bottom-line results using simple surveys.
“This used to be a job for outside specialists. Now basically anyone inside Thomson Reuters can do this,” says Nicole Gagnon, a senior director of market research.
The scalability material for QlikView states it will scale linearly and take advantage of all cores. I think this is misleading because, as I tell my customers, “QlikView scales linearly, but hardware does not.” At the most recent Qonnections there was a realistic discussion of two cases of this problem.
The first is Non-Uniform Memory Access, or NUMA. This technology divides the memory among the processors, creating “local” and “remote” banks. QlikView would rather treat memory as a uniform segment, so the advice from QlikTech is to disable NUMA. That’s not always possible, so there is also a QlikView configuration option to better spread the data throughout memory. What these options do is enforce average-case memory access times and avoid worst-case situations for one object versus another. If you had an app that could fit entirely into a single memory area, you’d do better to leave NUMA enabled.
I expect that future releases of QlikView will take advantage of NUMA. One of the advantages of QlikView’s batch processing is that memory storage requirements are known ahead of time and therefore optimizable. QlikView’s current data structures are compact, but still tabular. I expect to see QlikView intelligently divide its workload across processors using a hybrid of row-based and column-based storage.
The second issue is inter-processor communication. At Qonnections there was a specific concern about using an 8-way (eight processor sockets) design instead of a 4-way, because the communication starts to take a significant percentage of peak performance. Although each core is calculating on a subset of rows, the row details are stored across all memory areas, which means constant communication among processors. QlikView’s current bit-stuffed indexes could be split into row or column chunks to push aggregation local to the processors.
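The chunking idea amounts to partial aggregation per partition followed by a cheap merge, so only small partial results — not row details — would cross between sockets. A rough sketch, with plain Python lists standing in for QlikView’s bit-stuffed indexes:

```python
def chunk_sum(chunk):
    # In a NUMA-aware engine, this partial sum would run on the
    # processor whose local memory bank holds the chunk.
    return sum(chunk)

def chunked_total(column, n_chunks=4):
    size = -(-len(column) // n_chunks)          # ceiling division
    chunks = [column[i:i + size] for i in range(0, len(column), size)]
    partials = [chunk_sum(c) for c in chunks]   # one per processor, in principle
    return sum(partials)                        # only tiny partials cross sockets

sales = list(range(1_000_000))
print(chunked_total(sales) == sum(sales))  # same answer, locality-friendly work
```

The sketch runs the chunks sequentially; the point is the data layout, where each aggregation reads only its own chunk and the merge step touches a handful of numbers.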