<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/rss2full.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>... and points beyond</title>
	
	<link>http://andpointsbeyond.com</link>
	<description>mostly about data</description>
	<pubDate>Mon, 15 Dec 2008 17:41:10 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.5</generator>
	<language>en</language>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/AndPointsBeyond" type="application/rss+xml" /><item>
		<title>QlikView’s Rapid Time-to-Implementation Improves BI Value: A TDWI Interview</title>
		<link>http://feeds.feedburner.com/~r/AndPointsBeyond/~3/485755239/</link>
		<comments>http://andpointsbeyond.com/2008/12/15/qlikviews-rapid-time-to-implementation-improves-bi-value-a-tdwi-interview/#comments</comments>
		<pubDate>Mon, 15 Dec 2008 17:41:10 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
		
		<category><![CDATA[QlikView]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/2008/12/15/qlikviews-rapid-time-to-implementation-improves-bi-value-a-tdwi-interview/</guid>
		<description><![CDATA[&#8220;We work the way your mind works. It doesn&#8217;t matter if you get the thing perfect the first time. Let your mind go the way it wants and ask the questions that you want to ask. You can customize [QlikView] based on the kinds of questions, the kinds of analysis, that [your users] want to [...]]]></description>
			<content:encoded><![CDATA[<p>&#8220;We work the way your mind works. It doesn&#8217;t matter if you get the thing perfect the first time. Let your mind go the way it wants and ask the questions that you want to ask. You can customize [QlikView] based on the kinds of questions, the kinds of analysis, that [your users] want to do,&#8221; Deighton says.</p>
<p><a href="http://www.tdwi.org/News/display.aspx?ID=9238" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://www.tdwi.org/News/display.aspx?ID=9238');">QlikView&#8217;s Rapid Time-to-Implementation Improves BI Value </a></p>
<img src="http://feeds.feedburner.com/~r/AndPointsBeyond/~4/485755239" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/12/15/qlikviews-rapid-time-to-implementation-improves-bi-value-a-tdwi-interview/feed/</wfw:commentRss>
		<feedburner:origLink>http://andpointsbeyond.com/2008/12/15/qlikviews-rapid-time-to-implementation-improves-bi-value-a-tdwi-interview/</feedburner:origLink></item>
		<item>
		<title>Vertica for the Cloud</title>
		<link>http://feeds.feedburner.com/~r/AndPointsBeyond/~3/482406172/</link>
		<comments>http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/#comments</comments>
		<pubDate>Fri, 12 Dec 2008 06:36:01 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
		
		<category><![CDATA[Vertica]]></category>

		<category><![CDATA[cloud]]></category>

		<category><![CDATA[data warehouse]]></category>

		<category><![CDATA[database]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/</guid>
		<description><![CDATA[While I have my head in the clouds, I should mention that Vertica has a cloud solution that they manage for you. Not new, but gives some perspective.
With competitive offerings in the $10-20k per terabyte range, this is an attractive offer and a great way to try before you invest when you have that much data.
I [...]]]></description>
			<content:encoded><![CDATA[<p>While I have my head in the clouds, I should mention that Vertica has a cloud solution that they manage for you. Not new, but gives some perspective.</p>
<p>With competitive offerings in the $10-20k per terabyte range, this is an attractive offer and a great way to try before you invest when you have that much data.</p>
<p>I hear Vertica is a screamer, but I can&#8217;t imagine getting sub-second results for 3 TB of data on 3 virtualized servers, for the same reasons I gave in my previous post.</p>
<p><a href="http://www.vertica.com/_pdf/verticacloudpricing" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://www.vertica.com/_pdf/verticacloudpricing');">Vertica for the Cloud Pricing </a></p>
<img src="http://feeds.feedburner.com/~r/AndPointsBeyond/~4/482406172" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/feed/</wfw:commentRss>
		<feedburner:origLink>http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/</feedburner:origLink></item>
		<item>
		<title>QlikView in the Cloud</title>
		<link>http://feeds.feedburner.com/~r/AndPointsBeyond/~3/481299098/</link>
		<comments>http://andpointsbeyond.com/2008/12/10/qlikview-in-the-cloud/#comments</comments>
		<pubDate>Thu, 11 Dec 2008 05:01:02 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
		
		<category><![CDATA[QV Server]]></category>

		<category><![CDATA[QlikView]]></category>

		<category><![CDATA[cloud]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=279</guid>
		<description><![CDATA[QlikView depends entirely on processor speed, processor cache performance, memory latency and memory throughput. This makes QlikView an ideal reference for Intel, who uses QlikView to show off the latest product improvements. It also adds to the challenge of adapting QlikView to cloud platforms such as Amazon Web Services, Mosso, Joyent, etc.
The problem is virtualization. [...]]]></description>
			<content:encoded><![CDATA[<p>QlikView depends entirely on processor speed, processor cache performance, memory latency and memory throughput. This makes QlikView an ideal reference for Intel, who uses QlikView to show off the latest product improvements. It also adds to the challenge of adapting QlikView to cloud platforms such as Amazon Web Services, Mosso, Joyent, etc.</p>
<p>The problem is virtualization. Virtualization is valuable to customers and service providers, but it&#8217;s also a thief! It adds overhead for the processor, cache and memory&#8211;everything that impacts QlikView performance!</p>
<p>The cloud, like real life, is ever-changing. You have no idea how many people are sharing your hardware and what their load will be from second to second. I would bet that nearly all deployed QlikView servers spend most of their time idle and the rest of their time at peak processing power. In the cloud, the goal is to spend as little time idle as possible, and for that we sacrifice peak processing power. QlikView depends on peak processing power and that type of application will suffer the most in the cloud.</p>
<p>But exactly how much will it suffer? Success in the cloud will need to be measured by the end-user experience. The cost of being in the cloud is vigilant monitoring and smart responses. What&#8217;s the right way to monitor the end-user experience in a company that uses OCX vs. AJAX, or is spread out geographically? Will bringing up more servers in the cloud improve response time? Should every QlikView server deliver the same set of apps, or should each app be served by a dynamic set of servers? Similarly, do some apps need sub-second response time while others can wait?</p>
<p>One thing stays the same. If you deploy large QlikView data sets you&#8217;re already sensitive to response times and know what to consider when designing an app. In the cloud, even smaller apps will force you to think about costly chart expressions, messy data models and design choices that work fine on dedicated servers.</p>
<img src="http://feeds.feedburner.com/~r/AndPointsBeyond/~4/481299098" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/12/10/qlikview-in-the-cloud/feed/</wfw:commentRss>
		<feedburner:origLink>http://andpointsbeyond.com/2008/12/10/qlikview-in-the-cloud/</feedburner:origLink></item>
		<item>
		<title>Infobright 3.0.2 Released</title>
		<link>http://feeds.feedburner.com/~r/AndPointsBeyond/~3/481017623/</link>
		<comments>http://andpointsbeyond.com/2008/12/10/infobright-302-released/#comments</comments>
		<pubDate>Wed, 10 Dec 2008 22:08:45 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
		
		<category><![CDATA[database]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=275</guid>
		<description><![CDATA[Infobright 3.0.2 Released
]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.infobright.org/Blog/Entry/ice_302_released/" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://www.infobright.org/Blog/Entry/ice_302_released/');">Infobright 3.0.2 Released</a></p>
<img src="http://feeds.feedburner.com/~r/AndPointsBeyond/~4/481017623" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/12/10/infobright-302-released/feed/</wfw:commentRss>
		<feedburner:origLink>http://andpointsbeyond.com/2008/12/10/infobright-302-released/</feedburner:origLink></item>
		<item>
		<title>QlikView’s Black Friday Analysis</title>
		<link>http://feeds.feedburner.com/~r/AndPointsBeyond/~3/467531981/</link>
		<comments>http://andpointsbeyond.com/2008/11/27/qlikviews-black-friday-analysis/#comments</comments>
		<pubDate>Thu, 27 Nov 2008 17:43:18 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=269</guid>
		<description><![CDATA[Search the best deals for tomorrow. Looking for a 50&#8243; plasma TV? Find the brands and stores with the best prices. Thanks to Tom Mackay!
]]></description>
			<content:encoded><![CDATA[<p><a href="http://demo.qlikview.com/AJAX/Black%20Friday/" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://demo.qlikview.com/AJAX/Black%20Friday/');">Search the best deals for tomorrow.</a> Looking for a 50&#8243; plasma TV? Find the brands and stores with the best prices. Thanks to Tom Mackay!</p>
<img src="http://feeds.feedburner.com/~r/AndPointsBeyond/~4/467531981" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/11/27/qlikviews-black-friday-analysis/feed/</wfw:commentRss>
		<feedburner:origLink>http://andpointsbeyond.com/2008/11/27/qlikviews-black-friday-analysis/</feedburner:origLink></item>
		<item>
		<title>Spotfire Unveils Holiday Shopping Guide</title>
		<link>http://feeds.feedburner.com/~r/AndPointsBeyond/~3/465269535/</link>
		<comments>http://andpointsbeyond.com/2008/11/25/spotfire-unveils-holiday-shopping-guide/#comments</comments>
		<pubDate>Tue, 25 Nov 2008 17:38:59 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=267</guid>
		<description><![CDATA[An example of their interactive analysis tool. The data is from Amazon Web Services but I&#8217;m pretty sure it is not connected &#8220;live&#8221; in any way. Last I checked, Spotfire loads data as a batch process.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://ondemand.spotfire.com/Public/ViewAnalysis.aspx?file=Public/Holiday%20Shopping%20Guide" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://ondemand.spotfire.com/Public/ViewAnalysis.aspx?file=Public/Holiday%20Shopping%20Guide');">An example of their interactive analysis tool.</a> The data is from Amazon Web Services but I&#8217;m pretty sure it is not connected &#8220;live&#8221; in any way. Last I checked, Spotfire loads data as a batch process.</p>
<img src="http://feeds.feedburner.com/~r/AndPointsBeyond/~4/465269535" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/11/25/spotfire-unveils-holiday-shopping-guide/feed/</wfw:commentRss>
		<feedburner:origLink>http://andpointsbeyond.com/2008/11/25/spotfire-unveils-holiday-shopping-guide/</feedburner:origLink></item>
		<item>
		<title>Peter Batty Discusses One-Second Results And Geospatial Analysis</title>
		<link>http://feeds.feedburner.com/~r/AndPointsBeyond/~3/401689280/</link>
		<comments>http://andpointsbeyond.com/2008/09/24/peter-batty-discusses-one-second-results-and-geospatial-analysis/#comments</comments>
		<pubDate>Wed, 24 Sep 2008 10:45:15 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[LinkedIn]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=264</guid>
		<description><![CDATA[Peter Batty discusses one-second results in the world of geospatial data.
The first was that if you can provide information at &#8220;the speed of thought&#8221;, or the speed of a click, this enables people to do interesting things, and work in a different and much more productive way. Google Search is an example - you can [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://geothought.blogspot.com/2008/09/analysis-at-speed-of-thought-and-other.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://geothought.blogspot.com/2008/09/analysis-at-speed-of-thought-and-other.html');">Peter Batty discusses</a> one-second results in the world of geospatial data.</p>
<blockquote><p>The first was that if you can provide information at &#8220;the speed of thought&#8221;, or the speed of a click, this enables people to do interesting things, and work in a different and much more productive way. Google Search is an example - you can ask a question, and you get an answer immediately. The answer may or may not be what you were looking for, but if it isn&#8217;t you can ask a different question. And if you do get a useful answer, it may trigger you to ask additional questions to gain further insight on the question you are investigating.</p></blockquote>
<blockquote><p>A second idea is that when you are looking for insights from business data, the most valuable data is &#8220;on the edges&#8221; - one or two standard deviations away from the mean. This leads to another Netezza philosophy which is that you should have all of your data available and online, all of the time. This is in contrast to the approach which is often taken when you have very large data volumes, where you may work on aggregated data, and/or not keep a lot of historical data, to keep performance at reasonable levels (historical data may be archived offline). In this case of course you may lose the details of the most interesting / valuable data.</p></blockquote>
<img src="http://feeds.feedburner.com/~r/AndPointsBeyond/~4/401689280" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/09/24/peter-batty-discusses-one-second-results-and-geospatial-analysis/feed/</wfw:commentRss>
		<feedburner:origLink>http://andpointsbeyond.com/2008/09/24/peter-batty-discusses-one-second-results-and-geospatial-analysis/</feedburner:origLink></item>
		<item>
		<title>How One-Second Results Change Everything</title>
		<link>http://feeds.feedburner.com/~r/AndPointsBeyond/~3/399883998/</link>
		<comments>http://andpointsbeyond.com/2008/09/22/how-one-second-results-change-everything/#comments</comments>
		<pubDate>Mon, 22 Sep 2008 15:27:30 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
		
		<category><![CDATA[QlikView]]></category>

		<category><![CDATA[business intelligence]]></category>

		<category><![CDATA[data warehouse]]></category>

		<category><![CDATA[interactive analysis]]></category>

		<category><![CDATA[LinkedIn]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=253</guid>
		<description><![CDATA[There&#8217;s a point where query response time is low enough that it changes the analysis game completely. This is the amount of time that a decision maker is willing to wait to get the next answer. Not the first answer, but the next one, and the next one. Eventually the frustration of waiting is worse [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a point where query response time is low enough that it changes the analysis game completely. This is the amount of time that a decision maker is willing to wait to get the <em>next</em> answer. Not the first answer, but the next one, and the next one. Eventually the frustration of waiting is worse than not knowing.</p>
<p>Salesperson: &#8220;What shipped yesterday? Ok, what&#8217;s the breakdown? Whoa, what happened in that department? That markdown is too steep, who wrote that order? Which customer? What&#8217;s that rep&#8217;s extension?&#8221;</p>
<p>With one-second results, that analysis would have happened in the time it took you to read it. This is a competition against human nature. One-second results make the difference between wishing you had the answer and getting it, multiplied over and over throughout the day.</p>
<p>The impact on a business is not from faster queries alone. Behavior changes when decision makers trust that the data is immediately at hand. The relationship to data changes when you can find the answer while you think about it and not lose your train of thought.</p>
<p>Because the query engine can respond to <em>any</em> query in one second, we can make <em>every</em> path of exploration available at the beginning. One application can take the place of many reports. Users can begin to query immediately and along any drill path. The benefit of one-second results is diminished if users have to first identify the report that has the data and filtering options they need.</p>
<p>Can OLAP deliver this? No. We must combine speed of execution with rapid application development and full transaction details, and we must eliminate predefined drill paths. OLAP/MOLAP/ROLAP/SCHMOLAP can&#8217;t take us into this new era. In-memory associative and column-store databases can.</p>
<p>With one-second results, you don&#8217;t build a query and then start the execution. Instead, the results update as soon as you pick the first filtering option, whether it&#8217;s the day, order number or country of origin. You get immediate feedback before you make your next selection. Also, the filter options can change based on the results. Maybe you remove options that are incompatible with the selections made so far. By shrinking the feedback loop with one-second results, the filtering options can show intelligent behavior to help guide users or add context to the results. This level of dynamism lets users roll back and forth through their ideas. They can cross-reference without losing a train of thought, or discover and follow tangents that are more important.</p>
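<p>A minimal Python sketch of the filtering behavior described above, where the options shrink to match the selections made so far. The sample rows, field names and function names are hypothetical and only illustrate the idea; this is not how QlikView or any other engine is implemented.</p>

```python
# Hypothetical sample rows; the field names are illustrative only.
ORDERS = [
    {"day": "Mon", "country": "US", "rep": "Ann"},
    {"day": "Mon", "country": "DE", "rep": "Bob"},
    {"day": "Tue", "country": "US", "rep": "Ann"},
    {"day": "Wed", "country": "FR", "rep": "Cy"},
]


def matching_rows(rows, selections):
    """Rows that satisfy every current selection (field -> set of chosen values)."""
    return [
        row for row in rows
        if all(row[field] in values for field, values in selections.items())
    ]


def remaining_options(rows, selections):
    """For each field, the values still compatible with the other selections."""
    fields = rows[0].keys() if rows else []
    options = {}
    for field in fields:
        # Ignore the selection on this field itself so its alternatives stay visible.
        others = {f: v for f, v in selections.items() if f != field}
        options[field] = {row[field] for row in matching_rows(rows, others)}
    return options


picked = {"country": {"US"}}
print(len(matching_rows(ORDERS, picked)))
print(sorted(remaining_options(ORDERS, picked)["day"]))
```

<p>Leaving out a field&#8217;s own selection when computing its options means a user can switch from one country to another without clearing the filter first. That is a design choice of this sketch, not a claim about any product.</p>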
<p>It&#8217;s not just one decision maker getting an answer quickly. Interactions and processes benefit. Workers get feedback in near-real-time. We can do tricks like running the same query once per second. Ridiculous? This isn&#8217;t paradise, I live in the land of low budgets and &#8220;getting it done&#8221;. Vendor and customer data is available right when they&#8217;re on the phone. Less &#8220;I&#8217;ll get back to you&#8221; and more &#8220;I have that info right in front of me.&#8221; I&#8217;ve also noticed that it&#8217;s harder to bullshit when anyone in the meeting can easily explore the data on their laptop and get the real answer.</p>
<p>In companies where I can deliver one-second results, I spend a lot of time reconditioning people to ask for anything they desire, because now I can put any information at their fingertips, no matter how many tables, how much detail and with little knowledge of how they want to look at the data.</p>
<p>For nearly all companies, the entire transactional database can be copied as-is into a one-second query engine. Add a BI tool on top, rename some fields and identify the table relationships. Time is spent developing the frontend to deliver the best reports and analysis. One person can build the entire solution. Since the transactional model is already validated, there is no data modeling, no formal architecture and little documentation. This might be frightening to enterprises but the benefits are huge for strapped IT budgets.</p>
<p>A one-second query engine needs an interactive frontend to take advantage of it. We also need simpler ETL tools. With the engine in place first, developers will connect the dots and the tools will be built to take advantage of the new abilities.</p>
<p>None of this is theoretical. I&#8217;ve been doing this for the past 7 years with an in-memory associative database, ETL tool and interactive frontend called QlikView. When information flows at the speed of thought, it changes decision-maker behavior and the business process. When we can prototype and deploy one-second query engines quickly, then ideas can be built and tested quickly. <strong>Most ideas won&#8217;t be new or unexpected, but they were impossible or impractical without one-second results.</strong></p>
<img src="http://feeds.feedburner.com/~r/AndPointsBeyond/~4/399883998" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/09/22/how-one-second-results-change-everything/feed/</wfw:commentRss>
		<feedburner:origLink>http://andpointsbeyond.com/2008/09/22/how-one-second-results-change-everything/</feedburner:origLink></item>
		<item>
		<title>InfoBright Open-Source Column-Store DBMS</title>
		<link>http://feeds.feedburner.com/~r/AndPointsBeyond/~3/394474907/</link>
		<comments>http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/#comments</comments>
		<pubDate>Tue, 16 Sep 2008 19:24:28 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
		
		<category><![CDATA[QlikView]]></category>

		<category><![CDATA[business software]]></category>

		<category><![CDATA[data warehouse]]></category>

		<category><![CDATA[database]]></category>

		<category><![CDATA[LinkedIn]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=240</guid>
		<description><![CDATA[I wondered if InfoBright would do this. Before going open-source their website described the product as a kind of bulk-storage and not a data warehouse. A place to put data that you need to remain accessible but which you don&#8217;t need to query fast or frequently. That was the enterprise story. As an open-source project, [...]]]></description>
			<content:encoded><![CDATA[<p>I wondered if InfoBright would do this. Before going open-source their website described the product as a kind of bulk-storage and not a data warehouse. A place to put data that you need to remain accessible but which you don&#8217;t need to query fast or frequently. That was the enterprise story. As an open-source project, I think they have a much more compelling value proposition. It&#8217;s the democratization of analysis. Try before you buy (the Enterprise Edition). Rapid prototype / rapid failure. Connects to any SQL tool, platform or language. As easy as working with MySQL.</p>
<p>My test data set is 37 million rows of point-of-sale transactions. Total data size as CSV is 7GB. <strong>My test system stinks.</strong> I need to make that clear so that my numbers are not seen as representative of what&#8217;s possible with InfoBright. After seeing the product in action, I&#8217;m sure that server hardware will do <em>much</em> better.</p>
<p><strong>How fast to bulk load?</strong></p>
<p><a href="http://dev.mysql.com/tech-resources/articles/datawarehousing_mysql_infobright.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://dev.mysql.com/tech-resources/articles/datawarehousing_mysql_infobright.html');">InfoBright loads are multi-threaded</a>, but my test server is a single-processor desktop and the loads are still fast! With my single processor, about 1.8 million rows/minute (336 MB/min) are being inserted and the load rate slowed down about 10% over 37 million rows. Disk access was minimal as records were inserted. Overall, my little desktop moved an average of 30,000 rows/sec or 5.6 megabytes/sec. <strong>That&#8217;s 20GB/hour! </strong><strong>My processor was fully loaded every second. With faster cores and multi-threading, the load should be much faster.</strong> When I get the chance to load Linux on a bigger box I&#8217;ll be eager to see how it performs.</p>
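<p>The load figures above can be checked with a few lines of Python. The inputs are the per-minute rates quoted in this post; treating 1 GB as 1000 MB is an assumption made for the hourly figure.</p>

```python
ROWS_PER_MINUTE = 1_800_000  # load rate quoted in the post
MB_PER_MINUTE = 336          # load rate quoted in the post

rows_per_second = ROWS_PER_MINUTE / 60
mb_per_second = MB_PER_MINUTE / 60
gb_per_hour = MB_PER_MINUTE * 60 / 1000  # assumes 1 GB = 1000 MB

print(f"{rows_per_second:,.0f} rows/sec")
print(f"{mb_per_second:.1f} MB/sec")
print(f"{gb_per_hour:.1f} GB/hour")
```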
<p><strong>How big on disk?</strong></p>
<p>I have 7GB of data. Using MySQL&#8217;s default MyISAM storage engine with an 8-bit ASCII representation requires&#8230; 7GB. No surprise there. InfoBright took 591.2MB, as reported from my MySQL management console. <strong>That&#8217;s a 92% reduction in size or a 12:1 compression ratio.</strong></p>
<p>The status data coming from the InfoBright engine includes the storage size of each column and total size of the table. If I could remove the lowest-level detail, InfoBright reports exactly how much space that would save. Helpful.</p>
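<p>For reference, the compression figures quoted above work out as follows. This is a small Python check that uses only the two sizes given in the post; the function name is made up for illustration, and reading 7GB as 7 x 1024 MB is an assumption (using 7000 MB rounds to the same 12:1 and 92%).</p>

```python
CSV_SIZE_MB = 7 * 1024      # 7GB of CSV, assuming 1 GB = 1024 MB
INFOBRIGHT_SIZE_MB = 591.2  # on-disk size reported in the post


def compression_stats(raw_mb, compressed_mb):
    """Return (compression ratio, percent reduction) for the two sizes."""
    ratio = raw_mb / compressed_mb
    reduction_pct = (1 - compressed_mb / raw_mb) * 100
    return ratio, reduction_pct


ratio, reduction = compression_stats(CSV_SIZE_MB, INFOBRIGHT_SIZE_MB)
print(f"{ratio:.1f}:1 compression, {reduction:.0f}% smaller")
```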
<p><strong>How much memory?</strong></p>
<p>I don&#8217;t have much guidance because I don&#8217;t have enough data to stress the cache. <strong>My largest data set can fit comfortably inside the compressed cache.</strong> That means every company I&#8217;ve ever dealt with would be able to avoid disk reads and improve performance. Unfortunately, this does not put InfoBright&#8217;s performance on par with other in-memory databases. More on this later.</p>
<p>Here are some guidelines from InfoBright on the memory (in megabytes) that you should allocate given a certain amount of system memory. These figures have no relationship to the size of your data set. I also don&#8217;t know if 32 GB represents an upper limit for the InfoBright software. I suspect the point to this table is that the loader heap does not need to increase and that the compressed heap should increase the fastest but will not exceed the main heap.</p>
<table style="height: 27px;" border="0" width="524">
<tbody>
<tr>
<td>System Memory</td>
<td>Server Main Heap Size (MB)</td>
<td>Server Compressed Heap Size (MB)</td>
<td>Loader Main Heap Size (MB)</td>
</tr>
<tr>
<td>32GB</td>
<td>24000</td>
<td>4000</td>
<td>800</td>
</tr>
<tr>
<td>16GB</td>
<td>10000</td>
<td>1000</td>
<td>800</td>
</tr>
<tr>
<td>8GB</td>
<td>4000</td>
<td>500</td>
<td>800</td>
</tr>
</tbody>
</table>
<p>ServerMainHeapSize - Size of the main memory heap in the server process, in MB<br />
ServerCompressedHeapSize - Size of the compressed memory heap in the server, in MB.<br />
LoaderMainHeapSize - Size of the memory heap in the loader process, in MB.</p>
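<p>As a rough illustration, the table above can be turned into a small lookup. The values are copied from the table; the <code>suggest_heaps</code> helper is hypothetical, and falling back to the next lower row for in-between memory sizes is my assumption, not InfoBright guidance.</p>

```python
# Values copied from the table above: system memory in GB -> heap sizes in MB.
HEAP_GUIDELINES = {
    32: {"ServerMainHeapSize": 24000, "ServerCompressedHeapSize": 4000, "LoaderMainHeapSize": 800},
    16: {"ServerMainHeapSize": 10000, "ServerCompressedHeapSize": 1000, "LoaderMainHeapSize": 800},
    8: {"ServerMainHeapSize": 4000, "ServerCompressedHeapSize": 500, "LoaderMainHeapSize": 800},
}


def suggest_heaps(system_memory_gb):
    """Pick the largest guideline row that fits in the given system memory.

    Assumption: a machine between two rows uses the lower row.
    """
    fitting = [gb for gb in HEAP_GUIDELINES if system_memory_gb >= gb]
    if not fitting:
        raise ValueError("the table has no guideline below 8GB of system memory")
    return HEAP_GUIDELINES[max(fitting)]


for name, size_mb in suggest_heaps(16).items():
    print(f"{name} = {size_mb}")
```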
<p><strong>Performance?</strong></p>
<p>Is it fast? Slow? My hardware is too restrictive to see what InfoBright can do. All signs are promising. What I can say is that the cache grew over time until MySQL was barely touching the disk. My processor is completely peaked, with 99.8% allocated to the MySQL process. According to <a href="http://dev.mysql.com/tech-resources/articles/datawarehousing_mysql_infobright.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://dev.mysql.com/tech-resources/articles/datawarehousing_mysql_infobright.html');">this article published by MySQL yesterday</a>, InfoBright queries are (for now) restricted to one CPU core. Performance is dependent on the size of my cache and the speed of each core, two things I have direct control over.</p>
<p>Even with my little desktop testbed, this much is clear: the QlikView in-memory database is <em>much</em> faster. On this dataset I&#8217;d see results in a split-second instead of 30, 60 or 120 seconds. You might think that comparing these two products isn&#8217;t fair, but if your goal is to deliver analysis in SMEs or enterprise departments, these two will definitely compete and complement one another.</p>
<p><strong>Summary?</strong></p>
<p>One of the advantages of column-stores for data warehousing is that simply replicating the original transactional schema can yield adequate performance. Also, there is no performance hit for bringing in the lowest level of granularity. With column-stores, you may not need to build snowflake schemas or do much transformation. Column-stores therefore take less effort to get started with in smaller companies with resource-starved IT departments. This means a faster failure rate, which is what interests me most: implement quickly, measure early impact and choose investment (InfoBright Enterprise), deferral or elimination.</p>
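<p>To illustrate why keeping the lowest level of detail is cheap in a column-store, here is a toy Python sketch that pivots row records into one list per column. The sample data and the <code>to_columns</code> helper are made up for illustration and say nothing about InfoBright internals. An aggregate over one field then reads only the list for that field.</p>

```python
from collections import defaultdict

# Hypothetical point-of-sale rows; amounts are whole cents to keep the math exact.
ROWS = [
    {"store": "A", "sku": "x1", "amount_cents": 1000},
    {"store": "A", "sku": "x2", "amount_cents": 250},
    {"store": "B", "sku": "x1", "amount_cents": 1000},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = defaultdict(list)
    for row in rows:
        for field, value in row.items():
            columns[field].append(value)
    return dict(columns)


COLUMNS = to_columns(ROWS)

# Summing sales reads a single column; store and sku are never touched.
total_cents = sum(COLUMNS["amount_cents"])
print(total_cents)
```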
<p>There is one other free column-store database of significance, <a href="http://monetdb.cwi.nl/" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://monetdb.cwi.nl/');">MonetDB</a>. It&#8217;s an academic project and as such it lacks the toolset and polish that InfoBright inherited from MySQL. I was up and running faster with InfoBright than I was with MonetDB because the installers and administration utilities for InfoBright are already familiar. My Windows tools for MySQL connected right in without a problem. My front-ends with simplified MySQL connectors were oblivious to the InfoBright backend, which is absolutely how it should be.</p>
<p>InfoBright is not without its issues. Documentation is thin or nonexistent. I spent hours and hours until I determined (and confirmed on the forums) that the InfoBright loader does not support all of the MySQL syntax for bulk loads. This would not have been such a problem if the error message had provided some warning about my syntax that was perfectly legal in standard MySQL.</p>
<p>All in all, I&#8217;m thrilled to have a no-cost column-store database available for prototyping, quick and dirty applications, and bulk data storage.</p>
<img src="http://feeds.feedburner.com/~r/AndPointsBeyond/~4/394474907" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/feed/</wfw:commentRss>
		<feedburner:origLink>http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/</feedburner:origLink></item>
		<item>
		<title>Containers!</title>
		<link>http://feeds.feedburner.com/~r/AndPointsBeyond/~3/388046705/</link>
		<comments>http://andpointsbeyond.com/2008/09/09/containers/#comments</comments>
		<pubDate>Tue, 09 Sep 2008 21:55:58 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
		
		<category><![CDATA[flat earth]]></category>

		<category><![CDATA[globalization]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=232</guid>
		<description><![CDATA[Besides my love of tech, I have an unusual fascination with international trade, container ships, logistics and other big global systems that quietly hum along day and night. There&#8217;s a very nice render of a shipping container on my business cards.
I&#8217;m thrilled to stumble upon the blog of Ethan Zuckerman and his mention of a [...]]]></description>
			<content:encoded><![CDATA[<p>Besides my love of tech, I have an unusual fascination with international trade, container ships, logistics and other big global systems that quietly hum along day and night. There&#8217;s a very nice render of a shipping container on my business cards.</p>
<p>I&#8217;m thrilled to stumble upon <a href="http://www.ethanzuckerman.com/blog/2008/09/08/mapping-a-connected-world/" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://www.ethanzuckerman.com/blog/2008/09/08/mapping-a-connected-world/');">the blog of Ethan Zuckerman</a> and his mention of <a href="http://news.bbc.co.uk/2/hi/in_depth/business/2008/the_box/default.stm" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://news.bbc.co.uk/2/hi/in_depth/business/2008/the_box/default.stm');">a new BBC series &#8220;The Box&#8221; </a>that is tracking a cargo container around the world for a year.</p>
<p><a href="http://news.bbc.co.uk/2/hi/in_depth/629/629/7600053.stm" onclick="javascript:pageTracker._trackPageview('/outbound/article/http://news.bbc.co.uk/2/hi/in_depth/629/629/7600053.stm');">What is the current location and load of the box?</a></p>
<p>UPDATE: That link really is the current location. I was expecting to be disappointed, but since I first saw the page, it has indeed been migrating across the map toward its very first load, of whiskey.</p>
<img src="http://feeds.feedburner.com/~r/AndPointsBeyond/~4/388046705" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/09/09/containers/feed/</wfw:commentRss>
		<feedburner:origLink>http://andpointsbeyond.com/2008/09/09/containers/</feedburner:origLink></item>
	</channel>
</rss>
