<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>... and points beyond &#187; Database</title>
	<atom:link href="http://andpointsbeyond.com/category/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://andpointsbeyond.com</link>
	<description>mostly about data</description>
	<lastBuildDate>Thu, 19 Jan 2012 23:29:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Vertica for the Cloud</title>
		<link>http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/</link>
		<comments>http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/#comments</comments>
		<pubDate>Fri, 12 Dec 2008 06:36:01 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
				<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[Vertica]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/</guid>
		<description><![CDATA[While I have my head in the clouds, I should mention that Vertica has a cloud solution that they manage for you. Not new, but gives some perspective. With competitive offerings in the $10-20k per terabyte, this is an attractive &#8230; <a href="http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/">Continue reading <span class="meta-nav">&#8594;</span></a>
Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2007/02/16/more-on-vertica/' rel='bookmark' title='More on Vertica'>More on Vertica</a></li>
<li><a href='http://andpointsbeyond.com/2007/02/16/whats-vertica/' rel='bookmark' title='What’s Vertica?'>What’s Vertica?</a></li>
<li><a href='http://andpointsbeyond.com/2008/12/10/qlikview-in-the-cloud/' rel='bookmark' title='QlikView in the Cloud'>QlikView in the Cloud</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>While I have my head in the clouds, I should mention that Vertica has a cloud solution that they manage for you. Not new, but gives some perspective.</p>
<p>With competitive offerings in the $10-20k per terabyte range, this is an attractive offer and a great way to try before you invest when you have that much data.</p>
<p>I hear Vertica is a screamer, but I can&#8217;t imagine getting sub-second results for 3 TB of data on 3 virtualized servers, for the same reasons I gave in my previous post.</p>
<p><a href="http://www.vertica.com/_pdf/verticacloudpricing">Vertica for the Cloud Pricing</a></p>
<p>Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2007/02/16/more-on-vertica/' rel='bookmark' title='More on Vertica'>More on Vertica</a></li>
<li><a href='http://andpointsbeyond.com/2007/02/16/whats-vertica/' rel='bookmark' title='What’s Vertica?'>What’s Vertica?</a></li>
<li><a href='http://andpointsbeyond.com/2008/12/10/qlikview-in-the-cloud/' rel='bookmark' title='QlikView in the Cloud'>QlikView in the Cloud</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Infobright 3.0.2 Released</title>
		<link>http://andpointsbeyond.com/2008/12/10/infobright-302-released/</link>
		<comments>http://andpointsbeyond.com/2008/12/10/infobright-302-released/#comments</comments>
		<pubDate>Wed, 10 Dec 2008 22:08:45 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
				<category><![CDATA[Database]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=275</guid>
		<description><![CDATA[Infobright 3.0.2 Released
Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/' rel='bookmark' title='InfoBright Open-Source Column-Store DBMS'>InfoBright Open-Source Column-Store DBMS</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.infobright.org/Blog/Entry/ice_302_released/">Infobright 3.0.2 Released</a></p>
<p>Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/' rel='bookmark' title='InfoBright Open-Source Column-Store DBMS'>InfoBright Open-Source Column-Store DBMS</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/12/10/infobright-302-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>InfoBright Open-Source Column-Store DBMS</title>
		<link>http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/</link>
		<comments>http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/#comments</comments>
		<pubDate>Tue, 16 Sep 2008 19:24:28 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
				<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[QlikView]]></category>
		<category><![CDATA[business software]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=240</guid>
		<description><![CDATA[I wondered if InfoBright would do this. Before going open-source their website described the product as a kind of bulk-storage and not a data warehouse. A place to put data that you need to remain accessible but which you don&#8217;t &#8230; <a href="http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/">Continue reading <span class="meta-nav">&#8594;</span></a>
Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2008/09/07/low-cost-data-analysis-visualization-its-getting-better-all-the-time/' rel='bookmark' title='Low-Cost Data Analysis &amp; Visualization: It’s Getting Better All The Time'>Low-Cost Data Analysis &#038; Visualization: It’s Getting Better All The Time</a></li>
<li><a href='http://andpointsbeyond.com/2012/01/19/open-source-qlikview-engine/' rel='bookmark' title='Open-Source QlikView Engine?'>Open-Source QlikView Engine?</a></li>
<li><a href='http://andpointsbeyond.com/2008/12/10/infobright-302-released/' rel='bookmark' title='Infobright 3.0.2 Released'>Infobright 3.0.2 Released</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>I wondered if InfoBright would do this. Before going open-source, their website described the product as a kind of bulk storage and not a data warehouse: a place to put data that you need to remain accessible but which you don&#8217;t need to query fast or frequently. That was the enterprise story. As an open-source project, I think they have a much more compelling value proposition. It&#8217;s the democratization of analysis. Try before you buy (the Enterprise Edition). Rapid prototype / rapid failure. Connects to any SQL tool, platform or language. As easy as working with MySQL.</p>
<p>My test data set is 37 million rows of point-of-sale transactions. Total data size as CSV is 7GB. <strong>My test system stinks.</strong> I need to make that clear so that my numbers are not seen as representative of what&#8217;s possible with InfoBright. After seeing the product in action, I&#8217;m sure that server hardware will do <em>much</em> better.</p>
<p><strong>How fast to bulk load?</strong></p>
<p><a href="http://dev.mysql.com/tech-resources/articles/datawarehousing_mysql_infobright.html">InfoBright loads are multi-threaded</a>, but my test server is a single-processor desktop and the loads are still fast! With my single processor, about 1.8 million rows/minute (336 MB/min) were inserted, and the load rate slowed by about 10% over the 37 million rows. Disk access was minimal as records were inserted. Overall, my little desktop moved an average of 30,000 rows/sec or 5.6 megabytes/sec. <strong>That&#8217;s 20GB/hour!</strong> <strong>My processor was fully loaded every second. With faster cores and multi-threading, the load should be much faster.</strong> When I get the chance to load Linux on a bigger box I&#8217;ll be eager to see how it performs.</p>
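<p>The load figures above all follow from the same two measurements. As a rough check, here is a small Python sketch of the arithmetic. The function name <code>load_stats</code> and the decimal convention (1 GB = 1000 MB) are assumptions made for illustration; only the input numbers come from the test described above.</p>

```python
def load_stats(rows_per_sec, mb_per_sec, total_rows):
    """Derive per-minute and per-hour load figures from per-second rates."""
    rows_per_min = rows_per_sec * 60
    return {
        "rows_per_min": rows_per_min,
        "mb_per_min": mb_per_sec * 60,
        # Assumes decimal units: 1 GB = 1000 MB.
        "gb_per_hour": mb_per_sec * 3600 / 1000,
        # Time to load the whole data set at the average rate.
        "load_minutes": total_rows / rows_per_min,
        # Average row width implied by the two rates.
        "bytes_per_row": mb_per_sec * 1_000_000 / rows_per_sec,
    }


stats = load_stats(rows_per_sec=30_000, mb_per_sec=5.6, total_rows=37_000_000)
for name, value in stats.items():
    print(f"{name}: {value:,.1f}")
```

<p>At these rates the full 37 million rows take a little over 20 minutes, and each row averages just under 190 bytes, which is in line with a 7GB CSV file.</p>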
<p><strong>How big on disk?</strong></p>
<p>I have 7GB of data. Using MySQL&#8217;s default MyISAM storage engine with an 8-bit ASCII representation requires&#8230; 7GB. No surprise there. InfoBright took 591.2MB, as reported by my MySQL management console. <strong>That&#8217;s a 92% reduction in size or a 12:1 compression ratio.</strong></p>
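<p>Both headline numbers come from the same two sizes. A minimal Python sketch of the calculation follows; the helper name <code>compression_stats</code> is made up, and treating 7GB as 7 &#215; 1024 MB is an assumption (with 7000 MB the results round to the same 12:1 and 92%).</p>

```python
def compression_stats(raw_mb, compressed_mb):
    """Return (compression ratio, percent size reduction)."""
    ratio = raw_mb / compressed_mb
    reduction_pct = (1 - compressed_mb / raw_mb) * 100
    return ratio, reduction_pct


# 7 GB of CSV (assumed to be 7 * 1024 MB) versus 591.2 MB in InfoBright.
ratio, reduction_pct = compression_stats(7 * 1024, 591.2)
print(f"{ratio:.1f}:1 compression, {reduction_pct:.1f}% smaller")
```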
<p>The status data coming from the InfoBright engine includes the storage size of each column and the total size of the table. If I wanted to remove the lowest-level detail, InfoBright reports exactly how much space that would save. Helpful.</p>
<p><strong>How much memory?</strong></p>
<p>I don&#8217;t have much guidance because I don&#8217;t have enough data to stress the cache. <strong>My largest data set can fit comfortably inside the compressed cache.</strong> That means every company I&#8217;ve ever dealt with would be able to avoid disk reads and improve performance. Unfortunately, this does not put InfoBright&#8217;s performance on par with other in-memory databases. More on this later.</p>
<p>Here are some guidelines from InfoBright on the memory (in megabytes) that you should allocate given a certain amount of system memory. These figures have no relationship to the size of your data set. I also don&#8217;t know if 32 GB represents an upper limit for the InfoBright software. I suspect the point of this table is that the loader heap does not need to increase and that the compressed heap should increase the fastest but will not exceed the main heap.</p>
<table style="height: 27px;" border="0" width="524">
<tbody>
<tr>
<td>System Memory</td>
<td>Server Main Heap Size (MB)</td>
<td>Server Compressed Heap Size (MB)</td>
<td>Loader Main Heap Size (MB)</td>
</tr>
<tr>
<td>32GB</td>
<td>24000</td>
<td>4000</td>
<td>800</td>
</tr>
<tr>
<td>16GB</td>
<td>10000</td>
<td>1000</td>
<td>800</td>
</tr>
<tr>
<td>8GB</td>
<td>4000</td>
<td>500</td>
<td>800</td>
</tr>
</tbody>
</table>
<p>ServerMainHeapSize &#8211; Size of the main memory heap in the server process, in MB<br />
ServerCompressedHeapSize &#8211; Size of the compressed memory heap in the server, in MB.<br />
LoaderMainHeapSize &#8211; Size of the memory heap in the loader process, in MB.</p>
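<p>For scripting a setup, the table can be captured as a small lookup. This is only a sketch: the three setting names and the values come from the table above, but the function <code>suggest_heap_sizes</code> and its rule of falling back to the largest tier that fits are assumptions, not InfoBright guidance.</p>

```python
# Values copied from the guideline table; all heap sizes are in MB.
HEAP_GUIDELINES = {
    32: {"ServerMainHeapSize": 24000, "ServerCompressedHeapSize": 4000, "LoaderMainHeapSize": 800},
    16: {"ServerMainHeapSize": 10000, "ServerCompressedHeapSize": 1000, "LoaderMainHeapSize": 800},
    8: {"ServerMainHeapSize": 4000, "ServerCompressedHeapSize": 500, "LoaderMainHeapSize": 800},
}


def suggest_heap_sizes(system_memory_gb):
    """Pick the row for the largest tier that fits in system memory.

    The fall-back rule is an assumption made for this sketch.
    """
    tiers = [gb for gb in HEAP_GUIDELINES if system_memory_gb >= gb]
    if not tiers:
        raise ValueError("the guidelines do not cover systems under 8 GB")
    return HEAP_GUIDELINES[max(tiers)]


for name, megabytes in suggest_heap_sizes(16).items():
    print(f"{name} = {megabytes}")
```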
<p><strong>Performance?</strong></p>
<p>Is it fast? Slow? My hardware is too restrictive to see what InfoBright can do. All signs are promising. What I can say is that the cache grew over time until MySQL was barely touching the disk. My processor is completely pegged, with 99.8% allocated to the MySQL process. According to <a href="http://dev.mysql.com/tech-resources/articles/datawarehousing_mysql_infobright.html">this article published by MySQL yesterday</a>, InfoBright queries are (for now) restricted to one CPU core. Performance is dependent on the size of my cache and the speed of each core, two things I have direct control over.</p>
<p>Even with my little desktop testbed, this much is clear: the QlikView in-memory database is <em>much</em> faster. On this dataset I&#8217;d see results in a split-second instead of 30, 60 or 120 seconds. You might think that comparing these two products isn&#8217;t fair, but if your goal is to deliver analysis in SMEs or enterprise departments, these two will definitely compete and complement one another.</p>
<p><strong>Summary?</strong></p>
<p>One of the advantages of column-stores for data warehousing is that simply replicating the original transactional schema can yield adequate performance. Also, there is no performance hit for bringing in the lowest level of granularity. With column-stores, you may not need to build snowflake schemas or do much transformation. Column-stores therefore take less effort to get started with in smaller companies with resource-starved IT departments. This means faster failure, which is what interests me most: implement quickly, measure early impact and choose investment (InfoBright Enterprise), deferral or elimination.</p>
<p>There is one other free column-store database of significance, <a href="http://monetdb.cwi.nl/">MonetDB</a>. It&#8217;s an academic project and as such it lacks the toolset and polish that InfoBright inherited from MySQL. I was up and running faster with InfoBright than I was with MonetDB because the installers and administration utilities for InfoBright are already familiar. My Windows tools for MySQL connected right in without a problem. My front-ends with simplified MySQL connectors were oblivious to the InfoBright backend, which is absolutely how it should be.</p>
<p>InfoBright is not without its issues. Documentation is thin or nonexistent. I spent hours and hours until I determined (and confirmed on the forums) that the InfoBright loader does not support all of the MySQL syntax for bulk loads. This would not have been such a problem if the error message had provided some warning that my syntax, which was perfectly legal in standard MySQL, was the cause.</p>
<p>All in all, I&#8217;m thrilled to have a no-cost column-store database available for prototyping, quick and dirty applications, and bulk data storage.</p>
<p>Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2008/09/07/low-cost-data-analysis-visualization-its-getting-better-all-the-time/' rel='bookmark' title='Low-Cost Data Analysis &amp; Visualization: It’s Getting Better All The Time'>Low-Cost Data Analysis &#038; Visualization: It’s Getting Better All The Time</a></li>
<li><a href='http://andpointsbeyond.com/2012/01/19/open-source-qlikview-engine/' rel='bookmark' title='Open-Source QlikView Engine?'>Open-Source QlikView Engine?</a></li>
<li><a href='http://andpointsbeyond.com/2008/12/10/infobright-302-released/' rel='bookmark' title='Infobright 3.0.2 Released'>Infobright 3.0.2 Released</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Low-Cost Data Analysis &amp; Visualization: It’s Getting Better All The Time</title>
		<link>http://andpointsbeyond.com/2008/09/07/low-cost-data-analysis-visualization-its-getting-better-all-the-time/</link>
		<comments>http://andpointsbeyond.com/2008/09/07/low-cost-data-analysis-visualization-its-getting-better-all-the-time/#comments</comments>
		<pubDate>Mon, 08 Sep 2008 02:43:45 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Interactive Analysis]]></category>
		<category><![CDATA[QlikView]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[emerging technology]]></category>
		<category><![CDATA[MPP]]></category>
		<category><![CDATA[Tableau]]></category>
		<category><![CDATA[Vertica]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=213</guid>
		<description><![CDATA[Over the weekend I have revisited Tableau, enjoyed some success with MonetDB, tried to turn MySQL into a hundred million row data warehouse, been underwhelmed with Firebird, installed Greenplum and spent many frustrated hours with Talend Open Studio, Pentaho Kettle &#8230; <a href="http://andpointsbeyond.com/2008/09/07/low-cost-data-analysis-visualization-its-getting-better-all-the-time/">Continue reading <span class="meta-nav">&#8594;</span></a>
Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2007/07/01/interactive-information-visualization/' rel='bookmark' title='Interactive Information Visualization'>Interactive Information Visualization</a></li>
<li><a href='http://andpointsbeyond.com/2006/10/31/us-army-tests-real-time-3d-visualization-of-on-the-move-data/' rel='bookmark' title='U.S. Army Tests Real-Time 3D Visualization Of On-The-Move Data'>U.S. Army Tests Real-Time 3D Visualization Of On-The-Move Data</a></li>
<li><a href='http://andpointsbeyond.com/2008/09/22/how-one-second-results-change-everything/' rel='bookmark' title='How One-Second Results Change Everything'>How One-Second Results Change Everything</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Over the weekend I have revisited <a href="http://tableausoftware.com/">Tableau</a>, enjoyed some success with <a href="http://monetdb.cwi.nl/">MonetDB</a>, tried to turn <a href="http://dev.mysql.com/">MySQL</a> into a hundred million row data warehouse, been underwhelmed with <a href="http://www.firebirdsql.org/">Firebird</a>, installed <a href="http://www.greenplum.com/">Greenplum</a> and spent many frustrated hours with <a href="http://www.talend.com/index.php">Talend Open Studio</a>, <a href="http://kettle.pentaho.org/">Pentaho Kettle</a> and <a href="http://www.jitterbit.com/">Jitterbit</a>.</p>
<p>Of course, I could just buy <a href="http://qlikview.com/home.aspx?LangType=1033">QlikView</a>, but what can be done for less money? Unfortunately data warehouses and BI front-ends are not sexy problems in the open-source community. <a href="http://www.collegeathome.com/blog/2008/06/05/50-cool-things-you-can-do-with-google-charts-api/">Graphs and charts</a> get a little more attention, but you&#8217;ll need to write your own code to glue them to your application.</p>
<p><strong>In summary, what can I say about our options?</strong></p>
<p>First, write your own ETL. Why do open-source ETL tools like Talend and Kettle work so hard to rebuild <a href="http://www.informatica.com/Pages/index.aspx">Informatica</a>? It reminds me of Linux in the 1990s when the community wanted to beat Windows and kept working to look like Windows and wondering when victory would arrive. Informatica, like OLAP and mainframes, is from an era when memory was scarce; languages were low-level, slow to compile &amp; run, abstracted little and were not at all portable. On top of that, ODBC drivers were tightly controlled and costly.</p>
<p>But now we can pick from many great scripting languages. Today&#8217;s languages abstract the hard parts, are easy to read, can be edited while executing and talk to any system, database, web service or application. I think the next direction for ETL will be a simple (but extensible) transformation language using an ORM wrapper&#8230; Rails on ETL. Until that arrives, you can achieve everything you need with PHP, Perl, Ruby and others.</p>
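<p>To make the &#8220;write your own ETL&#8221; point concrete, here is a minimal sketch in Python, standing in for the PHP, Perl or Ruby mentioned above. Everything in it is hypothetical: the sample CSV, the column names and the <code>sales</code> table are invented for illustration, and an in-memory SQLite database plays the role of the warehouse.</p>

```python
import csv
import io
import sqlite3

# Hypothetical point-of-sale extract; the columns are made up for illustration.
RAW_CSV = """store,sold_on,amount
north, 2008-09-01 ,19.99
North,2008-09-01,5.01
south,2008-09-02,12.50
"""


def extract(text):
    """Extract: read CSV rows as dictionaries."""
    return csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: normalize store names, trim dates, parse amounts."""
    for row in rows:
        yield (row["store"].strip().lower(), row["sold_on"].strip(), float(row["amount"]))


def load(conn, records):
    """Load: bulk-insert the cleaned records."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (store TEXT, sold_on TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = conn.execute(
    "SELECT store, ROUND(SUM(amount), 2) FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(totals)
```

<p>Each stage is a plain function over an iterator, so swapping the CSV source for a database cursor or a web service only means replacing <code>extract</code>.</p>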
<p><strong>Best option for low-cost data warehouse?</strong></p>
<p><span id="more-213"></span></p>
<p>Check out the totally free <a href="http://monetdb.cwi.nl/">MonetDB</a>. Unless <a href="http://www.vertica.com/">Vertica</a> or <a href="http://www.infobright.com/">InfoBright</a> reconsiders releasing a low/no cost option, MonetDB will likely mature to become a first-choice column-store database. It&#8217;s an academic project that has earned a sizeable development community and user base. The product is functional today for tens of millions of rows (maybe more). So far I have personally worked with a few million rows in MonetDB and I&#8217;d like to use it again. With a little focus on usability and packaging, it could be a contender.</p>
<p>Greenplum, freely available for development, won&#8217;t help. The architecture is designed around Massively Parallel Processing. As a single, standalone installation, it&#8217;s basically just PostgreSQL. You won&#8217;t see extra performance without a farm of servers.</p>
<p>To my surprise, MySQL itself is not too bad. The MyISAM tables are speedy and <a href="http://tomictech.com/2008/06/16/building-a-data-warehouse-on-a-budget-with-mysql-51/">Alex Tomic wrote a post</a> about using multiple queries against the Archive storage engine and how to steal an index with that engine. With basic MyISAM on a fast server, I&#8217;m running 10GB table scans in under a minute, but moderate aggregations take a few minutes. Architecturally, MySQL is limited. One query = one thread = one core. Running two simultaneous queries is an option, but MySQL still would not do the kind of transparent, optimized caching that you need for a warehouse. Throughput is limited to disk I/O speed. InfoBright has built a column-store storage engine for MySQL but it&#8217;s targeted at the enterprise only.</p>
<p><strong>What about the front end?</strong></p>
<p>For the money and quality and ease of integration, it&#8217;s hard to beat <a href="http://tableausoftware.com/">Tableau</a>. $1,800 isn&#8217;t cheap, but for a small business that truly needs to analyze patterns, this will do the job and it makes very pretty charts. The most recent version has integrated support for mapping based on zip code, area code, state, country and others. The maps also incorporate Census and USGS data and are pulled live from an online source. They look great! Tableau has always had a smooth, easy-to-understand layout and a crisp look that makes each chart very attractive in a presentation. It also automatically guesses what chart you want based on the quality &amp; number of aggregates and dimensions.</p>
<p>The drawback is that Tableau doesn&#8217;t have its own high-speed database or ETL tool. Tableau can&#8217;t shine until a low/no-cost read-optimized database is available. Until then, it does support the most common databases and data warehouses, both commercial and open-source. Except it can&#8217;t handle generic ODBC and I don&#8217;t know why.</p>
<p>There&#8217;s <a href="http://www.jaspersoft.com/">JasperSoft</a> = CrystalReports + OLAP + Informatica + Web Dashboards. Each component is from a different opensource project, so they don&#8217;t all use the same platform or interface, and they can&#8217;t all read the same data sources. The democratization of BI is NOT going to come from enterprise tools made cheap; it will come from simple disruptive tools that add new ideas and polish with each release. Sorry, Jasper.</p>
<p><strong>What would I use to build a reporting system for a smaller business?</strong></p>
<p>Well, assuming we&#8217;re doing it to make more money, not to keep up appearances, the best choice is still to pay the money for QlikView. It reads ODBC, OLE DB, text files and Excel&#8211;everything a business needs. The ETL language is easy to understand for any businessperson who has put together an Access database or enjoys Excel formulas (blech!). The GUI front-end designer is powerful &amp; straightforward. And the in-memory database behind QlikView is so incredibly fast that I routinely analyze 10 million rows in a split-second. It&#8217;s a one-stop shop.</p>
<p>Tableau is a good option but you lose the database and ETL. Maybe you don&#8217;t have a large volume of data or maybe it&#8217;s all in one view in the database&#8211;Tableau could work for you.</p>
<p>At a lower cost? Well, it definitely comes down to tradeoffs in coder skill, money, development time and ease of use. Whereas in QlikView anyone can write the basic code to read a couple of tables, all other solutions demand heavy lifting somewhere.</p>
<p><strong>If I was doing it for free?</strong></p>
<p>I&#8217;d start with PHP, and possibly Ruby. Read from a database, calculate, generate Google Charts, and maybe use one of the <a href="http://www.maani.us/xml_charts/">low/no-cost Flash-based charting libraries for interactive splash</a>. In a future post I&#8217;d like to cover ORMs and Google Chart APIs and how they can help get these projects up and running quickly.</p>
<p>Got any ideas? I&#8217;m always on the lookout for a faster cheaper better way to create these solutions.</p>
<p><a href="http://www.collegeathome.com/blog/2008/06/05/50-cool-things-you-can-do-with-google-charts-api/">50 Cool Things You Can Do with Google Charts</a></p>
<p>Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2007/07/01/interactive-information-visualization/' rel='bookmark' title='Interactive Information Visualization'>Interactive Information Visualization</a></li>
<li><a href='http://andpointsbeyond.com/2006/10/31/us-army-tests-real-time-3d-visualization-of-on-the-move-data/' rel='bookmark' title='U.S. Army Tests Real-Time 3D Visualization Of On-The-Move Data'>U.S. Army Tests Real-Time 3D Visualization Of On-The-Move Data</a></li>
<li><a href='http://andpointsbeyond.com/2008/09/22/how-one-second-results-change-everything/' rel='bookmark' title='How One-Second Results Change Everything'>How One-Second Results Change Everything</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/09/07/low-cost-data-analysis-visualization-its-getting-better-all-the-time/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Improving The Load Process With Multiple ODBC Connections</title>
		<link>http://andpointsbeyond.com/2008/08/15/improving-the-load-process-with-multiple-odbc-connections/</link>
		<comments>http://andpointsbeyond.com/2008/08/15/improving-the-load-process-with-multiple-odbc-connections/#comments</comments>
		<pubDate>Fri, 15 Aug 2008 07:54:24 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[QlikView]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=193</guid>
		<description><![CDATA[One of the most useful tricks shared at the QlikView conference was from Nik Boman on improving the data extraction from databases. ODBC is a slow protocol, running orders of magnitude slower than the database or a typical Ethernet connection. &#8230; <a href="http://andpointsbeyond.com/2008/08/15/improving-the-load-process-with-multiple-odbc-connections/">Continue reading <span class="meta-nav">&#8594;</span></a>
No related posts.]]></description>
			<content:encoded><![CDATA[<p>One of the most useful tricks shared at the QlikView conference was from Nik Boman on improving the data extraction from databases.</p>
<p>ODBC is a slow protocol, running orders of magnitude slower than the database or a typical Ethernet connection. Very pricey ETL tools for data warehousing get around this by extracting through multiple connections to the database, and there&#8217;s no reason that a QlikView infrastructure can&#8217;t take advantage of the same technique.</p>
<p>For example, run two copies of QlikView at the same time and extract approximately half of the data set with each. First, make a copy of the QV.exe file and give it a unique name; you can then open QV.exe and your uniquely named copy at the same time. The same method works for three or more copies of QlikView.</p>
<p>Next, decide how to divide your data set; it could be based on date, country, state, or half the alphabet, for example. What you want is to divide the data set into roughly equal segments, one for each copy of QlikView.</p>
<p>How does each copy of QlikView know which segment to load? One way to do this dynamically is to use the command line to set a variable in the script, then reference that variable in the script&#8217;s SQL SELECT statement: <strong>WHERE YearField=$(vYearVariable)</strong>. See the reference manual for command-line options.</p>
<p>Your mileage will vary. Some databases don&#8217;t do much better with simultaneous ODBC reads. Oracle does quite well.</p>
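<p>The splitting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names <code>split_segments</code> and <code>where_clause</code> are made up for this sketch, the list of years is hypothetical, and the field name is borrowed from the WHERE example above. Each resulting predicate would still have to reach its own copy of QlikView through the script variable.</p>

```python
def split_segments(values, copies):
    """Split values into `copies` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(values), copies)
    # The first `extra` chunks take one additional value each.
    sizes = [base + 1] * extra + [base] * (copies - extra)
    segments = []
    start = 0
    for size in sizes:
        segments.append(values[start:start + size])
        start += size
    return segments


def where_clause(field, segment):
    """Build the SQL predicate one copy of QlikView would use for its segment."""
    return "WHERE {} IN ({})".format(field, ", ".join(str(v) for v in segment))


# Hypothetical example: five years of data shared by two copies of QlikView.
years = [2004, 2005, 2006, 2007, 2008]
clauses = [where_clause("YearField", seg) for seg in split_segments(years, 2)]
for clause in clauses:
    print(clause)
```

<p>Equal numbers of years do not guarantee equal numbers of rows, so it is worth checking row counts per segment before settling on a split.</p>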
<p>No related posts.</p>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2008/08/15/improving-the-load-process-with-multiple-odbc-connections/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How well do Netezza, Greenplum, Vertica and others handle 12-way joins?</title>
		<link>http://andpointsbeyond.com/2007/11/26/how-well-do-netezza-greenplum-vertica-and-others-handle-12-way-joins/</link>
		<comments>http://andpointsbeyond.com/2007/11/26/how-well-do-netezza-greenplum-vertica-and-others-handle-12-way-joins/#comments</comments>
		<pubDate>Mon, 26 Nov 2007 08:50:57 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
				<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[Database]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/2007/11/26/how-well-do-netezza-greenplum-vertica-and-others-handle-12-way-joins/</guid>
		<description><![CDATA[In my world, which is corporate software systems, I have a transactional database that is usually in second normal form and has very few aggregates. Building reports directly means joining at least 4 tables, often 8, and sometimes as many &#8230; <a href="http://andpointsbeyond.com/2007/11/26/how-well-do-netezza-greenplum-vertica-and-others-handle-12-way-joins/">Continue reading <span class="meta-nav">&#8594;</span></a>
Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/' rel='bookmark' title='Vertica for the Cloud'>Vertica for the Cloud</a></li>
<li><a href='http://andpointsbeyond.com/2007/02/16/more-on-vertica/' rel='bookmark' title='More on Vertica'>More on Vertica</a></li>
<li><a href='http://andpointsbeyond.com/2007/02/16/whats-vertica/' rel='bookmark' title='What’s Vertica?'>What’s Vertica?</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>In my world, which is corporate software systems, I have a transactional database that is usually in second normal form and has very few aggregates. Building reports directly means joining at least 4 tables, often 8, and sometimes as many as 12. Unfortunately, the new crop of data warehouse vendors has made it very difficult to grasp how well their products handle this. Some of these products handle your data model as-is, and some expect star/snowflake schemas, which adds a layer of design, coding, testing, validation and additional maintenance.</p>
<p>Netezza, Greenplum and Vertica all use off-the-shelf interconnects, meaning 1-gigabit Ethernet in most cases. Transferring large amounts of data from a distributed system over Ethernet can easily unravel any gains. In a simplistic design, an evenly distributed dataset would require every node to talk to every other node. With multiple joins, this would create a series of bottlenecks. It would also rely heavily on synchronization across the distributed system.</p>
<p><a href="http://www.mit.edu/%7Edna/vldb.pdf">Vertica is a star/snowflake product</a>. The Vertica distributed system replicates the dimension tables on each node and partitions the fact table. <a href="http://www.dbms2.com/2007/10/23/vertica-star-snowflake-schema/">Vertica says that they have customers that use more transactional models</a>, but what does that mean for overall performance? Greenplum&#8217;s website says: &#8220;Utilizes pipelining techniques and redistributes data among nodes for high performance execution of complex joins.&#8221; Encouraging, but what is considered &#8220;complex&#8221; and what will this do to my network in real-world conditions?</p>
<p>If you have any thoughts to share, please add them to the comments.</p>
<p>Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/' rel='bookmark' title='Vertica for the Cloud'>Vertica for the Cloud</a></li>
<li><a href='http://andpointsbeyond.com/2007/02/16/more-on-vertica/' rel='bookmark' title='More on Vertica'>More on Vertica</a></li>
<li><a href='http://andpointsbeyond.com/2007/02/16/whats-vertica/' rel='bookmark' title='What’s Vertica?'>What’s Vertica?</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2007/11/26/how-well-do-netezza-greenplum-vertica-and-others-handle-12-way-joins/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>2007 Magic Quadrant for Data Warehouse</title>
		<link>http://andpointsbeyond.com/2007/10/28/2007-magic-quadrant-for-data-warehouse/</link>
		<comments>http://andpointsbeyond.com/2007/10/28/2007-magic-quadrant-for-data-warehouse/#comments</comments>
		<pubDate>Sun, 28 Oct 2007 18:58:25 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
				<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[Database]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/2007/10/28/2007-magic-quadrant-for-data-warehouse/</guid>
		<description><![CDATA[Gartner released the updated quadrant for DW DBMS software and appliances. DATAllegro seems too far below Netezza in ability to execute. DATAllegro has large, proven installations. Their recent releases run on Dell blades with EMC storage instead of the customized &#8230; <a href="http://andpointsbeyond.com/2007/10/28/2007-magic-quadrant-for-data-warehouse/">Continue reading <span class="meta-nav">&#8594;</span></a>
Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2006/11/17/data-warehousing-trends-for-the-large-enterprise-for-2007/' rel='bookmark' title='Data Warehousing Trends (For The Large Enterprise) For 2007'>Data Warehousing Trends (For The Large Enterprise) For 2007</a></li>
<li><a href='http://andpointsbeyond.com/2007/11/26/how-well-do-netezza-greenplum-vertica-and-others-handle-12-way-joins/' rel='bookmark' title='How well do Netezza, Greenplum, Vertica and others handle 12-way joins?'>How well do Netezza, Greenplum, Vertica and others handle 12-way joins?</a></li>
<li><a href='http://andpointsbeyond.com/2006/10/12/100-terrabyte-capacity-data-warehouse/' rel='bookmark' title='100 Terrabyte (Capacity) Data Warehouse'>100 Terrabyte (Capacity) Data Warehouse</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Gartner <a href="http://mediaproducts.gartner.com/reprints/microsoft/article19/article19.html">released the updated quadrant</a> for DW DBMS software and appliances. DATAllegro seems too far below Netezza in ability to execute. DATAllegro has large, proven installations. Their recent releases run on Dell blades with EMC storage instead of the customized FPGAs of Netezza. And how is Greenplum rated higher than DATAllegro? (via <a href="http://www.dbms2.com/2007/10/19/gartner-2007-magic-quadrant-for-data-warehouse-database-management-systems/">DBMS2</a>)</p>
<p>Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2006/11/17/data-warehousing-trends-for-the-large-enterprise-for-2007/' rel='bookmark' title='Data Warehousing Trends (For The Large Enterprise) For 2007'>Data Warehousing Trends (For The Large Enterprise) For 2007</a></li>
<li><a href='http://andpointsbeyond.com/2007/11/26/how-well-do-netezza-greenplum-vertica-and-others-handle-12-way-joins/' rel='bookmark' title='How well do Netezza, Greenplum, Vertica and others handle 12-way joins?'>How well do Netezza, Greenplum, Vertica and others handle 12-way joins?</a></li>
<li><a href='http://andpointsbeyond.com/2006/10/12/100-terrabyte-capacity-data-warehouse/' rel='bookmark' title='100 Terrabyte (Capacity) Data Warehouse'>100 Terrabyte (Capacity) Data Warehouse</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2007/10/28/2007-magic-quadrant-for-data-warehouse/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>TeraData Performance at KiloData Prices</title>
		<link>http://andpointsbeyond.com/2007/08/07/teradata-performance-at-kilodata-prices/</link>
		<comments>http://andpointsbeyond.com/2007/08/07/teradata-performance-at-kilodata-prices/#comments</comments>
		<pubDate>Tue, 07 Aug 2007 20:46:21 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[BI]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=130</guid>
		<description><![CDATA[What if you could turn on a massively parallel business intelligence database cluster with a few lines of code? What if you could leverage in-house and outsourced resources for computation and storage as needed? What if you could expand your &#8230; <a href="http://andpointsbeyond.com/2007/08/07/teradata-performance-at-kilodata-prices/">Continue reading <span class="meta-nav">&#8594;</span></a>
Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2008/09/07/low-cost-data-analysis-visualization-its-getting-better-all-the-time/' rel='bookmark' title='Low-Cost Data Analysis &amp; Visualization: It’s Getting Better All The Time'>Low-Cost Data Analysis &#038; Visualization: It’s Getting Better All The Time</a></li>
<li><a href='http://andpointsbeyond.com/2006/10/09/data-warehousing-in-education/' rel='bookmark' title='Data Warehousing In Education'>Data Warehousing In Education</a></li>
<li><a href='http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/' rel='bookmark' title='InfoBright Open-Source Column-Store DBMS'>InfoBright Open-Source Column-Store DBMS</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>What if you could turn on a massively parallel business intelligence database cluster with a few lines of code? What if you could leverage in-house and outsourced resources for computation and storage as needed? What if you could expand your analysis, data mining and text-search effort one node at a time, transparently, instantly?</p>
<p>There&#8217;s been a flurry of discussion around <a href="http://lucene.apache.org/hadoop/">Hadoop</a> and the <a href="http://wiki.apache.org/lucene-hadoop/Hbase">Hbase project</a> to bring Google&#8217;s <a href="http://labs.google.com/papers/bigtable.html">BigTable</a> feature to Hadoop.</p>
<p>Now Amazon wants to talk about <a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873">how to use Hadoop with EC2 and S3</a>, their computing and storage clusters.</p>
<p>Can I search large volumes of data on the cheap? Yes, but my algorithms must fit within the <a href="http://labs.google.com/papers/mapreduce.html">MapReduce framework</a>.</p>
<p>Does someone have a MapReduce-enabled data query language? Well, there&#8217;s <a href="http://research.yahoo.com/project/pig">Pig</a> from Yahoo. <a href="http://labs.google.com/papers/sawzall.html">Sawzall</a> from Google. <a href="http://glinden.blogspot.com/2007/04/yahoo-pig-and-google-sawzall.html">Here is a discussion comparing those two</a> from Greg Linden. <a href="http://www.google.com/search?q=abacus+hadoop&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official">Abacus</a> from the Hadoop project. Apparently Microsoft has <a href="http://research.microsoft.com/research/sv/DryadLINQ/">DryadLINQ</a>.</p>
<p>We are on the exponential curve as it swoops upward dramatically. Thanks to the power and flexibility of open source, anyone can use Google&#8217;s secret sauce on Amazon&#8217;s computers for 18 cents per gigabyte and 10 cents per computing hour.</p>
<p>Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2008/09/07/low-cost-data-analysis-visualization-its-getting-better-all-the-time/' rel='bookmark' title='Low-Cost Data Analysis &amp; Visualization: It’s Getting Better All The Time'>Low-Cost Data Analysis &#038; Visualization: It’s Getting Better All The Time</a></li>
<li><a href='http://andpointsbeyond.com/2006/10/09/data-warehousing-in-education/' rel='bookmark' title='Data Warehousing In Education'>Data Warehousing In Education</a></li>
<li><a href='http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/' rel='bookmark' title='InfoBright Open-Source Column-Store DBMS'>InfoBright Open-Source Column-Store DBMS</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2007/08/07/teradata-performance-at-kilodata-prices/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Heartbeat</title>
		<link>http://andpointsbeyond.com/2007/03/28/heartbeat/</link>
		<comments>http://andpointsbeyond.com/2007/03/28/heartbeat/#comments</comments>
		<pubDate>Wed, 28 Mar 2007 16:38:00 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[QlikView]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[Spotfire]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=4</guid>
		<description><![CDATA[I am glad to hear in a presentation from Vertica that they will be releasing their product for free use under a certain data set size. I do not know if this is intended to distinguish developers from production systems &#8230; <a href="http://andpointsbeyond.com/2007/03/28/heartbeat/">Continue reading <span class="meta-nav">&#8594;</span></a>
Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2008/09/07/low-cost-data-analysis-visualization-its-getting-better-all-the-time/' rel='bookmark' title='Low-Cost Data Analysis &amp; Visualization: It’s Getting Better All The Time'>Low-Cost Data Analysis &#038; Visualization: It’s Getting Better All The Time</a></li>
<li><a href='http://andpointsbeyond.com/2007/07/01/interactive-information-visualization/' rel='bookmark' title='Interactive Information Visualization'>Interactive Information Visualization</a></li>
<li><a href='http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/' rel='bookmark' title='InfoBright Open-Source Column-Store DBMS'>InfoBright Open-Source Column-Store DBMS</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>I am glad to hear in a presentation from <a href="http://www.vertica.com/">Vertica</a> that they will be releasing their product for free use under a certain data set size. I do not know if this is intended to distinguish developers from production systems or so that smaller companies can run the product for free (and help establish a user base).</p>
<p>Also, I am evaluating <a href="http://spotfire.com/">Spotfire DXP</a> as well as the upcoming features of <a href="http://www.qlikview.com/">QlikView 8</a>. I&#8217;ll post a review of both when time and/or NDAs permit.</p>
<p>Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2008/09/07/low-cost-data-analysis-visualization-its-getting-better-all-the-time/' rel='bookmark' title='Low-Cost Data Analysis &amp; Visualization: It’s Getting Better All The Time'>Low-Cost Data Analysis &#038; Visualization: It’s Getting Better All The Time</a></li>
<li><a href='http://andpointsbeyond.com/2007/07/01/interactive-information-visualization/' rel='bookmark' title='Interactive Information Visualization'>Interactive Information Visualization</a></li>
<li><a href='http://andpointsbeyond.com/2008/09/16/infobright-open-source-column-store-dbms/' rel='bookmark' title='InfoBright Open-Source Column-Store DBMS'>InfoBright Open-Source Column-Store DBMS</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2007/03/28/heartbeat/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More on Vertica</title>
		<link>http://andpointsbeyond.com/2007/02/16/more-on-vertica/</link>
		<comments>http://andpointsbeyond.com/2007/02/16/more-on-vertica/#comments</comments>
		<pubDate>Sat, 17 Feb 2007 03:34:00 +0000</pubDate>
		<dc:creator>Jay Jakosky</dc:creator>
				<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Vertica]]></category>

		<guid isPermaLink="false">http://andpointsbeyond.com/?p=109</guid>
		<description><![CDATA[The ideas in this paper will be incorporated into the Vertica database product. And unfortunately it won&#8217;t be open source. At least that&#8217;s what one company employee commented on Slashdot. In the same way that RAID design options (e.g. 1, &#8230; <a href="http://andpointsbeyond.com/2007/02/16/more-on-vertica/">Continue reading <span class="meta-nav">&#8594;</span></a>
Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2007/02/16/whats-vertica/' rel='bookmark' title='What’s Vertica?'>What’s Vertica?</a></li>
<li><a href='http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/' rel='bookmark' title='Vertica for the Cloud'>Vertica for the Cloud</a></li>
<li><a href='http://andpointsbeyond.com/2007/11/26/how-well-do-netezza-greenplum-vertica-and-others-handle-12-way-joins/' rel='bookmark' title='How well do Netezza, Greenplum, Vertica and others handle 12-way joins?'>How well do Netezza, Greenplum, Vertica and others handle 12-way joins?</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><span style="font-family:georgia;">The ideas </span><a href="http://www.mit.edu/%7Edna/vldb.pdf">in this paper</a><span style="font-family:georgia;"> will be incorporated into the Vertica database product. And unfortunately it won&#8217;t be open source. At least that&#8217;s what one company employee commented on Slashdot.</span></p>
<p><span style="font-family:georgia;">In the same way that RAID design options (e.g. 1, 5 and 10) can accommodate drive failures, the Vertica system will distribute the same slice of the database to several servers. A grid of commodity hardware can act as a high-availability system, and Vertica&#8217;s shared-nothing architecture enables this feature without complex design or execution.</span></p>
<blockquote><p>We call a system that tolerates K failures K-safe. C-Store will be configurable to support a range of values of K.</p></blockquote>
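<p>As a rough sketch of what K-safety means in practice, the toy Python below keeps K + 1 copies of every segment on distinct nodes and checks which failures the grid survives. The round-robin placement and the names <code>place_segments</code> and <code>survives</code> are assumptions made for this illustration; they are not taken from the paper or from Vertica.</p>

```python
from itertools import combinations


def place_segments(num_segments, num_nodes, k):
    """Assign each segment to k + 1 distinct nodes, round-robin (toy scheme)."""
    if k + 1 > num_nodes:
        raise ValueError("need at least k + 1 nodes")
    return {
        seg: {(seg + offset) % num_nodes for offset in range(k + 1)}
        for seg in range(num_segments)
    }


def survives(placement, failed_nodes):
    """True if every segment still has at least one copy on a live node."""
    return all(nodes - failed_nodes for nodes in placement.values())


# Hypothetical grid: four segments on four nodes with K = 1.
placement = place_segments(num_segments=4, num_nodes=4, k=1)
# Any single failure leaves every segment readable ...
one_failure_ok = all(survives(placement, {n}) for n in range(4))
# ... but some pair of failures takes out both copies of a segment.
two_failures_ok = all(
    survives(placement, set(pair)) for pair in combinations(range(4), 2)
)
print(one_failure_ok, two_failures_ok)
```

<p>With K = 1 the grid rides out any one node loss but not every pair, which is the trade-off a configurable K lets you choose.</p>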
<p><span style="font-family:georgia;">Inserts and updates are performed on a separate data store and merged in batches. Deletes are marked with bitmasks. Rather than building a complex locking scheme for grid members, data in the read-only and write stores is stamped with a time &#8220;epoch&#8221;. Queries specify an epoch. It&#8217;s an elegant implementation that is very well suited to a data warehouse.</span></p>
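<p>The epoch idea can be illustrated with a small Python model. It is a simplification under stated assumptions: the <code>Row</code> class, its <code>inserted</code> and <code>deleted</code> fields, and the <code>visible</code> function are hypothetical names for this sketch, and a plain list stands in for the read-only and write stores.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    value: str
    inserted: int                  # epoch in which the row was written
    deleted: Optional[int] = None  # epoch in which it was marked deleted


def visible(rows, epoch):
    """Return the values a query pinned to `epoch` would see."""
    return [
        r.value
        for r in rows
        if r.inserted <= epoch and (r.deleted is None or r.deleted > epoch)
    ]


rows = [
    Row("a", inserted=1),
    Row("b", inserted=2, deleted=4),
    Row("c", inserted=3),
]
print(visible(rows, 2))  # rows written up to epoch 2
print(visible(rows, 4))  # "b" was marked deleted in epoch 4
```

<p>Because a query only compares epoch stamps, readers never wait on writers in this model, which is why no locking appears in the sketch.</p>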
<p>Related posts:<ol>
<li><a href='http://andpointsbeyond.com/2007/02/16/whats-vertica/' rel='bookmark' title='What’s Vertica?'>What’s Vertica?</a></li>
<li><a href='http://andpointsbeyond.com/2008/12/11/vertica-for-the-cloud/' rel='bookmark' title='Vertica for the Cloud'>Vertica for the Cloud</a></li>
<li><a href='http://andpointsbeyond.com/2007/11/26/how-well-do-netezza-greenplum-vertica-and-others-handle-12-way-joins/' rel='bookmark' title='How well do Netezza, Greenplum, Vertica and others handle 12-way joins?'>How well do Netezza, Greenplum, Vertica and others handle 12-way joins?</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://andpointsbeyond.com/2007/02/16/more-on-vertica/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

