Tag Archives: Vertica

Vertica for the Cloud

While I have my head in the clouds, I should mention that Vertica offers a cloud solution that it manages for you. It's not new, but it gives some perspective.

With competitive offerings in the $10-20k per terabyte range, this is an attractive offer and a great way to try before you invest when you have that much data.

I hear Vertica is a screamer, but I can’t imagine getting sub-second results for 3 TB of data on 3 virtualized servers, for the same reasons I gave in my previous post.

Vertica for the Cloud Pricing

Low-Cost Data Analysis & Visualization: It’s Getting Better All The Time

Over the weekend I have revisited Tableau, enjoyed some success with MonetDB, tried to turn MySQL into a hundred million row data warehouse, been underwhelmed with Firebird, installed Greenplum and spent many frustrated hours with Talend Open Studio, Pentaho Kettle and Jitterbit.

Of course, I could just buy QlikView, but what can be done for less $money? Unfortunately, data warehouses and BI front-ends are not sexy problems in the open-source community. Graphs and charts get a little more attention, but you'll need to write your own code to glue them to your application.

In summary, what can I say about our options?

First, write your own ETL. Why do open-source ETL tools like Talend and Kettle work so hard to rebuild Informatica? It reminds me of Linux in the 1990s, when the community wanted to beat Windows, kept working to look like Windows and kept wondering when victory would arrive. Informatica, like OLAP and mainframes, is from an era when memory was scarce; languages were low-level, slow to compile & run, abstracted little and were not at all portable. On top of that, ODBC drivers were tightly controlled and costly.

But now we can pick from many great scripting languages. Today’s languages abstract the hard parts, are easy to read, can be edited while executing and talk to any system, database, web service or application. I think the next direction for ETL will be a simple (but extensible) transformation language using an ORM wrapper… Rails on ETL. Until that arrives, you can achieve everything you need with PHP, Perl, Ruby and others.
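
As a rough illustration of how little code a hand-written ETL step needs, here is a minimal sketch in Python (one of the "others"). The CSV layout, the `sales` table and the function names are all made up for the example, and SQLite stands in for whatever warehouse you actually load into.

```python
import csv
import sqlite3


def extract(source):
    """Yield one dict per CSV row from an open text file (hypothetical layout)."""
    yield from csv.DictReader(source)


def transform(rows):
    """Tidy customer names, convert amounts to integer cents, skip empty amounts."""
    for row in rows:
        if not row["amount"].strip():
            continue  # nothing to load for this row
        yield (row["customer"].strip().title(), round(float(row["amount"]) * 100))


def load(conn, records):
    """Append the transformed records to a (hypothetical) sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


def run_etl(source, conn):
    """Chain the three steps; rows stream through one at a time."""
    load(conn, transform(extract(source)))
```

Calling `run_etl(open("sales.csv", newline=""), sqlite3.connect("warehouse.db"))` would run the whole pipeline. Because each step is a generator, swapping the source or adding another transformation is a one-line change.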

Best option for low-cost data warehouse?

More on Vertica

The ideas in this paper will be incorporated into the Vertica database product. And unfortunately it won’t be open source. At least that’s what one company employee commented on Slashdot.

In the same way that RAID levels (e.g. 1, 5 and 10) use redundancy to survive drive failures, the Vertica system will distribute the same slice of the database to several servers. A grid of commodity hardware can act as a high-availability system, and Vertica's shared-nothing architecture enables this feature without complex design or execution.

We call a system that tolerates K failures K-safe. C-Store will be configurable to support a range of values of K.
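
To make the K-safe idea concrete, here is a small Python sketch. It is not how C-Store or Vertica actually places data; it simply assumes each slice is copied to K + 1 different servers in round-robin order, then checks by brute force that every slice is still reachable after any K servers fail. The function names and the placement rule are my own.

```python
from itertools import combinations


def place_segments(num_segments, nodes, k):
    """Assign each segment to k + 1 distinct nodes, round-robin (assumed scheme)."""
    if k + 1 > len(nodes):
        raise ValueError("need at least k + 1 nodes")
    return {
        seg: [nodes[(seg + i) % len(nodes)] for i in range(k + 1)]
        for seg in range(num_segments)
    }


def survives(placement, failed):
    """True if every segment still has at least one replica on a live node."""
    failed = set(failed)
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


def is_k_safe(placement, nodes, k):
    """Brute-force check: the placement survives every possible set of k failures."""
    return all(survives(placement, failed) for failed in combinations(nodes, k))
```

With four servers and K = 1, each slice lives on two neighbouring servers, so any single failure is tolerated but losing two adjacent servers is not. Raising K buys more safety at the cost of more copies.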

Inserts and updates are performed on a separate data store and merged in batches. Deletes are marked with bitmasks. Rather than building a complex locking scheme for grid members, data in the read-only and write stores is stamped with a time “epoch”. Queries specify an epoch. It’s an elegant implementation that is very well suited to a data warehouse.
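
Here is a toy Python sketch of that idea, under my own simplifying assumptions: rows are small lists, a per-row delete epoch stands in for the delete bitmask, and an update is modelled as a delete plus an insert. The class and method names are invented for the example and are not Vertica's API.

```python
class EpochStore:
    """Toy read store + write store where every row is stamped with epochs."""

    def __init__(self):
        # Each row is [key, value, insert_epoch, delete_epoch or None].
        self.read_store = []
        self.write_store = []
        self.epoch = 0

    def advance_epoch(self):
        """Move the clock forward; later writes get the new stamp."""
        self.epoch += 1
        return self.epoch

    def insert(self, key, value):
        """New rows always land in the write store."""
        self.write_store.append([key, value, self.epoch, None])

    def delete(self, key):
        """Mark live rows as deleted instead of removing them."""
        for row in self.read_store + self.write_store:
            if row[0] == key and row[3] is None:
                row[3] = self.epoch

    def update(self, key, value):
        """An update is a delete mark plus a fresh insert."""
        self.delete(key)
        self.insert(key, value)

    def merge(self):
        """Batch-move the write store into the sorted read store."""
        self.read_store.extend(self.write_store)
        self.read_store.sort(key=lambda row: row[0])
        self.write_store = []

    def query(self, epoch):
        """Return the key/value pairs visible as of the given epoch."""
        visible = {}
        for key, value, inserted, deleted in self.read_store + self.write_store:
            if inserted <= epoch and (deleted is None or deleted > epoch):
                visible[key] = value
        return visible
```

A query pinned to an older epoch keeps seeing the old version of a row even after it has been updated or deleted, which is why readers never need to lock against writers in this model.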

What’s Vertica?

Started by a major contributor to the Ingres and Postgres projects, Vertica is implementing a read-optimized database that is an excellent fit for the data warehouse world. Given the founder's support of open source, I expect this company will follow the hybrid commercial/FOSS model of MySQL and others. Core design features include highly compact storage, total ad-hoc read optimization, and a shared-nothing grid design that is dead easy to implement with commodity (not High-Availability) hardware. Via Slashdot.

New database company raises funds, nabs ex-Oracle bigwigs – Network World