The ideas in this paper will be incorporated into the Vertica database product. Unfortunately, it won’t be open source, at least according to a comment one company employee left on Slashdot.
In the same way that redundant RAID levels (e.g. 1, 5 and 10) can survive drive failures, the Vertica system will distribute the same slice of the database to several servers. A grid of commodity hardware can then act as a high-availability system, and Vertica’s shared-nothing architecture enables this without complex design or execution.
We call a system that tolerates K failures K-safe. C-Store will be configurable to support a range of values of K.
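To make the K-safe definition concrete, here is a minimal sketch in Python. It is not Vertica's or C-Store's actual placement logic: it assumes identical copies of each slice, a simple round-robin placement, and the hypothetical names `place_segments`, `survives` and `is_k_safe`. It puts every slice on K+1 distinct servers and then brute-force checks that any K simultaneous server failures still leave every slice readable.

```python
from itertools import combinations


def place_segments(num_segments, nodes, k):
    """Assign each segment to k+1 distinct nodes, round-robin (assumed scheme)."""
    if k + 1 > len(nodes):
        raise ValueError("need at least k+1 nodes to tolerate k failures")
    return {
        seg: [nodes[(seg + i) % len(nodes)] for i in range(k + 1)]
        for seg in range(num_segments)
    }


def survives(placement, failed):
    """True if every segment still has at least one copy on a live node."""
    failed = set(failed)
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


def is_k_safe(placement, nodes, k):
    """Check every possible set of k failed nodes (fine for small grids)."""
    return all(survives(placement, failed) for failed in combinations(nodes, k))


nodes = ["n1", "n2", "n3", "n4"]
placement = place_segments(num_segments=8, nodes=nodes, k=1)
print(is_k_safe(placement, nodes, 1))  # any single failure is tolerated
print(is_k_safe(placement, nodes, 2))  # two failures can lose a segment
```

With two copies per slice the grid is 1-safe but not 2-safe, since losing both servers that hold the same slice makes it unavailable. Raising K buys more tolerance at the cost of more storage.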
Inserts and updates are performed on a separate write store and merged into the read-optimized store in batches. Deletes are marked with bitmasks. Rather than building a complex locking scheme for grid members, data in the read-only and write stores is stamped with a time “epoch”. Queries specify an epoch and see the data as it stood at that point. It’s an elegant implementation that is very well suited to a data warehouse.
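The epoch idea is easier to see in a toy model. The Python sketch below is an illustration under stated assumptions, not the real storage engine: the `EpochStore` class and its method names are hypothetical, rows live in plain lists, and a per-row deletion epoch stands in for the delete bitmask. New rows go to a write store, a batch merge moves them into the read store, and a query names an epoch and sees only rows inserted at or before it and not yet deleted as of it.

```python
class EpochStore:
    """Toy write-store / read-store pair with epoch-stamped rows."""

    def __init__(self):
        self.read_store = []   # rows already merged in batch
        self.write_store = []  # recent inserts awaiting a merge
        self.epoch = 0         # current epoch, advanced explicitly

    def advance_epoch(self):
        self.epoch += 1

    def insert(self, key, value):
        # New rows land in the write store, stamped with the current epoch.
        self.write_store.append(
            {"key": key, "value": value, "inserted": self.epoch, "deleted": None}
        )

    def delete(self, key):
        # Rows are never removed in place; they are only marked as deleted.
        for row in self.read_store + self.write_store:
            if row["key"] == key and row["deleted"] is None:
                row["deleted"] = self.epoch

    def update(self, key, value):
        # An update is modelled as a delete followed by an insert.
        self.delete(key)
        self.insert(key, value)

    def merge(self):
        # Batch move from the write store into the read store.
        self.read_store.extend(self.write_store)
        self.write_store = []

    def query(self, epoch):
        # Snapshot read: no locks, just a visibility test per row.
        return {
            row["key"]: row["value"]
            for row in self.read_store + self.write_store
            if row["inserted"] <= epoch
            and (row["deleted"] is None or row["deleted"] > epoch)
        }


store = EpochStore()
store.insert("a", 1)      # epoch 0
store.advance_epoch()
store.update("a", 2)      # epoch 1
store.merge()
store.advance_epoch()
store.delete("a")         # epoch 2
print(store.query(0), store.query(1), store.query(2))
```

Because old versions stay visible to queries pinned at earlier epochs, readers never block writers, which is the property that makes the scheme attractive for a warehouse workload.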
2 responses so far ↓
1 philbowermaster // May 1, 2007 at 9:17 pm
The technology described — a grid-enabled, column-oriented relational database — is indeed an elegant approach to data warehousing and can provide a huge performance boost for data analytics. My company, Sybase, makes a product with that kind of capability, the Sybase IQ analytics server. It’s already available, with nearly 1,000 customers experiencing tremendous performance acceleration and significant return on investment.
As a whole, analytics servers, both emerging products like SAND and enterprise-class products like Sybase IQ (which includes advanced features like encryption), are experiencing very high growth. Vertica could very well ride this current wave of success, so it may not matter that the technology described isn’t really new.
Phil Bowermaster
Sybase
2 JayJ // May 2, 2007 at 10:09 pm
Indeed. Vertica gets the geek chatter when Sybase IQ can actually do it all now; there is only a limited-function Vertica download at this time. Business intelligence has swung hard past its tipping point and open-source options are emerging. For example, Pentaho was driving the warehousing and BI section of the MySQL conference. I’ll bet Vertica will become “the MySQL to Sybase’s Oracle” with a hybrid sales model and community-support options. Maybe Tableau will follow suit. That would be a killer combo.