As an alternative to the traditional relational database, which organizes data in tables composed of vertical columns and horizontal rows, database pioneer and Vertica Systems CTO Michael Stonebraker is promoting Vertica 2.0, a column-based database that he promises will deliver much quicker response times. Rows are central to IBM's DB2, MySQL, Oracle, Microsoft SQL Server, Sybase, and Teradata, yet Stonebraker maintains that in data warehousing, faster performance comes from a column layout. He says that all types of queries on most data warehouses will run up to 50 times faster in a column database, and he believes that the larger the data warehouse, the greater the performance gain. In the 1970s, Stonebraker was one of the original architects of the Ingres relational database, which spawned many commercial variants. While a row-based system like Ingres is well suited to executing transactions, Stonebraker now believes that a column-based system like Vertica 2.0, due for general release in March 2008, is a more natural fit for data warehouses, because they frequently store transactional data in which each transaction has many parts. Columns cut across transactions, each storing one element of information common to every transaction, such as customer name, address, or purchase amount. A second performance benefit of the column approach is that, because each column holds similar information from every transaction, a compression scheme can be derived for the column's data type and then applied throughout the column. Compressing data in columns speeds storage and retrieval and reduces the required disk space. Sonian Networks, which archives e-mail for other businesses, is one of the companies that have begun using Vertica's database since its release in September 2007. Stonebraker now expects the data warehouse market to become entirely based on column storage.
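To make the row-versus-column distinction and the per-column compression idea concrete, here is a minimal Python sketch. The transactions, field names, and helper functions (`to_columns`, `rle_encode`, `rle_decode`) are hypothetical, and run-length encoding is only one illustrative compression scheme; the article does not say which encodings Vertica 2.0 actually uses, and this is not its storage format.

```python
from itertools import groupby

# Hypothetical sales transactions stored row by row: (customer, state, amount).
rows = [
    ("Alice", "MA", 120.0),
    ("Bob", "MA", 75.5),
    ("Carol", "MA", 30.0),
    ("Dave", "NY", 200.0),
    ("Erin", "NY", 15.25),
]


def to_columns(rows, names):
    """Pivot a row-oriented list of tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs (illustrative scheme)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows, ["customer", "state", "amount"])

# A query that needs only purchase amounts touches just the "amount" list,
# rather than every field of every row.
total = sum(columns["amount"])

# Values within one column are alike, so a single scheme covers the whole column.
encoded_state = rle_encode(columns["state"])

if __name__ == "__main__":
    print(total)          # 440.75
    print(encoded_state)  # [('MA', 3), ('NY', 2)]
```

Run-length encoding pays off only when equal values sit next to each other, as in a column sorted by state. In a row layout the state values are interleaved with customer names and amounts, so the repeats are never adjacent and the same trick does not apply.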