Parallel Database Systems: The Future of High Performance Database Systems

Parallel Database Systems: The Future of High Performance Database Systems

June 1992 | David DeWitt and Jim Gray
Highly parallel database systems are increasingly replacing traditional mainframe computers for large-scale database and transaction processing tasks. This trend refutes a 1983 paper predicting the demise of database machines. Initially, the future of parallel database systems seemed bleak, but over the past decade, companies like Teradata, Tandem, and many startups have successfully developed and marketed highly parallel machines. The success of these systems is attributed to the widespread adoption of the relational data model, which is ideal for parallel execution. Relational queries consist of uniform operations applied to uniform data streams, allowing them to be composed into highly parallel dataflow graphs. These graphs enable both pipelined and partitioned parallelism. The dataflow approach to database systems requires a message-based client-server operating system and high-speed networks, which are now mainstream. Shared-nothing architectures, where each processor has its own memory and disk, minimize interference and allow scalable systems. These architectures are used by Teradata, Tandem, NCR, Oracle nCUBE, and other systems. They enable linear speedup and scaleup by distributing data and processing across multiple processors. Shared-nothing machines use commodity hardware, including fast processors, disks, and memory, to achieve high performance. They minimize interference by reducing resource sharing and exploit parallelism effectively. These systems are more scalable and cost-effective than shared-memory or shared-disk systems. The ideal parallel system demonstrates linear speedup and scaleup, where more hardware can perform tasks faster or handle larger tasks in the same time. Parallelism in relational databases is achieved through dataflow graphs, which allow operators to be composed in parallel. Techniques like pipelined and partitioned parallelism enable efficient execution. Hash and range partitioning strategies are used to distribute data across disks, improving I/O performance. Split and merge operators help manage data flow and parallel execution. The state-of-the-art includes systems like Teradata, Tandem, Gamma, and Bubba, which use shared-nothing architectures and demonstrate near-linear speedup and scaleup. These systems are designed for high-performance transaction processing and large-scale data queries. They use techniques like hash joins, sort-merge joins, and parallel index updates to optimize performance. The future of parallel database systems lies in continued research and development, with a focus on improving algorithms, reducing skew, and enhancing scalability. Shared-nothing architectures are becoming the standard due to their efficiency, cost-effectiveness, and ability to scale to large numbers of processors and disks.Highly parallel database systems are increasingly replacing traditional mainframe computers for large-scale database and transaction processing tasks. This trend refutes a 1983 paper predicting the demise of database machines. Initially, the future of parallel database systems seemed bleak, but over the past decade, companies like Teradata, Tandem, and many startups have successfully developed and marketed highly parallel machines. The success of these systems is attributed to the widespread adoption of the relational data model, which is ideal for parallel execution. Relational queries consist of uniform operations applied to uniform data streams, allowing them to be composed into highly parallel dataflow graphs. These graphs enable both pipelined and partitioned parallelism. The dataflow approach to database systems requires a message-based client-server operating system and high-speed networks, which are now mainstream. Shared-nothing architectures, where each processor has its own memory and disk, minimize interference and allow scalable systems. These architectures are used by Teradata, Tandem, NCR, Oracle nCUBE, and other systems. They enable linear speedup and scaleup by distributing data and processing across multiple processors. Shared-nothing machines use commodity hardware, including fast processors, disks, and memory, to achieve high performance. They minimize interference by reducing resource sharing and exploit parallelism effectively. These systems are more scalable and cost-effective than shared-memory or shared-disk systems. The ideal parallel system demonstrates linear speedup and scaleup, where more hardware can perform tasks faster or handle larger tasks in the same time. Parallelism in relational databases is achieved through dataflow graphs, which allow operators to be composed in parallel. Techniques like pipelined and partitioned parallelism enable efficient execution. Hash and range partitioning strategies are used to distribute data across disks, improving I/O performance. Split and merge operators help manage data flow and parallel execution. The state-of-the-art includes systems like Teradata, Tandem, Gamma, and Bubba, which use shared-nothing architectures and demonstrate near-linear speedup and scaleup. These systems are designed for high-performance transaction processing and large-scale data queries. They use techniques like hash joins, sort-merge joins, and parallel index updates to optimize performance. The future of parallel database systems lies in continued research and development, with a focus on improving algorithms, reducing skew, and enhancing scalability. Shared-nothing architectures are becoming the standard due to their efficiency, cost-effectiveness, and ability to scale to large numbers of processors and disks.
Reach us at info@study.space