May 1997 | Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, Hamid Pirahesh
This paper introduces the data cube, a relational aggregation operator that generalizes group-by, cross-tab, and sub-totals. The data cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. The data cube operator is a relation, allowing it to be embedded in complex non-procedural data analysis programs. The paper explains the cube and roll-up operators, shows how they fit in SQL, explains how users can define new aggregate functions for cubes, and discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
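To make the idea of aggregating the N-cube to lower dimensional spaces concrete, the following is a minimal Python sketch, not the paper's algorithm. It naively computes a SUM for every subset of the grouping attributes. The `cube` function, the `sales` rows, and the use of the string `"ALL"` as a marker for an aggregated-away dimension are illustrative assumptions.

```python
from itertools import combinations
from collections import defaultdict

ALL = "ALL"  # illustrative sentinel: this dimension has been aggregated away


def cube(rows, dims, measure):
    """Return {coordinate tuple: SUM(measure)} for every subset of dims.

    Naive approach: one pass over the rows per subset of dimensions,
    so 2**len(dims) group-bys in total.
    """
    result = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for row in rows:
                # Keep the value of each retained dimension; replace the rest with ALL.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                result[key] += row[measure]
    return dict(result)


# Hypothetical sample data.
sales = [
    {"model": "Chevy", "year": 1994, "color": "black", "units": 50},
    {"model": "Chevy", "year": 1995, "color": "white", "units": 40},
    {"model": "Ford", "year": 1994, "color": "black", "units": 30},
]

result = cube(sales, ["model", "year"], "units")
```

With two dimensions, `result` holds the ordinary group-by cells such as `("Chevy", 1994)`, the one-dimensional sub-totals such as `("Chevy", ALL)` and `(ALL, 1994)`, and the grand total at `(ALL, ALL)`. Because the result is itself a table of rows, it can be fed into further queries, which is the sense in which the cube is a relation.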
The paper examines the limitations of SQL's GROUP BY operator, particularly in expressing roll-ups and cross-tabulations, and introduces the data cube and roll-up operators to address them. The data cube operator generalizes GROUP BY to N-dimensional aggregation. The roll-up operator produces super-aggregates, which are used for drill-down reports. The paper introduces an ALL value that stands for the set of values aggregated over in a super-aggregate row, describes the difficulties of implementing that value, and proposes a minimalist alternative that uses NULL in its place. It also covers decorations, which are columns that do not appear in the GROUP BY list but are functionally dependent on the grouping columns, and relates data cubes to dimensions and to star and snowflake queries. It concludes by discussing how to compute data cubes and roll-ups, classifying aggregate functions as distributive, algebraic, or holistic, and emphasizes efficient computation techniques for large data sets.
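The distinction between distributive, algebraic, and holistic aggregate functions determines whether a super-aggregate can be built from already computed sub-aggregates. The Python sketch below illustrates the idea under stated assumptions: the two partitions and the helper names `avg_handle` and `merge_handles` are hypothetical and are not taken from the paper.

```python
from statistics import median

# Two partitions of the same measure column (hypothetical values).
part_a = [10, 20, 30]
part_b = [40, 50]

# Distributive: SUM of the whole equals the SUM of the partition sums.
total = sum([sum(part_a), sum(part_b)])


# Algebraic: AVG is not distributive, but a fixed-size intermediate
# (sum, count) per partition is enough to reconstruct it.
def avg_handle(values):
    return (sum(values), len(values))


def merge_handles(handles):
    s = sum(h[0] for h in handles)
    n = sum(h[1] for h in handles)
    return (s, n)


s, n = merge_handles([avg_handle(part_a), avg_handle(part_b)])
average = s / n

# Holistic: no fixed-size summary suffices for MEDIAN. Combining the
# partition medians does not, in general, give the true median.
true_median = median(part_a + part_b)
median_of_medians = median([median(part_a), median(part_b)])
```

Here `total` and `average` match what a direct computation over all five values would give, while `median_of_medians` differs from `true_median`. That is why holistic functions generally force a return to the base data when computing super-aggregates.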