Eddies: Continuously Adaptive Query Processing

Eddies: Continuously Adaptive Query Processing

2000 | Ron Avnur, Joseph M. Hellerstein
Eddies: Continuously Adaptive Query Processing Ron Avnur and Joseph M. Hellerstein, University of California, Berkeley Abstract: In large federated and shared-nothing databases, resources can exhibit widely fluctuating characteristics. Traditional static query optimization and execution techniques are ineffective in these environments. This paper introduces a query processing mechanism called an eddy, which continuously reorders operators in a query plan as it runs. Eddies merge the optimization and execution phases of query processing, allowing each tuple to have a flexible ordering of the query operators. This flexibility is controlled by a combination of fluid dynamics and a simple learning algorithm. Initial implementation demonstrates promising results, with eddies performing nearly as well as a static optimizer/executor in static scenarios, and providing dramatic improvements in dynamic execution environments. Introduction: There is increasing interest in query engines that run at unprecedented scale, both for widely-distributed information resources and for massively parallel database systems. We are building a system called Telegraph, which is intended to run queries over all the data available online. A key requirement of a large-scale system like Telegraph is that it function robustly in an unpredictable and constantly fluctuating environment. This unpredictability is endemic in large-scale systems, due to increased complexity in various dimensions. We study a standard single-node object-relational query processing system, with the added capability of opening scans and indexes from external data sets. This is becoming a very common base architecture, available in many of the commercial object-relational systems and in federated database systems. We will refer to these non-resident tables as external tables. We make no assumptions limiting the scale of external sources, which may be arbitrarily large. External tables present many of the dynamic challenges described above: they can reside over a wide-area network, face bursty utilization, and offer very minimal information on costs and statistical properties. We discuss the properties of query processing algorithms that allow (or disallow) them to be frequently reordered. We then present the eddy architecture, and describe how it allows for extreme flexibility in operator ordering. We discuss policies for controlling tuple flow in an eddy. A variety of experiments illustrate the robustness of eddies in both static and dynamic environments, and raise some questions for future work. We survey related work, and in Section 6 lay out a research program to carry this work forward. Reorderability of Plans: A basic challenge of run-time reoptimization is to reorder pipelined query processing operators while they are in flight. To change a query plan on the fly, a great deal of state in the various operators has to be considered, and arbitrary changes can require significant processing and code complexity to guarantee correct results. By constraining the scenarios in which we reorder operators, we can keep this work to a minimum. Before describing eddies, we study the state management of various join algorithms; this discussion motivates the eddy design, and forms the basis of our approach for reoptimEddies: Continuously Adaptive Query Processing Ron Avnur and Joseph M. Hellerstein, University of California, Berkeley Abstract: In large federated and shared-nothing databases, resources can exhibit widely fluctuating characteristics. Traditional static query optimization and execution techniques are ineffective in these environments. This paper introduces a query processing mechanism called an eddy, which continuously reorders operators in a query plan as it runs. Eddies merge the optimization and execution phases of query processing, allowing each tuple to have a flexible ordering of the query operators. This flexibility is controlled by a combination of fluid dynamics and a simple learning algorithm. Initial implementation demonstrates promising results, with eddies performing nearly as well as a static optimizer/executor in static scenarios, and providing dramatic improvements in dynamic execution environments. Introduction: There is increasing interest in query engines that run at unprecedented scale, both for widely-distributed information resources and for massively parallel database systems. We are building a system called Telegraph, which is intended to run queries over all the data available online. A key requirement of a large-scale system like Telegraph is that it function robustly in an unpredictable and constantly fluctuating environment. This unpredictability is endemic in large-scale systems, due to increased complexity in various dimensions. We study a standard single-node object-relational query processing system, with the added capability of opening scans and indexes from external data sets. This is becoming a very common base architecture, available in many of the commercial object-relational systems and in federated database systems. We will refer to these non-resident tables as external tables. We make no assumptions limiting the scale of external sources, which may be arbitrarily large. External tables present many of the dynamic challenges described above: they can reside over a wide-area network, face bursty utilization, and offer very minimal information on costs and statistical properties. We discuss the properties of query processing algorithms that allow (or disallow) them to be frequently reordered. We then present the eddy architecture, and describe how it allows for extreme flexibility in operator ordering. We discuss policies for controlling tuple flow in an eddy. A variety of experiments illustrate the robustness of eddies in both static and dynamic environments, and raise some questions for future work. We survey related work, and in Section 6 lay out a research program to carry this work forward. Reorderability of Plans: A basic challenge of run-time reoptimization is to reorder pipelined query processing operators while they are in flight. To change a query plan on the fly, a great deal of state in the various operators has to be considered, and arbitrary changes can require significant processing and code complexity to guarantee correct results. By constraining the scenarios in which we reorder operators, we can keep this work to a minimum. Before describing eddies, we study the state management of various join algorithms; this discussion motivates the eddy design, and forms the basis of our approach for reoptim
Reach us at info@study.space