22 Apr 2002 | Ronald Fagin*, Amnon Lotem†, Moni Naor‡
The paper "Optimal Aggregation Algorithms for Middleware" by Ronald Fagin, Amnon Lotem, and Moni Naor discusses efficient algorithms for determining the top \( k \) objects in a database based on their overall grades, which are derived from \( m \) attributes using a monotone aggregation function. The authors introduce and analyze two algorithms: Fagin's Algorithm (FA) and the Threshold Algorithm (TA).
FA is an efficient algorithm that finds the top \( k \) objects with a middleware cost of \( O(N^{(m-1)/m} k^{1/m}) \) with high probability, where \( N \) is the number of objects in the database. However, FA requires large buffers that can grow unboundedly as the database size increases.
TA is a simpler and more optimal algorithm that requires only a small, constant-size buffer. It is instance optimal, meaning it is optimal for all monotone aggregation functions and over every database, without requiring probabilistic assumptions. TA allows early stopping, providing an approximate version of the top \( k \) answers.
The paper also discusses the concept of instance optimality, where an algorithm is optimal in every instance, not just in the worst case or average case. TA is shown to be instance optimal under various assumptions, including the use of bounded buffers and the absence of wild guesses (random accesses to objects not yet seen under sorted access).
Additionally, the paper explores scenarios where random access is either impossible or expensive, and presents algorithms like NRA (No Random Access) and CA (Combined Algorithm) that are instance optimal in these scenarios. The authors provide lower bounds on the optimality ratio for deterministic and probabilistic algorithms and discuss related work and conclusions.The paper "Optimal Aggregation Algorithms for Middleware" by Ronald Fagin, Amnon Lotem, and Moni Naor discusses efficient algorithms for determining the top \( k \) objects in a database based on their overall grades, which are derived from \( m \) attributes using a monotone aggregation function. The authors introduce and analyze two algorithms: Fagin's Algorithm (FA) and the Threshold Algorithm (TA).
FA is an efficient algorithm that finds the top \( k \) objects with a middleware cost of \( O(N^{(m-1)/m} k^{1/m}) \) with high probability, where \( N \) is the number of objects in the database. However, FA requires large buffers that can grow unboundedly as the database size increases.
TA is a simpler and more optimal algorithm that requires only a small, constant-size buffer. It is instance optimal, meaning it is optimal for all monotone aggregation functions and over every database, without requiring probabilistic assumptions. TA allows early stopping, providing an approximate version of the top \( k \) answers.
The paper also discusses the concept of instance optimality, where an algorithm is optimal in every instance, not just in the worst case or average case. TA is shown to be instance optimal under various assumptions, including the use of bounded buffers and the absence of wild guesses (random accesses to objects not yet seen under sorted access).
Additionally, the paper explores scenarios where random access is either impossible or expensive, and presents algorithms like NRA (No Random Access) and CA (Combined Algorithm) that are instance optimal in these scenarios. The authors provide lower bounds on the optimality ratio for deterministic and probabilistic algorithms and discuss related work and conclusions.