Storing a Sparse Table with O(1) Worst Case Access Time

Storing a Sparse Table with O(1) Worst Case Access Time

July 1984 | MICHAEL L. FREDMAN AND JÁNOS KOMLÓS AND ENDRE SZEMERÉDI
A data structure is described for representing a set of n items from a universe of m items, using space O(n) and allowing membership queries in constant time. The data structure and query algorithm are easy to implement. The paper addresses the problem of storing a static set S of n distinct names from a universe U of m possible names, with the goal of efficient membership queries. Hashing schemes provide O(n) storage and O(1) average query time, but this paper focuses on worst-case query time while maintaining O(n) storage. Yao's model shows that for m growing exponentially in n, storage n+1 and worst-case query time 2 can be achieved. Yao and Tarjan show that O(n) storage and O(log m / log n) worst-case query time can generally be achieved. However, for intermediate ranges of m, such as m=2^n, linear storage and constant query time remain unresolved. The paper presents a storage technique achieving linear storage and constant worst-case query time for all m and n. The query algorithm is easy to implement and the relative magnitudes of m and n play no role in the proofs. Section 3 discusses a general framework motivating the construction. Section 4 describes a refinement achieving space n + o(n) while retaining constant query time. Section 5 describes variations of the method. The basic representation involves partitioning the set S into n blocks using a function f(x) = (kx mod p) mod n. Each block is resolved using a perfect hash function. A membership query for q involves determining the block containing q and checking the appropriate cell. The construction of the representation may take O(mn) time in the worst case, but with a constant factor increase in space, it can be constructed in random expected time O(n). The paper also discusses the use of probabilistic arguments and separating systems to achieve efficient storage and query times. The refinement reduces storage to n + o(n) while maintaining constant query time by partitioning S into more blocks and resolving only those with multiple elements. The auxiliary data structure helps determine the location of nonempty blocks and their associated tags. The paper concludes with variations of the method and references to related work.A data structure is described for representing a set of n items from a universe of m items, using space O(n) and allowing membership queries in constant time. The data structure and query algorithm are easy to implement. The paper addresses the problem of storing a static set S of n distinct names from a universe U of m possible names, with the goal of efficient membership queries. Hashing schemes provide O(n) storage and O(1) average query time, but this paper focuses on worst-case query time while maintaining O(n) storage. Yao's model shows that for m growing exponentially in n, storage n+1 and worst-case query time 2 can be achieved. Yao and Tarjan show that O(n) storage and O(log m / log n) worst-case query time can generally be achieved. However, for intermediate ranges of m, such as m=2^n, linear storage and constant query time remain unresolved. The paper presents a storage technique achieving linear storage and constant worst-case query time for all m and n. The query algorithm is easy to implement and the relative magnitudes of m and n play no role in the proofs. Section 3 discusses a general framework motivating the construction. Section 4 describes a refinement achieving space n + o(n) while retaining constant query time. Section 5 describes variations of the method. The basic representation involves partitioning the set S into n blocks using a function f(x) = (kx mod p) mod n. Each block is resolved using a perfect hash function. A membership query for q involves determining the block containing q and checking the appropriate cell. The construction of the representation may take O(mn) time in the worst case, but with a constant factor increase in space, it can be constructed in random expected time O(n). The paper also discusses the use of probabilistic arguments and separating systems to achieve efficient storage and query times. The refinement reduces storage to n + o(n) while maintaining constant query time by partitioning S into more blocks and resolving only those with multiple elements. The auxiliary data structure helps determine the location of nonempty blocks and their associated tags. The paper concludes with variations of the method and references to related work.
Reach us at info@study.space