The Count-Min Sketch (CM sketch) is a sublinear space data structure introduced for summarizing data streams. It allows efficient approximate answers to fundamental queries such as point, range, and inner product queries, and can be used to solve important data stream problems like finding quantiles and frequent items. The CM sketch significantly improves previous results by reducing space and time complexity from $1/\varepsilon^2$ to $1/\varepsilon$ in terms of the approximation parameter $\varepsilon$.
The CM sketch is a two-dimensional array with width $w$ and depth $d$, where $w = \lceil e/\varepsilon \rceil$ and $d = \lceil \ln(1/\delta) \rceil$. It uses pairwise-independent hash functions and allows updates and queries in $O(\log(1/\delta))$ time. The sketch can answer point queries by taking the minimum value across rows, range queries by decomposing the range into dyadic intervals and summing point estimates, and inner product queries by computing the minimum of row-wise products.
The CM sketch provides probabilistic guarantees for query accuracy, with the error bounded by $\varepsilon ||a||_1$ for point and range queries, and $\varepsilon ||a||_1 ||b||_1$ for inner product queries. It is particularly effective for the turnstile model, where both insertions and deletions are allowed, and has been applied to solve problems such as quantile estimation and heavy hitters detection.
The CM sketch improves upon previous methods by reducing space complexity and offering better performance in both time and space. It is simple to implement and has been shown to be effective in various data stream applications, including network traffic analysis and database query optimization. However, it is not suitable for computing norms of data streams, which have applications in correlation analysis and tracking distinct elements. The paper concludes that the CM sketch is a practical and efficient solution for many data stream problems, with potential applications in hardware and other domains.The Count-Min Sketch (CM sketch) is a sublinear space data structure introduced for summarizing data streams. It allows efficient approximate answers to fundamental queries such as point, range, and inner product queries, and can be used to solve important data stream problems like finding quantiles and frequent items. The CM sketch significantly improves previous results by reducing space and time complexity from $1/\varepsilon^2$ to $1/\varepsilon$ in terms of the approximation parameter $\varepsilon$.
The CM sketch is a two-dimensional array with width $w$ and depth $d$, where $w = \lceil e/\varepsilon \rceil$ and $d = \lceil \ln(1/\delta) \rceil$. It uses pairwise-independent hash functions and allows updates and queries in $O(\log(1/\delta))$ time. The sketch can answer point queries by taking the minimum value across rows, range queries by decomposing the range into dyadic intervals and summing point estimates, and inner product queries by computing the minimum of row-wise products.
The CM sketch provides probabilistic guarantees for query accuracy, with the error bounded by $\varepsilon ||a||_1$ for point and range queries, and $\varepsilon ||a||_1 ||b||_1$ for inner product queries. It is particularly effective for the turnstile model, where both insertions and deletions are allowed, and has been applied to solve problems such as quantile estimation and heavy hitters detection.
The CM sketch improves upon previous methods by reducing space complexity and offering better performance in both time and space. It is simple to implement and has been shown to be effective in various data stream applications, including network traffic analysis and database query optimization. However, it is not suitable for computing norms of data streams, which have applications in correlation analysis and tracking distinct elements. The paper concludes that the CM sketch is a practical and efficient solution for many data stream problems, with potential applications in hardware and other domains.