Space/Time Trade-offs in Hash Coding with Allowable Errors

Space/Time Trade-offs in Hash Coding with Allowable Errors

July 1970 | BURTON H. BLOOM
Bloom presents a study on space/time trade-offs in hash coding with allowable errors. The paper analyzes computational factors in hash coding, including hash area size (space), reject time (time to identify a message as non-member), and allowable error frequency. Two new hash-coding methods are introduced and compared with a conventional method. The new methods aim to reduce hash area size by allowing a small fraction of errors, which is feasible in applications with large data volumes where core resident hash areas are not feasible. The conventional method uses a hash area with cells, each containing a message. Messages are tested by comparing their hash addresses with stored cells. If a match is found, the message is accepted; otherwise, it is rejected. The new methods allow for smaller hash areas by using codes instead of full messages and by using bit addresses for storage and testing. Method 1 uses smaller cells with codes, allowing for a reduced hash area size. Method 2 uses individual bits for storage and testing, allowing for a smaller hash area by using a small number of bits per message. Both methods allow for a small fraction of errors, which can be used to reduce the hash area size without increasing reject time. The paper analyzes the trade-offs between hash area size, reject time, and allowable error frequency for both methods. It shows that allowing errors can significantly reduce the hash area size, making it feasible to store in core memory. The analysis also shows that the space/time trade-offs for the two methods differ, with method 2 being more efficient in certain scenarios. The paper concludes with an example application in automatic hyphenation, where allowing errors can reduce the hash area size and reduce disk accesses. The analysis shows that using method 2 with a small allowable error frequency can significantly reduce the hash area size and the number of disk accesses required. The paper also discusses the computational factors and trade-offs for both methods, showing that method 2 is more efficient in certain scenarios.Bloom presents a study on space/time trade-offs in hash coding with allowable errors. The paper analyzes computational factors in hash coding, including hash area size (space), reject time (time to identify a message as non-member), and allowable error frequency. Two new hash-coding methods are introduced and compared with a conventional method. The new methods aim to reduce hash area size by allowing a small fraction of errors, which is feasible in applications with large data volumes where core resident hash areas are not feasible. The conventional method uses a hash area with cells, each containing a message. Messages are tested by comparing their hash addresses with stored cells. If a match is found, the message is accepted; otherwise, it is rejected. The new methods allow for smaller hash areas by using codes instead of full messages and by using bit addresses for storage and testing. Method 1 uses smaller cells with codes, allowing for a reduced hash area size. Method 2 uses individual bits for storage and testing, allowing for a smaller hash area by using a small number of bits per message. Both methods allow for a small fraction of errors, which can be used to reduce the hash area size without increasing reject time. The paper analyzes the trade-offs between hash area size, reject time, and allowable error frequency for both methods. It shows that allowing errors can significantly reduce the hash area size, making it feasible to store in core memory. The analysis also shows that the space/time trade-offs for the two methods differ, with method 2 being more efficient in certain scenarios. The paper concludes with an example application in automatic hyphenation, where allowing errors can reduce the hash area size and reduce disk accesses. The analysis shows that using method 2 with a small allowable error frequency can significantly reduce the hash area size and the number of disk accesses required. The paper also discusses the computational factors and trade-offs for both methods, showing that method 2 is more efficient in certain scenarios.
Reach us at info@study.space
Understanding Space%2Ftime trade-offs in hash coding with allowable errors