I/O Model of Computation


2018 | Donghui Zhang and Vassilis J. Tsotras
The I/O model of computation measures the efficiency of an algorithm by counting the number of disk reads and writes it requires. In data-intensive applications such as databases, most data resides on disk and disk access typically dominates CPU time, so counting disk I/Os is more relevant than counting CPU operations. This is termed the "I/O model of computation" or disk-based model.

Modern hard drives use the seek-rotate-transfer protocol. To transfer data from disk to memory or back, the hard drive must first spend "seek time" to move the read/write head to the cylinder where the data is located, then "rotational delay" until the sector containing the data rotates to a position under the read/write head, and finally transfer time to actually transfer the data. Seek time is typically longer than rotational delay, which in turn is longer than transfer time. Therefore, reading a few bytes of data takes roughly as long as reading thousands of bytes. Data is stored on disk in units called blocks or pages, and every disk I/O reads or writes one such page. Random disk I/O costs more than sequential access, so a more accurate I/O model should account for the difference between random and sequential I/O.

To minimize disk accesses in a database environment, three methods can be used: (i) buffering in main memory pages that have already been accessed; (ii) transferring a number of consecutive pages at once, called a bucket, in anticipation of future requests due to data locality; and (iii) using structures (indices) that organize data into pages so that searching for a particular record takes few page accesses. The importance of the I/O model for data structures is exemplified by the difference in I/Os required to search for a record in a balanced binary search tree versus a disk-optimized structure such as a B+-tree; the B+-tree is far more efficient in terms of page I/Os.
Disk-resident data structures are called paginated, disk-based, or external. Before a new page read is executed, the buffer manager is first examined to see if the requested page is already in memory. If it is, the page is retrieved from the buffer, avoiding the cost of a page I/O. The buffer has limited capacity, so page replacement policies play an important role.
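A minimal sketch of this buffer-manager behavior, assuming an LRU replacement policy and a small capacity chosen purely for illustration (only misses cost a disk I/O):

```python
from collections import OrderedDict

class LRUBuffer:
    """Toy buffer pool with LRU replacement (illustrative, not a real DBMS API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page id -> page contents (stubbed out)
        self.io_count = 0            # number of actual disk reads performed

    def read_page(self, page_id):
        if page_id in self.pages:
            # Buffer hit: no disk I/O; mark the page most recently used.
            self.pages.move_to_end(page_id)
        else:
            # Buffer miss: one disk I/O, evicting the LRU page if full.
            self.io_count += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)
            self.pages[page_id] = object()   # stand-in for page contents

buf = LRUBuffer(capacity=3)
for p in [1, 2, 3, 1, 2, 4, 1]:   # 7 page requests
    buf.read_page(p)
print(buf.io_count)                # 4 disk I/Os: pages 1, 2, 3 and 4
```

Of the seven page requests, only four reach the disk; the repeated reads of pages 1 and 2 are served from the buffer, and page 3 (the least recently used) is the one evicted to make room for page 4.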