Scale and Performance in a Distributed File System

1987 | John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, Michael J. West
The paper discusses the design and performance of the Andrew file system, a key component of the Andrew distributed computing environment developed by Carnegie Mellon University and IBM. The system aims to support up to 7000 workstations, providing a shared file system alongside application programs and system administration features. A fundamental design decision is to transfer whole files between servers and workstations, rather than smaller units such as records or blocks, to enhance scalability.

A prototype, deployed with six servers and about 100 workstations, demonstrated that most applications could use files in the system without recompilation or relinking. However, performance was acceptable only up to about 20 active users per server and degraded noticeably beyond that. The revised design recovered performance through three mechanisms (illustrative sketches of the first two follow the list):

1. **Callback mechanism**: file servers promise to notify the caching process on each workstation (Venus) when a cached file's status changes, eliminating most cache revalidation traffic.
2. **Low-level internal file names (fids)**: fids uniquely identify files, allowing Venus to resolve pathnames directly on the workstation without server involvement.
3. **Lightweight processes**: introduced in the servers to reduce the cost of process switching and lower CPU load.
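To make the caching discipline concrete, here is a minimal workstation-side sketch of whole-file transfer combined with callbacks. The names (`CacheEntry`, `fetchWholeFile`, `breakCallback`) are illustrative, not actual Venus or Vice identifiers:

```go
package main

import "fmt"

// CacheEntry is a workstation-side record for one cached file.
// HasCallback means the server has promised to notify us if the
// file changes; until that promise is broken, the cached copy
// may be used without revalidating it against the server.
type CacheEntry struct {
	Fid         string
	LocalPath   string
	HasCallback bool
}

var cache = map[string]*CacheEntry{}

// fetchWholeFile stands in for the RPC that ships the entire file
// from its server and establishes a callback on it.
func fetchWholeFile(fid string) *CacheEntry {
	fmt.Println("fetching whole file", fid, "from server")
	return &CacheEntry{Fid: fid, LocalPath: "/cache/" + fid, HasCallback: true}
}

// open returns a local path; all subsequent reads and writes go to
// the local copy, and a modified file is shipped back on close.
func open(fid string) string {
	e, ok := cache[fid]
	if !ok || !e.HasCallback {
		e = fetchWholeFile(fid)
		cache[fid] = e
	}
	return e.LocalPath
}

// breakCallback models the server notifying Venus that its cached
// copy of fid is stale; the next open will refetch the file.
func breakCallback(fid string) {
	if e, ok := cache[fid]; ok {
		e.HasCallback = false
	}
}

func main() {
	open("7.42.3") // miss: fetches whole file and caches it
	open("7.42.3") // hit with intact callback: no server traffic
	breakCallback("7.42.3")
	open("7.42.3") // callback broken: refetch
}
```

The point of the discipline is that a cache hit with an intact callback involves no network traffic at all, which is what drives server load down at scale.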
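Similarly, here is a rough sketch of fids and workstation-side pathname resolution. The paper describes a fid as 96 bits: a 32-bit volume number, a 32-bit vnode number, and a 32-bit uniquifier. The resolution logic below is a simplification under hypothetical names, not the real Venus code:

```go
package main

import (
	"fmt"
	"strings"
)

// Fid mirrors the paper's 96-bit low-level file identifier:
// a volume number, an index into that volume's vnode table,
// and a uniquifier so that vnode slots can be safely reused.
type Fid struct {
	Volume uint32
	Vnode  uint32
	Unique uint32
}

// A cached directory maps name components to fids, so the
// workstation can walk a pathname one component at a time
// without contacting a server.
type Dir map[string]Fid

// resolve walks path from the root fid using locally cached
// directories. In the real system a miss would trigger a fetch of
// the directory (itself an ordinary file) from its server; this
// sketch simply reports failure.
func resolve(root Fid, dirs map[Fid]Dir, path string) (Fid, error) {
	cur := root
	for _, comp := range strings.Split(strings.Trim(path, "/"), "/") {
		d, ok := dirs[cur]
		if !ok {
			return Fid{}, fmt.Errorf("directory %v not cached", cur)
		}
		next, ok := d[comp]
		if !ok {
			return Fid{}, fmt.Errorf("no entry %q", comp)
		}
		cur = next
	}
	return cur, nil
}

func main() {
	root := Fid{Volume: 1, Vnode: 1, Unique: 1}
	usr := Fid{Volume: 1, Vnode: 2, Unique: 1}
	motd := Fid{Volume: 7, Vnode: 42, Unique: 3}
	dirs := map[Fid]Dir{
		root: {"usr": usr},
		usr:  {"motd": motd},
	}
	fid, err := resolve(root, dirs, "/usr/motd")
	fmt.Println(fid, err) // {7 42 3} <nil>
}
```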
The Andrew file system was compared with Sun Microsystems' NFS and showed better performance at high load: at peak load the benchmark took nearly twice as long under NFS as under Andrew, and while both systems ran at high CPU and disk utilization, Andrew's sensitivity to load was significantly lower.

Operability features include file migration, space quotas, replication, and on-the-fly backup. All of these rest on the volume mechanism, which groups files into a unit that can be administered and moved as a whole; the authors argue that such a mechanism is essential for large distributed file systems (a hypothetical sketch of a volume record appears after the concluding paragraph).

The paper concludes that the current design can support nearly double the number of workstations and suggests future improvements, such as moving Venus and the server code into the kernel, improving network topology, and enhancing monitoring and accounting tools.
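As a closing illustration of why volumes make those operability features tractable, here is a hypothetical volume record; its fields are inferred from the features listed above (quotas, migration, read-only cloning for replication and backup) rather than taken from the actual implementation:

```go
package main

import "fmt"

// Volume is a hypothetical record for the unit of administration:
// a group of files that can be assigned a quota, moved between
// servers, and cloned read-only for replication or backup.
type Volume struct {
	ID       uint32
	Server   string // current location; updated on migration
	QuotaKB  int
	UsedKB   int
	ReadOnly bool
}

// clone produces a read-only snapshot, the basis for both
// replication of slowly changing volumes and on-the-fly backup.
func (v Volume) clone() Volume {
	c := v
	c.ReadOnly = true
	return c
}

// migrate records a move to another server; clients locate the
// volume indirectly by ID, so the move is transparent to them.
func (v *Volume) migrate(to string) {
	v.Server = to
}

func main() {
	v := Volume{ID: 7, Server: "fs1", QuotaKB: 20000, UsedKB: 512}
	backup := v.clone()
	v.migrate("fs2")
	fmt.Println(v.Server, backup.ReadOnly) // fs2 true
}
```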