August 2013 | JAMES C. CORBETT, JEFFREY DEAN, MICHAEL EPSTEIN, ANDREW FIKES, CHRISTOPHER FROST, J. J. FURMAN, SANJAY GHEMAWAT, ANDREY GUBAREV, CHRISTOPHER HEISER, PETER HOCHSCHILD, WILSON HSIEH, SEBASTIAN KANTHAK, EUGENE KOGAN, HONGYI LI, ALEXANDER LLOYD, SERGEY MELNIK, DAVID MWAURA, DAVID NAGLE, SEAN QUINLAN, RAJESH RAO, LINDSAY ROLIG, YASUSHI SAITO, MICHAL SZYMANIAK, CHRISTOPHER TAYLOR, RUTH WANG, and DALE WOODFORD
Spanner is Google's globally distributed, scalable, multiversion, synchronously replicated database. It is the first system to distribute data at global scale and to support externally consistent distributed transactions. The paper describes Spanner's structure, feature set, and design rationale, along with a novel time API that enables external consistency and powerful features: nonblocking reads in the past, lock-free snapshot transactions, and atomic schema changes.

Spanner uses the TrueTime API to manage clock uncertainty, ensuring globally meaningful commit timestamps even for distributed transactions. The API exposes clock uncertainty directly and relies on bounds provided by Google's cluster-management software, which keeps uncertainty small (generally under 10 ms) using GPS receivers and atomic clocks.

Spanner dynamically replicates data across datacenters, with applications controlling data locality and replication configurations. It supports externally consistent reads and writes, globally consistent reads at a timestamp, and atomic schema updates. Its data model is based on schematized semirelational tables with versioned data and configurable garbage collection, and it provides general-purpose transactions and an SQL-based query language. The implementation includes a directory abstraction for data movement, Paxos-based replication, and a transaction manager for distributed transactions; TrueTime is critical to all of these features.

The paper evaluates Spanner's performance, covering replication, transactions, and availability, and discusses its use in applications such as F1. Spanner's design allows it to scale to millions of machines across hundreds of datacenters and trillions of database rows. It remains highly available even in the face of wide-area natural disasters and supports applications with complex schemas and strong consistency requirements.
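The interval-based time API described above can be sketched as follows. This is a minimal Python illustration under assumed names, not the production implementation: the paper's interface is TT.now() (returning an interval), TT.after(t), and TT.before(t), and the EPSILON bound here is an illustrative stand-in for the uncertainty Google's infrastructure actually measures.

```python
import time
from dataclasses import dataclass

# Assumed uncertainty bound in seconds; the paper reports that in
# production the bound is generally under 10 ms.
EPSILON = 0.007

@dataclass
class TTInterval:
    """An interval guaranteed to contain the absolute current time."""
    earliest: float
    latest: float

def tt_now() -> TTInterval:
    """Sketch of TT.now(): local clock widened by the uncertainty bound."""
    t = time.time()
    return TTInterval(earliest=t - EPSILON, latest=t + EPSILON)

def tt_after(t: float) -> bool:
    """Sketch of TT.after(t): True iff t has definitely passed."""
    return tt_now().earliest > t

def tt_before(t: float) -> bool:
    """Sketch of TT.before(t): True iff t has definitely not arrived."""
    return tt_now().latest < t
```

Exposing uncertainty as an interval, rather than pretending the clock is exact, is what lets the rest of the system reason soundly about timestamp ordering across machines.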
The paper also discusses Spanner's concurrency control and timestamp management, and how they ensure external consistency and transactional correctness. Spanner's evaluation shows that two-phase commit can scale to a reasonable number of participants, and that its availability benefits from running in multiple datacenters.
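The way Spanner turns uncertain clocks into external consistency is the commit-wait rule: a transaction takes the latest bound of the current time interval as its commit timestamp, then delays making the commit visible until that timestamp is definitely in the past everywhere. A minimal sketch, with the TrueTime helpers inlined and the EPSILON bound assumed for illustration:

```python
import time

EPSILON = 0.007  # assumed clock-uncertainty bound in seconds

def tt_now():
    """Sketch of TT.now() as an (earliest, latest) pair."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit_wait() -> float:
    """Commit-wait sketch: choose s = TT.now().latest, then spin
    until TT.after(s) holds, i.e. until s is guaranteed to be in
    the past on every machine. Only then may the commit become
    visible, which makes commit order match real-time order."""
    s = tt_now()[1]            # commit timestamp
    while not tt_now()[0] > s: # wait for TT.after(s)
        time.sleep(EPSILON / 4)
    return s
```

The expected wait is roughly twice the uncertainty bound, which is why keeping that bound small (via GPS and atomic clocks) matters so much for commit latency.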