August 2013 | JAMES C. CORBETT, JEFFREY DEAN, MICHAEL EPSTEIN, ANDREW FIKES, CHRISTOPHER FROST, J. J. FURMAN, SANJAY GHEMAWAT, ANDREY GUBAREV, CHRISTOPHER HEISER, PETER HOCHSCHILD, WILSON HSIEH, SEBASTIAN KANTHAK, EUGENE KOGAN, HONGYI LI, ALEXANDER LLOYD, SERGEY MELNIK, DAVID MWAURA, DAVID NAGLE, SEAN QUINLAN, RAJESH RAO, LINDSAY ROLIG, YASUSHI SAITO, MICHAL SZYMANIAK, CHRISTOPHER TAYLOR, RUTH WANG, and DALE WOODFORD, Google, Inc.
Spanner is Google's globally distributed, scalable, multiversion, and synchronously replicated database. It is the first system to distribute data at a global scale and support externally consistent distributed transactions. This article describes Spanner's architecture, features, design rationale, and a novel time API that exposes clock uncertainty. The API and its implementation are crucial for supporting external consistency and features such as non-blocking reads in the past, lock-free snapshot transactions, and atomic schema changes across all of Spanner.
Spanner is organized into zones, which are the units of administrative deployment and data replication. Each zone contains one *zonemaster* and between 100 and several thousand *spanservers*. Spanservers manage data through a set of *tablets*, which are similar to Bigtable's tablets but with timestamps assigned to data. Paxos state machines are implemented on each tablet to ensure replication and consistency. Spanner also supports a bucketing abstraction called *directories*, which allow applications to control data locality.
The key enabler of Spanner's features is the TrueTime API, which directly exposes clock uncertainty. Given the uncertainty bounds that TrueTime provides, Spanner's timestamps reflect serialization order and satisfy external consistency. The implementation keeps that uncertainty small by using multiple modern clock references (GPS and atomic clocks).
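To make the idea of an API that exposes clock uncertainty concrete, the following is a minimal sketch in Python, not Spanner's implementation. The method names `now`, `after`, and `before` are modeled on the TrueTime operations described in the full article; the class names, the fixed `epsilon` bound, and the injectable `clock` are assumptions made for illustration.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """A time interval that is assumed to contain the true absolute time."""
    earliest: float
    latest: float


class TrueTimeSketch:
    """Illustrative stand-in for a clock API that reports its own uncertainty.

    `epsilon` is a hypothetical fixed error bound in seconds; a real system
    would derive it from its clock references rather than hard-code it.
    """

    def __init__(self, epsilon: float = 0.004, clock=time.time):
        self.epsilon = epsilon
        self._clock = clock

    def now(self) -> TTInterval:
        # Report an interval instead of a single instant.
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only if t has definitely passed.
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        # True only if t has definitely not arrived yet.
        return self.now().latest < t
```

A caller never sees a single "current time"; it sees bounds, and can only ask whether a timestamp is definitely in the past or definitely in the future. The smaller `epsilon` is, the less often both questions are answered with `False`.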
Spanner supports read-write transactions, snapshot transactions, and snapshot reads. It ensures external consistency by assigning timestamps to transactions and enforcing monotonicity and disjointness invariants. TrueTime enables Spanner to support atomic schema changes, which would be infeasible with a standard transaction due to the large number of participants.
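One way to picture how timestamp assignment can interact with clock uncertainty is the "commit wait" rule described in the full article: a transaction's writes are not made visible until its commit timestamp is guaranteed to be in the past. The sketch below is a simplified, single-process illustration under stated assumptions. The function names, the fixed `EPSILON` bound, and the injectable `clock` and `sleep` parameters are all hypothetical, and locking, Paxos, and two-phase commit are omitted entirely.

```python
import time

EPSILON = 0.005  # assumed clock uncertainty bound, in seconds (hypothetical)


def tt_now(clock=time.monotonic, epsilon=EPSILON):
    """Return (earliest, latest) bounds on the current time."""
    t = clock()
    return (t - epsilon, t + epsilon)


def commit_with_wait(apply_writes, clock=time.monotonic, sleep=time.sleep,
                     epsilon=EPSILON):
    """Choose a commit timestamp, then wait until it has definitely passed.

    `apply_writes` is a caller-supplied callback that makes the writes
    visible at the chosen timestamp.
    """
    # Pick a timestamp no earlier than any instant that could be "now".
    _, latest = tt_now(clock, epsilon)
    commit_ts = latest

    # Commit wait: block until even the earliest possible "now" is past
    # the commit timestamp.
    while tt_now(clock, epsilon)[0] <= commit_ts:
        sleep(epsilon / 4)

    apply_writes(commit_ts)
    return commit_ts
```

In this sketch the wait lasts roughly twice `epsilon`, which shows why keeping clock uncertainty small matters for write latency.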
The article includes microbenchmarks showing Spanner's performance with respect to replication, transactions, and availability. It also discusses the behavior of TrueTime and provides a case study of F1, a Google application that uses Spanner.