Transactions
You need a transaction when you want to perform a series of reads and writes as a single unit. Problems such as transferring money from one account to another, or reading a value, multiplying it by a number and writing the result back, involve a series of reads and writes that must not be interleaved with other work; therefore they require transactions.
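To make the "single unit" idea concrete, here is a minimal in-memory sketch of the money transfer problem. It has nothing to do with Spanner's implementation: the Bank type, its Transfer method and the account names are hypothetical, and a mutex stands in for what a database transaction gives you. Either both balances change or neither does.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrInsufficientFunds is returned when the source account cannot cover the transfer.
var ErrInsufficientFunds = errors.New("insufficient funds")

// Bank is a hypothetical in-memory ledger used only for illustration.
type Bank struct {
	mu       sync.Mutex
	balances map[string]int64
}

// Transfer moves amount between two accounts as one unit. The lock is held
// across the read and both writes, and the check happens before any write,
// so a failed transfer leaves every balance untouched.
func (b *Bank) Transfer(from, to string, amount int64) error {
	b.mu.Lock()
	defer b.mu.Unlock()

	if b.balances[from] < amount {
		return ErrInsufficientFunds // nothing has been written yet
	}
	b.balances[from] -= amount
	b.balances[to] += amount
	return nil
}

// Balance returns the current balance of an account.
func (b *Bank) Balance(account string) int64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.balances[account]
}

func main() {
	bank := &Bank{balances: map[string]int64{"alice": 100, "bob": 20}}

	// A valid transfer changes both balances together.
	fmt.Println(bank.Transfer("alice", "bob", 30), bank.Balance("alice"), bank.Balance("bob"))

	// An invalid transfer changes neither balance.
	fmt.Println(bank.Transfer("bob", "alice", 500), bank.Balance("alice"), bank.Balance("bob"))
}
```

A single process can get away with a mutex. A database has to provide the same guarantee across many clients and machines, which is the job of transactions.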
Spanner is a transactional database with quite a few interesting capabilities:
- It can replicate data in a globally distributed way automatically for availability and locality.
- It can automatically shard data among multiple machines and even data centers.
- It versions data and you can go back in time to read a value at a particular timestamp.
- It provides external consistency for read/write transactions. This allows two transactions that read and write the same value to run in parallel and both succeed, as long as their operations are serializable.
- It provides globally consistent reads at a particular timestamp, so you can read in the past.
Spanner provides some of these capabilities by the way it satisfies the ACID properties and by its clock source, TrueTime.
Transaction Types
Spanner provides two transaction types:
- Read-only transactions: Allow users to read in a transaction at a particular timestamp. These transactions are lock-free and you should use them if you only need to read.
- Read/write transactions: Allow users to read and buffer writes in a transaction. Spanner doesn’t allow you to see the result of the writes until you commit the transaction.
Note how the Spanner clients allow you to buffer mutations:
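import "cloud.google.com/go/spanner"

// client is a *spanner.Client; id identifies the tweet to update.
_, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, t *spanner.ReadWriteTransaction) error {
	var likes int64
	// Read likes column...

	// The mutation is only buffered here; it is applied at commit time.
	m := spanner.Update(
		"tweets",
		[]string{"id", "likes"},
		[]interface{}{id, likes})
	return t.BufferWrite([]*spanner.Mutation{m})
})
if err != nil {
	// The transaction did not commit; no buffered write was applied.
}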
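The buffering behavior can be illustrated without a Spanner instance. The following is a toy sketch, not the Spanner client: the Store and Txn types and their methods are hypothetical names made up for this example. It shows that a buffered write is invisible, even to the transaction that made it, until the transaction commits.

```go
package main

import "fmt"

// Store is a hypothetical in-memory table mapping a tweet ID to its likes.
type Store struct {
	likes map[string]int64
}

// Txn collects writes in a local buffer instead of applying them right away.
type Txn struct {
	store  *Store
	buffer map[string]int64
}

// Begin starts a transaction with an empty write buffer.
func (s *Store) Begin() *Txn {
	return &Txn{store: s, buffer: make(map[string]int64)}
}

// Read returns the committed value; buffered writes are deliberately ignored.
func (t *Txn) Read(id string) int64 {
	return t.store.likes[id]
}

// BufferWrite records a write that becomes visible only after Commit.
func (t *Txn) BufferWrite(id string, likes int64) {
	t.buffer[id] = likes
}

// Commit applies every buffered write to the store in one step.
func (t *Txn) Commit() {
	for id, likes := range t.buffer {
		t.store.likes[id] = likes
	}
	t.buffer = make(map[string]int64)
}

func main() {
	store := &Store{likes: map[string]int64{"tweet-1": 10}}

	txn := store.Begin()
	txn.BufferWrite("tweet-1", txn.Read("tweet-1")+1)
	fmt.Println("before commit:", txn.Read("tweet-1")) // still the old value

	txn.Commit()
	fmt.Println("after commit:", store.likes["tweet-1"])
}
```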
ACID
ACID stands for atomicity, consistency, isolation and durability. ACID is a minimal promise to application developers about what their database guarantees, so they can focus on their own problems instead of solving database problems. Without ACID or a similar contract, application developers wouldn’t have guidance on what is their responsibility and what their database provides. Spanner satisfies ACID, and actually provides a stronger contract.
Atomicity means the operations in a transaction either all succeed or all fail. Spanner provides this by failing the whole transaction if any write in it fails. Spanner doesn’t allow you to read the results of a write inside the transaction; instead, the writes are applied when the transaction commits.
Consistency means data should be consistent before and after a transaction. Spanner satisfies this. With strong reads, Spanner allows users to see the same data globally.
Isolation is provided by certain properties and behaviors of the read-only and read/write transactions. Spanner’s isolation levels are slightly different from those of traditional databases. See the Isolation Levels section for details.
- In read/write transactions, all reads return from a particular timestamp. If anything that was read in the transaction is mutated before it commits, the transaction fails. All writes need to succeed for the transaction to commit, and the writes in the transaction become visible only once the transaction commits.
- In read-only transactions, all reads return from a particular timestamp. These transactions are lock-free and should be used for performance gain if no writes are needed.
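The rule that a read/write transaction fails when something it read has been mutated can be sketched with a toy version check. This is only an illustration of the observable behavior described above: the Store, Txn and ErrAborted names are hypothetical, the code is single-threaded, and Spanner's real mechanism is lock-based and considerably more involved.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAborted signals that data read by the transaction changed before commit.
var ErrAborted = errors.New("transaction aborted: data read was modified")

// cell holds a value and a counter that is bumped on every committed write.
type cell struct {
	value   int64
	version uint64
}

// Store is a hypothetical in-memory store; it is not safe for concurrent use.
type Store struct {
	cells map[string]cell
}

// Txn remembers which version of each key it read and buffers its writes.
type Txn struct {
	store  *Store
	reads  map[string]uint64
	writes map[string]int64
}

// Begin starts a new transaction against the store.
func (s *Store) Begin() *Txn {
	return &Txn{store: s, reads: map[string]uint64{}, writes: map[string]int64{}}
}

// Read returns the committed value and records the first version seen.
func (t *Txn) Read(key string) int64 {
	c := t.store.cells[key]
	if _, seen := t.reads[key]; !seen {
		t.reads[key] = c.version
	}
	return c.value
}

// Write buffers a write until Commit.
func (t *Txn) Write(key string, value int64) {
	t.writes[key] = value
}

// Commit fails if any key read by the transaction has changed since it was
// read; otherwise it applies all buffered writes.
func (t *Txn) Commit() error {
	for key, seen := range t.reads {
		if t.store.cells[key].version != seen {
			return ErrAborted
		}
	}
	for key, value := range t.writes {
		t.store.cells[key] = cell{value: value, version: t.store.cells[key].version + 1}
	}
	return nil
}

func main() {
	store := &Store{cells: map[string]cell{"likes": {value: 0}}}

	a, b := store.Begin(), store.Begin()
	a.Write("likes", a.Read("likes")+1)
	b.Write("likes", b.Read("likes")+1)

	fmt.Println("a:", a.Commit()) // commits first
	fmt.Println("b:", b.Commit()) // read a value that a has since changed
}
```

In practice an aborted transaction is retried from the beginning, so the retry reads the value written by the transaction that won.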
Transactions can read from the current state or from some time in the past. An example of a read-only transaction that reads from a minute ago:
client.ReadOnlyTransaction().
	WithTimestampBound(spanner.ExactStaleness(1 * time.Minute)).
	Query(...)
Durability means committed transactions shouldn’t be lost and Spanner satisfies this.
Isolation Levels
The SQL standard defines four isolation levels. From the lowest level of isolation to the highest, they are read uncommitted, read committed, repeatable read and serializable. Spanner provides an isolation level stricter than serializable by default, and users are given the option to read stale data with a timestamp bound for performance.
Spanner’s distinguishing feature is external consistency. Spanner can run multiple transactions in parallel (in different data centers) and still check whether they are serializable. If transaction A reads and writes v1, transaction B can also read and write v1 as long as it reads v1 after A has committed. Spanner checks that everything happened in chronological order, and then both transactions can succeed. Spanner can do this even if the transactions are running in different data centers. This is the feature that allows Spanner to identify as a globally distributed database.
Timestamp-bound reads can also be considered an isolation level. Spanner offers this feature for latency benefits in geo-distributed cases. If you can tolerate a bit of stale data when reading, you can opt in to stale reads.
In the following example, the user is reading with a maximum staleness of 30 seconds. Bounded-staleness modes such as MaxStaleness are only accepted by single-use reads, hence client.Single():
client.Single().
	WithTimestampBound(spanner.MaxStaleness(30 * time.Second)).
	Query(...)
Strong & Stale Reads
Read-only transactions and single calls provide two types of reads:
- Strong reads read and query at a timestamp where all previously committed transactions are visible. Two consecutive strong read-only transactions might return different results if data is committed in between.
client.Single().Query(...)
// Or explicitly set the strong-read bound
// in a read-only transaction:
client.ReadOnlyTransaction().
	WithTimestampBound(spanner.StrongRead()).
	Query(...)
- Stale reads can be done for latency gains in geo-distributed cases (if you have a multi-region Spanner). You can read at a particular timestamp or within a maximum staleness limit.
The following example reads with a maximum staleness of 30 seconds, using a single-use read because bounded staleness is not accepted by multi-use read-only transactions:
client.Single().
	WithTimestampBound(spanner.MaxStaleness(30 * time.Second)).
	Query(...)
Garbage Collection
Spanner garbage collects old versions of the data in the background. Google Cloud Spanner has a policy to garbage collect stale data if it is older than an hour. This means you can go back in time up to an hour and read. The following query is going to fail because the data it asks for has already been garbage collected:
client.ReadOnlyTransaction().
	WithTimestampBound(spanner.ExactStaleness(2 * time.Hour)).
	Query(...)
TrueTime
As mentioned above, part of the reason external consistency of read/write transactions is possible is the clock source Spanner uses. TrueTime is explained in detail in Google’s whitepaper on Spanner, but in a nutshell, it does these two things:
- TrueTime uses two different sources: GPS and atomic clocks. These clocks have different failure modes, hence using both of them increases reliability.
- TrueTime returns the time as an interval. The actual time could be anywhere between the lower bound and the upper bound. Spanner then waits until it is certain that the current time is beyond a particular time. This method adds some latency to the system, especially when the uncertainty advertised by the GPS masters is high, but it provides correctness even in a globally distributed situation.
Spanner provides a stricter contract than ACID and a stronger isolation level than serializable databases while being a globally distributed database. It relies on TrueTime, versioning and garbage collection, as well as Google’s networking infrastructure to make it happen.