Multi-version Concurrency Control

Database servers support transactions: sequences of actions that are either all processed or none at all, i.e. atomic. To allow multiple concurrent transactions access to the same data, most database servers use a two-phase locking protocol. Each transaction locks sections of the data that it reads or updates to prevent others from seeing its uncommitted changes. Only when the transaction is committed or rolled back can the locks be released. This was one of the earliest methods of concurrency control, and is used by most database systems.

Transactions should be isolated from other transactions. The SQL standard's default isolation level is serialisable. This means that a transaction should appear to run alone and it should not see changes made by others while they are running. Database servers that use two-phase locking typically have to reduce their default isolation level to read committed because running a transaction as serialisable would mean they'd need to lock entire tables to ensure the data remained consistent, and such table-locking would block all other users on the server. So transaction isolation is often traded for concurrency. But losing transaction isolation has implications for the integrity of your data. For example, if we start a transaction to read the amounts in a ledger table without isolation, any totals calculated would include amounts updated, inserted or deleted by other users during our reading of the rows, giving an unstable result.

Database research in the early 1980s discovered a better way of allowing concurrent access to data (e.g. David Reed's Implementing atomic actions on decentralised data, ACM Transactions on Computer Systems 1(1):3-23, 1983). Storing multiple versions of rows would allow transactions to see a stable snapshot of the data. It had the advantage of allowing isolated transactions without the drawback of locks. While one transaction was reading a row, another could be updating the row by creating a new version. This solution at the time was thought to be impractical: storage space was expensive, memory was small, and storing multiple copies of the data seemed unthinkable.

Of course, Moore's Law has meant that disk space is now inexpensive and memory sizes have dramatically increased. This, together with improvements in processor power, has meant that today we can easily store multiple versions and gain the benefits of high concurrency and transaction isolation without locking.

Unfortunately the locking protocols of popular database systems, many of which were designed well over a decade ago, form the core of those systems and replacing them seems to have been impossible, despite recent research again finding that storing multiple versions is better than a single version with locks (H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil and P. O'Neil, A Critique of ANSI SQL Isolation Levels, SIGMOD Record Conference, Vol. 24, No. 2, pp. 1-10).

It seems users have come to expect their servers to lock and their transactions not to be isolated (or in some cases not to have transactions at all!). Many users also have had to treat referential integrity, not as a foundation for their valuable information, but as something they need to try and implement themselves with indexes and triggers! It's now time to find out what you've been missing... ThinkSQL.

Multi-version Concurrency Control

© Copyright 2013, ThinkSQL Ltd. All rights reserved.