Whereas most database systems employ locks for concurrency control, PostgreSQL does things a little differently: it maintains data consistency by using a multi-version model, otherwise known as Multi-Version Concurrency Control, or MVCC for short. As a result, when querying a database, each transaction sees a snapshot of data as it was some time before, regardless of the current state of the underlying data. This prevents the transaction from viewing inconsistent data that could be caused by other concurrent transaction updates on the same data, and provides transaction isolation for each database session. This blog article will provide a brief overview of how the MVCC protocol works as well as cover some of the pros and cons of the MVCC approach.
The MVCC Protocol Explained
The main difference between lock models and MVCC is that the latter ensures that reading never blocks writing and writing never blocks reading.
In MVCC, every transaction has a transaction-timestamp that indicates (a) when it started and (b) when a transaction updates a certain data-item, such as a field, or a record, or a table. A new version of this data-item is created while also retaining the older version. Each version is provided with:
- a write-timestamp to indicate the timestamp of the transaction (i.e. the time the transaction started) that created it and
- a read-timestamp to indicate the latest timestamp of all the transactions that have read it.
The basic idea of the MVCC protocol is that the transaction manager only allows operations if they can be allowed in a manner that is consistent with all transactions executing in their entirety at the moment of their timestamp. This is referred to as the presumed execution order. Database researcher Jan Hidders explains how the transaction manager accomplishes this as follows:
If a transaction wants to read an item, it is given access to the version that it would have read in the presumed execution order. This will be the one with the latest write-timestamp just before its own timestamp. For example, if there are versions with write-timestamps 5, 12 and 20, and the timestamp of the transaction is 14, then the version with write-timestamp 12 is the one read by this transaction in the presumed execution order.
If a transaction wants to write an item, it is checked if there is not a read operation that was allowed earlier and that in the presumed execution order would read the new version caused by the requested write operation, but when it was allowed read another version. For example, assume again we have versions with write-timestamps 5, 10 and 16. Moreover assume the read-timestamps of these versions are 8, 12 and 20, respectively. If a transaction with timestamp 11 wants to update the item, there is a problem, because the version with write-timestamp 10 was read by a transaction with timestamp 12. So, if a version with timestamp 11 is created, the transaction with timestamp 12 would in the presumed execution order not have seen the version created by the transaction with timestamp 10, but the one that is now about to be created with timestamp 11. If, on other hand, a transaction with timestamp 14 wants to write the item, this is fine, since as far as we know after t=12 in the presumed execution order the item was not read by any transaction until the moment it was updated at t=16.
Pros and Cons of MVCC
Pros:
- As you can tell from the description above, all read operations will always be allowed immediately. This is usually not the case in a lock-based approach, where read-locks might be refused because of existing write-locks.
- It tends to allow also more write operations to go through immediately than lock-based approaches usually do.
Cons
- If a write operation is refused, there is no alternative but rolling back or restarting the transaction: once the update is refused it will also be refused if we retry it later. This differs from lock-based approaches, where we usually can wait until the lock becomes available. It is for this reason that MVCC is categorised as an optimistic protocol: it is very efficient if there are no conflicts, but once there is one you may have to undo a lot of work.
- The many versions of an item might require significantly more storage space. In lock-based approaches only the one version needs to be stored.
- The removal of versions that are no longer needed can cause some overhead.
Final Thoughts on Multi-Version Concurrency Control in PostgreSQL
This blog article provided an overview of how the MVCC protocol works and presented a few of its pros and cons.
Interested in working with PostgreSQL? You can try Navicat 16 for PostgreSQL for FREE for 14 days!