PSQL Concurrent Transactions
Dealing with concurrent transactions posse different problems. The SQL standard describes three different read concurrency problems:
- Dirty read. A transaction reads data updated by a concurrent uncommited transaction.
- Non-repeatable read. The same row read at two different times inside the same transaction shows different data. The data difference is due to a concurrents transaction which commited in the time between the two different reads.
- Phantom read. Running the same select statement two times in the same transaction returns a different set of rows. This is due to new inserts or deletes happening on a concurrent commited transaction.
Isolation Levels
To deal with the problems arised from the concurrent transaction, sql engimes provide different transaction isolation levels.
Isolation leves control what data can be shared between concurrent transactions. The SQL standard defines four different isolation levels (from lower to higher isolation):
- Read uncommited
- Read commited
- Repeatable reads
- Serializable
The higher the isolation level is, the less concurrency problems which will be found. However, a higher isolation level requires more system resources and might result in transactions been blocked by concurrent transactions.
Dirty read | Non-repeatable read | Phantom read | |
---|---|---|---|
Read uncommitted | yes | yes | yes |
Read committed | no | yes | yes |
Repeatable read | no | no | yes |
Serializable | no | no | no |
Due to psql multiversion concurrency control model, the lower possible isolation level by design is read commited, which is also the default isolation level.
Read Commited
A select query sees the database snapshot which existed when the query starts. Selects inside a transaction will see the effects of updates or inserts inside the same transaction (even not commited) and also will see the effects of concurrent commited transactions.
Repeatable Reads
Transactions with repeatable read isolation level set a snapshot id, at the transaction level, when the first non-transactional-control statement runs. After the snapshot id is fixed in the transaction, any changes by concurrent queries/transactions will have not effect.
The difference with read commited is that a query in a repeatable read transaction will see always the same snapshop. While queries in a read commited transactions see the snapshop as of the start of the query.
Warning: transactions using this isolation level can raise serialization errors. Therefore, applications relying on this isolation level should be prepare to handle such errors, for example with retries.
This isolation level is implented by the snapshot isolation technique. This isolation level implementation differs between different database systems, therefore the performance and behavior differs. Take this into account when choosing a database system for an application.
Serializable
Transactions can commit only if it can be proved that there is a serial order of execution that would produce the same results.
This level is built on top of the repeatable reads implementation. Therefore, it also uses the snapshot isolation technique. On top of that, it uses locks of type SIReadLock to identify when a write affects the results of a concurrent transaction read. When the transaction commits the existing SIReadLocks are checked and if collisions are detected, the concurrent transactions fail with a serialization error.
SIReadLocks are also known as predicate locks and they do not block any data access and can not cause deadlocks. These locks might live longer than the transactions which created them. They are released by ClearOldPredicateLocks.
The serializable level might cause false positives, cancelling some transactions which are actually serializable. As with the repeatable read isolation level, applications need to prepare to handle errors.
Tests
Lets take the following schema:
CREATE TABLE users (
id serial PRIMARY KEY,
username varchar(255),
experience int
);
And populate it with:
INSERT INTO users (username)
VALUES
('irritable_peregrin'),
('shocked_smaug'),
('devastated_saruman'),
('bored_osgiliath'),
('troubled_bagginses');
UPDATE users SET experience = id * 10;
Read Commited
Check data changes within transaction
Lets see that a transaction can see the changes done by itself:
BEGIN;
SELECT * FROM users where id = 1;
id | username | experience
----+--------------------+------------
1 | irritable_peregrin | 10
UPDATE users SET experience = experience - 10 WHERE id = 1;
UPDATE 1
SELECT * FROM users WHERE id = 1;
id | username | experience
----+--------------------+------------
1 | irritable_peregrin | 0
(1 row)
ROLLBACK;
Check data changes by concurrent transactions
Now lets see how a transaction can see changes done by concurrent transactions. For these test two transactions are started on two different terminals, all the commands from both transactions are listed on a chronological order:
-- 1. Check the experience of user 1 on transaction 1
transaction-1=*# SELECT * FROM users WHERE id = 1;
id | username | experience
----+--------------------+------------
1 | irritable_peregrin | 10
(1 row)
-- 2. Update the experience on a concurrent transaction
transaction-2=*# UPDATE users SET experience = experience - 50 WHERE id = 1;
UPDATE 1
-- 3. Check the experience again on transaction 1
transaction-1=*# SELECT * FROM users WHERE id = 1;
id | username | experience
----+--------------------+---------
1 | irritable_peregrin | 50
(1 row)
-- 4. Commit transaction 2
transaction-2=*# COMMIT;
-- 5. Check experience again
transaction-1=*# SELECT * FROM users WHERE id = 1;
id | username | experience
----+--------------------+------------
1 | irritable_peregrin | -40
(1 row)
It can be seen that after the concurrent transaction commits, the read returns the updated value.
Repeateable Reads
Data does not change
According to the theory all selects inside a single transaction should see the same data, even though there is a commited concurrent update:
-- 1 Start transaction
transaction-1=# BEGIN ISOLATION LEVEL REPEATABLE READ;
-- 2 Check data status, since this is done within the transaction the snapshot id is fixed
transaction-1=*# SELECT * FROM users WHERE id = 5;
id | username | experience
----+--------------------+------------
5 | troubled_bagginses | 50
(1 row)
-- 3 Using another transaction update a value
db=# UPDATE users SET experience = 0 WHERE id = 5;
UPDATE 1
-- 4 Check from within the transaction that there are no changes
transaction-1=*# SELECT * FROM users WHERE id = 5;
id | username | experience
----+--------------------+------------
5 | troubled_bagginses | 50
(1 row)
-- 5 Check data afer commit from a different transaction
db=# SELECT * FROM users WHERE id = 5;
id | username | experience
----+--------------------+------------
5 | troubled_bagginses | 0
(1 row)
Snapshot id is set after first transaction
Also, according to the theory, data changes before the first select inside the transaction should be reflected since the snapshot id is not yet set:
-- 1 Check data state before starting any transaction
db=# SELECT * FROM users WHERE id = 5;
id | username | experience
----+--------------------+------------
5 | troubled_bagginses | 0
(1 row)
-- 1 Start transaction, do not set snapshot id
transaction-1=# BEGIN ISOLATION LEVEL REPEATABLE READ;
-- 2 Update data from a different transaction
db=# UPDATE users SET experience = 555 WHERE id = 5;
UPDATE 1
-- 3 Read data from inside the transaction, also sets transaction snapshot id
transaction-1=*# SELECT * FROM users WHERE id = 5;
id | username | experience
----+--------------------+------------
5 | troubled_bagginses | 555
(1 row)
-- 4 Update data from a different transaction
db=# UPDATE users SET experience = 0 WHERE id = 5;
UPDATE 1
-- 5 Check again from within the transaction
transaction-1=*# SELECT * FROM users WHERE id = 5;
id | username | experience
----+--------------------+------------
5 | troubled_bagginses | 555
(1 row)
Reproduce a serialize error
Now, lets try to reproduce a serialization error. This should occur when the active transaction updates data which has been modified by a concurrent transaction:
-- 1 Start transaction
transaction-1=# BEGIN ISOLATION LEVEL REPEATABLE READ;
-- 2 Set transaction snapshot id
transaction-1=*# SELECT * FROM users WHERE id = 5;
-- 3 Update data from another transaction
db=# UPDATE users SET experience = 100 WHERE id = 5;
-- 4 Update data from within the transaction
transaction-1=*# UPDATE users SET experience = 0 where id = 5;
ERROR: could not serialize access due to concurrent update
Serializable
Reproduce a serialize error
-- Reset data
UPDATE users SET experience = id * 100;
UPDATE 5
SELECT * FROM users ORDER BY id;
id | username | experience
----+--------------------+------------
1 | irritable_peregrin | 100
2 | shocked_smaug | 200
3 | devastated_saruman | 300
4 | bored_osgiliath | 400
5 | troubled_bagginses | 500
(5 rows)
Lets try to reproduce the error. The locks are checked from ouside any transaction.
-- 1 Start transaction
transaction-1=# BEGIN TRANSACTION ISOLATION level SERIALIZABLE;
-- 2 Start concurrent transaction
transaction-2=# BEGIN TRANSACTION ISOLATION level SERIALIZABLE;
-- 3 Run update on transaction 1
transaction-1=# UPDATE users SET experience = 0;
-- 4 Check locks
SELECT * FROM pg_locks WHERE relation = 171313;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+--------+------------------+---------+----------+-----------
relation | 171304 | 171313 | | | | | | | | 74/185 | 921917 | RowExclusiveLock | t | t |
relation | 171304 | 171313 | | | | | | | | 74/185 | 921917 | SIReadLock | t | f |
(2 rows)
-- 5 Run update on transaction 2
transaction-1=# UPDATE users SET experience = experience * 2;
-- This hangs
-- 6 Check locks
SELECT * FROM pg_locks WHERE relation = 171313;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+--------+------------------+---------+----------+-----------
relation | 171304 | 171313 | | | | | | | | 68/106 | 919215 | RowExclusiveLock | t | t |
relation | 171304 | 171313 | | | | | | | | 74/185 | 921917 | RowExclusiveLock | t | t |
tuple | 171304 | 171313 | 0 | 22 | | | | | | 68/106 | 919215 | ExclusiveLock | t | f |
relation | 171304 | 171313 | | | | | | | | 68/106 | 919215 | SIReadLock | t | f |
relation | 171304 | 171313 | | | | | | | | 74/185 | 921917 | SIReadLock | t | f |
(5 rows)
-- These have now increased
-- 7 Commit transaction 1
transaction-1=# COMMIT;
-- 8 It is seen that transaction 2 fails with
ERROR: could not serialize access due to concurrent update
-- 9 Check locks again
SELECT * FROM pg_locks WHERE relation = 171313;
-- This time nothing is found
If transaction 1 is rolled back, transaction 2 update will succeed.