PSQL Multiversion Concurrency Control (MVCC)
Psql concurrency control ensures data consistency for concurrent processes. This is done with a multiversion method, a non-locking method.
The psql multiversion conccurrency control (mvcc) approach is to show individual snapshots to each transaction, providing them of transaction isolation. This means that transactions are not affected by other concurrent transactions. Also, because it is a non-locking method reads do not block writes and writes do not block reads.
How it works
Each transaction is identified with an id (xid), dispatched by a counter. This id is assigned to the transaction when the first write operation of the transaction is performed.
SELECT txid_current();
Each tuple in the database stores the transaction id in which it was created (xmin) and the transaction id in which it was deleted (xmax).
SELECT *, xmin, xmax FROM <table-name>;
Therefore a transaction will only see those commited rows for which xmin <= xid >= xmax. (Note: if a row is not commited, ie it belongs to a concurrent transaction it wont be seen, more on this on a later post).
Dealing with updates
An update on psql works like a delete plus an insert. A new row is created with all the updated values and an xmin equal to the given transaction id. The outdated tuple xmax is set to the transaction xid.
This is sometimes referred as an append only method.
Physical rows lineage
The lineage between updated rows is maintained by the system column ctid. A tuple ctid points to the physical space where the next updated tuple is found. The ctid value is composed by the block number and the offset. Its values can be seen with pageinspect extension:
CREATE extension pageinspect;
SELECT * FROM heap_page_items(get_raw_page('<table-name>',0));
All tuples will have a t_ctid value. If the tuple t_ctid points to itself, it means that this is the tuple latest version.
Impact on indexes
According to psql documentation, an index might not be updated if:
- If the updated data does not affect any indexed column, except for summarizing indexes
- The new row can be added to the same page where the old row is located
Lets check this.
Analyzing physical records
Suppose that there is a table of items which relates each item to a given user (this would be better done with an association table, but this is skipped for brevity):
CREATE TABLE users (
id serial PRIMARY KEY,
username varchar(255)
);
CREATE TABLE items (
id serial PRIMARY KEY,
name varchar(255),
color varchar(255),
user_id int REFERENCES users(id)
);
CREATE INDEX ON items(user_id);
Lets add some data:
INSERT INTO users (username)
VALUES
('irritable_peregrin'),
('shocked_smaug'),
('devastated_saruman'),
('bored_osgiliath'),
('troubled_bagginses');
INSERT INTO items (name, color, user_id)
VALUES
('premium_chinchilla', 'red', 1),
('vast_bowl', 'blue', 1),
('righteous_banana', 'green', 1),
('thirsty_noddle', 'yellow', 1),
('obeisant_shenanigan', 'orange', 1),
('friendly_bulbous', 'purple', 1),
('agile_truffles', 'pink', 1),
('qualified_schnitzel', 'black', 1),
('grey_gubbins', 'grey', 1),
('steady_doodle', 'white', 1),
('premium_chinchilla', 'red', 2),
('vast_bowl', 'blue', 2),
('righteous_banana', 'green', 2),
('thirsty_noddle', 'yellow', 2),
('obeisant_shenanigan', 'orange', 2),
('friendly_bulbous', 'purple', 2),
('agile_truffles', 'pink', 2),
('qualified_schnitzel', 'black', 2),
('grey_gubbins', 'grey', 2),
('steady_doodle', 'white', 2),
('premium_chinchilla', 'red', 3),
('vast_bowl', 'blue', 3),
('righteous_banana', 'green', 3),
('thirsty_noddle', 'yellow', 3),
('obeisant_shenanigan', 'orange', 3),
('friendly_bulbous', 'purple', 3),
('agile_truffles', 'pink', 3),
('qualified_schnitzel', 'black', 3),
('grey_gubbins', 'grey', 3),
('steady_doodle', 'white', 3),
('premium_chinchilla', 'red', 4),
('vast_bowl', 'blue', 4),
('righteous_banana', 'green', 4),
('thirsty_noddle', 'yellow', 4),
('obeisant_shenanigan', 'orange', 4),
('friendly_bulbous', 'purple', 4),
('agile_truffles', 'pink', 4),
('qualified_schnitzel', 'black', 4),
('grey_gubbins', 'grey', 4),
('steady_doodle', 'white', 4),
('premium_chinchilla', 'red', 5),
('vast_bowl', 'blue', 5),
('righteous_banana', 'green', 5),
('thirsty_noddle', 'yellow', 5),
('obeisant_shenanigan', 'orange', 5),
('friendly_bulbous', 'purple', 5),
('agile_truffles', 'pink', 5),
('qualified_schnitzel', 'black', 5),
('grey_gubbins', 'grey', 5),
('steady_doodle', 'white', 5);
Lets inspect the many physical entries have been created on the items table and in the index in the user_id column:
CREATE EXTENSION pageinspect;
-- Check table number of pages
SELECT count(*)
FROM heap_page_items(get_raw_page('items', 0));
-- Check indexes
SELECT count(*)
FROM bt_page_items('items_user_id_idx', 1);
Each of these queries return 50 results. One entry for each item.
Update not indexed column
Now lets update the color of the first item:
UPDATE items
SET color = 'gold' WHERE id = 1;
And lets inspect how many physical tuples there are:
SELECT t_ctid, lp, lp_len, lp_flags, t_xmin, t_xmax
FROM heap_page_items(get_raw_page('items', 0));
t_ctid | lp | lp_len | lp_flags | t_xmin | t_xmax
--------+----+--------+----------+---------+---------
(0,51) | 1 | 56 | 1 | 1349408 | 1351914
...
(0,51) | 51 | 56 | 1 | 1351914 | 0
(51 rows)
This query now returns 51 results (for brevity I have removed all the unchaged results). Note how the t_ctid of the first result points to the last result. Also, note that the value of the first row t_xmax matches the value of last row t_xmin. This is exactly as expeted, first tuple was soft deleted and a new tuple was inserted.
Lets take a look at the index physical entries:
SELECT count(*)
FROM bt_page_items('items_user_id_idx', 1);
Still returns 50 results. No changes.
Update on indexed column
Lets now move one item from one user to another:
UPDATE items
SET user_id = 5 WHERE id = 1;
Lets see the impact on the table physical tuples:
SELECT t_ctid, lp, lp_len, lp_flags, t_xmin, t_xmax
FROM heap_page_items(get_raw_page('items', 0));
t_ctid | lp | lp_len | lp_flags | t_xmin | t_xmax
--------+----+--------+----------+---------+---------
(0,51) | 1 | 56 | 1 | 1349408 | 1351914
...
(0,52) | 51 | 56 | 1 | 1351914 | 1351916
(0,52) | 52 | 56 | 1 | 1351916 | 0
Again a new tuple has been inserted and the former latest vrsion tuple t_xmax and t_ctid values have been updated.
Lets inspect the index entries:
SELECT itemoffset, ctid, htid, itemlen, nulls, vars, dead
FROM bt_page_items('items_user_id_idx', 1) ORDER BY itemoffset desc;
itemoffset | ctid | htid | itemlen | nulls | vars | dead
------------+--------+--------+---------+-------+------+------
51 | (0,52) | (0,52) | 16 | f | f | f
50 | (0,50) | (0,50) | 16 | f | f | f
...
(51 rows)
There is a new entry for the new tuple. Notice how there is a difference between the itemoffset and the ctid offset. This is because the tuple 51 did not introduce a new element in the index. Remember that this tuple was the result of updating the color, a not indexed column, on one of the items.
Note how the lp_flags of all the tuples has a value of 1, even though some tuples are not visible to any new transactions.
Vacuum
Lets vacuum the items table:
VACUUM items;
And lets see its impact:
SELECT t_ctid, lp, lp_len, lp_flags, t_xmin, t_xmax
FROM heap_page_items(get_raw_page('items', 0));
t_ctid | lp | lp_len | lp_flags | t_xmin | t_xmax
--------+----+--------+----------+---------+--------
| 1 | 0 | 0 | |
...
(0,50) | 50 | 52 | 1 | 1349408 | 0
| 51 | 0 | 0 | |
(0,52) | 52 | 56 | 1 | 1351916 | 0
(52 rows)
There are still pointers for the allocated page entries, but are marked as unused (lp_flags = 0) and ready to be reused.
SELECT itemoffset, ctid, htid, itemlen, nulls, vars, dead
FROM bt_page_items('items_user_id_idx', 1) ORDER BY itemoffset desc;
itemoffset | ctid | htid | itemlen | nulls | vars | dead
------------+--------+--------+---------+-------+------+------
50 | (0,52) | (0,52) | 16 | f | f | f
49 | (0,50) | (0,50) | 16 | f | f | f
...
(50 rows)
The index entries pointing to dead tuples have been removed from the index.