PSQL vacuuming review
Psql ensures transaction isolation with the multi-version concurrency control (MVCC) technique. To make MVCC work, delete operations can not remove the data from the database, instead it marks the affected record as dead. A similar thing happens for updates. When a record is updated, what really happens is that the older record is marked as dead and a new entry is created. This will happen for each update, so if a single record is updated twice, there will be two dead records and one alive record.
Small remark, in this context it is common to use the term tuple. Per psql docs, a tuple is defined as an individual state of a row.
The process which actually deletes the dead tuples is called vacuuming. VACUUM inspects the whole relation removing dead tuples and their index entries. Removing these tuples does not return a lot of memory to the system. The reason is that psql will only return memory to the operating system if the number of empty pages at the end of the table goes above certain threshold. Since vacuuming does not reorder tuples inside the page, it is very unlikely that a lot of memory is released. Note that vacuuming does not block read or write operations on the relation.
The process of vacuuming can be enabled to run on a periodic manner. Psql will calculate periodically which tables need to be vacuumed and launch the process. The decision is based on the number of inserted, updated and deleted tuples in each relation. Therefore, track_counts must be enabled.
To release all memory not in use there is the command FULL VACUUM. This command recreates the given table and its indexes without any empty space in their pages. So this command liberates more memory but it is also slower to run. While the tables and indexes are been recreated, the data is duplicated in memory, so it might require quite some free memory. Also this process blocks the given relations from been read or written.
See table stats
SELECT
relname, -- Table name
n_dead_tup, -- Number of dead tuples
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables;
See table size
SELECT
relname AS "table_name",
pg_size_pretty(pg_table_size(C.oid)) AS "table_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND nspname !~ '^pg_toast' AND relkind IN ('r')
ORDER BY pg_table_size(C.oid);
Data density
Use pgstattuple to see data density stats:
CREATE EXTENSION pgstattuple;
-- See statistics on space distribution in files for a given table
SELECT * from pgstattuple('table_name');
-- See space distribution statistics for indexes
SELECT * FROM pgstatindex('index_name');
The pgstattuple and pgstatindex methods can be slow. A more efficient method is pgstattuple_approx, however, the results are just an approximation.