Using PostgreSQL triggers to maintain a cache life cycle

This is the first use case I have had for PostgreSQL triggers, and it went very well.

Use case

We have a cron job which generates a bunch of reports based on data in the database. The job runs at night, so when the users want to see the reports the next day, those are already created and the users don't have to wait for them to be generated on demand.

The problem: the users are allowed to update their data in the database, rendering the report obsolete. In this case we need to delete the report and create it again, either when the user requests it or with the next cron job.

This is a cache life cycle problem: we need to make sure that any time the report data is updated, the report cache is invalidated. Many different fields go into the report, and they can be updated by many different API endpoints, by scripts, or even by plain SQL commands. Making sure that the cache is cleared correctly in all of those cases is very difficult. It also means that every person developing new code or running SQL commands needs to be aware of the cache clearance step.

This is the perfect job for a PostgreSQL trigger: a function which fires whenever data in a table changes.

How it works

When the report is created, it is recorded in the database with the URL for downloading the file, and it is linked to all the rows containing the data used for the report.

When the user requests the report, the database is queried to find the report URL, which is sent back to the user.

When any field which is used by the report is updated, the trigger will remove the relationship between the business report and its file.

If the user requests the report and there is no relation to a file, the report is created again, recorded and handed over.
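
The lookup step could be sketched as follows; it is a simplified query, and the table and column names are the ones introduced in the example below:

```sql
-- Find the cached file for a given report, if any.
-- Returns no row when the cache has been invalidated.
SELECT be.bucket_url
FROM business_reports br
JOIN bucket_entries be ON be.id = br.bucket_entry_id
WHERE br.metadata_id = 1;
```

If the query returns no row, the application falls back to generating the report on demand.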

Example

Note that this is a very simplified version of the actual system.

Creating some tables

A table with our users' data:

CREATE TABLE IF NOT EXISTS users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(64) NOT NULL
);

A table with some data used on the report:

CREATE TABLE IF NOT EXISTS report_metadata(
    id SERIAL PRIMARY KEY,
    title VARCHAR(64),
    created_by INTEGER NOT NULL,
    CONSTRAINT fk_business_reports_users
        FOREIGN KEY (created_by)
        REFERENCES users (id)    
);

A table which tracks the existing files in our S3 bucket:

CREATE TABLE IF NOT EXISTS bucket_entries (
    id SERIAL PRIMARY KEY,
    bucket_url VARCHAR(64),
    created_at TIMESTAMP NOT NULL
);

A table which relates report data and report files. This is the table which is queried when the user requests the business report:

CREATE TABLE IF NOT EXISTS business_reports(
    id SERIAL PRIMARY KEY,
    bucket_entry_id INTEGER,
    metadata_id INTEGER NOT NULL,
    CONSTRAINT fk_business_reports_bucket_entries
        FOREIGN KEY (bucket_entry_id)
        REFERENCES bucket_entries (id),
    CONSTRAINT fk_business_reports_metadata
        FOREIGN KEY (metadata_id)
        REFERENCES report_metadata (id)        
);

Creating the trigger

We need a function which removes the file id from the business_reports table when the underlying data is updated:

CREATE OR REPLACE FUNCTION clear_cache()
RETURNS TRIGGER AS
$BODY$
BEGIN
    IF (TG_TABLE_NAME = 'report_metadata') THEN
        IF (TG_OP = 'UPDATE') THEN
            -- Unlink the file from the report whose metadata changed.
            UPDATE business_reports
            SET bucket_entry_id = NULL
            WHERE business_reports.metadata_id = NEW.id;
        END IF;
    ELSIF (TG_TABLE_NAME = 'users') THEN
        IF (TG_OP = 'UPDATE') THEN
            -- Unlink the files from all reports created by the updated user.
            UPDATE business_reports
            SET bucket_entry_id = NULL
            FROM report_metadata
            WHERE business_reports.metadata_id = report_metadata.id
              AND report_metadata.created_by = NEW.id;
        END IF;
    END IF;
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;

Attaching the trigger to the tables

Drop the triggers in case they already exist, and create them again:

DROP TRIGGER IF EXISTS user_cache_updater ON users;
DROP TRIGGER IF EXISTS metadata_cache_updater ON report_metadata;

CREATE TRIGGER user_cache_updater
AFTER INSERT OR UPDATE ON users
FOR EACH ROW
EXECUTE PROCEDURE clear_cache();

CREATE TRIGGER metadata_cache_updater
AFTER INSERT OR UPDATE ON report_metadata
FOR EACH ROW
EXECUTE PROCEDURE clear_cache();

Inserting some values

Let's insert some data to test the triggers:

INSERT INTO users(username) 
VALUES ('John Cena');

INSERT INTO bucket_entries(bucket_url, created_at) 
VALUES ('www.does/not/exist', '2024-05-22');

INSERT INTO report_metadata(title, created_by)
VALUES ('My rreport', 1);

INSERT INTO business_reports(bucket_entry_id, metadata_id)
VALUES (1, 1);

Looking into the business_reports table, we can see that the report is already linked to a file in the S3 bucket.

select * from business_reports;

 id | bucket_entry_id | metadata_id 
----+-----------------+-------------
  1 |               1 |           1

Let's check out the metadata:

select * from report_metadata;

 id |   title    | created_by 
----+------------+------------
  1 | My rreport |          1

Oh no, the title has a typo. The user wants to fix it as soon as they read the report. Let's update the title:

UPDATE report_metadata
SET title = 'My report'
WHERE id = 1;
UPDATE 1

And let's see whether the trigger has removed the relationship with the file:

SELECT * FROM business_reports;
 id | bucket_entry_id | metadata_id 
----+-----------------+-------------
  1 |                 |           1
(1 row)

Great, the file id has been removed. The next time the user requests the report, it will be generated again with the new metadata and the file id will be updated.
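
The regeneration step lives in the application, not in the database, so it is not shown here. Its database side could be sketched like this, assuming a hypothetical new file URL:

```sql
-- Hypothetical re-link step: record the regenerated file and
-- attach it to the report again in a single statement.
WITH new_file AS (
    INSERT INTO bucket_entries(bucket_url, created_at)
    VALUES ('www.does/not/exist/v2', NOW())
    RETURNING id
)
UPDATE business_reports
SET bucket_entry_id = (SELECT id FROM new_file)
WHERE metadata_id = 1;
```

Using a single statement keeps the insert and the re-link atomic, so a concurrent reader never sees the new bucket entry without the report pointing at it.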

Bonus

What happens to the files which are unlinked from the business_reports table? Well, those are removed at night by another cron job.
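
That cleanup job is outside the scope of this example, but the database side of it could be sketched as follows; deleting the actual files from the S3 bucket is assumed to happen separately:

```sql
-- Hypothetical cleanup: delete bucket entries no report points to.
-- The corresponding S3 objects are assumed to be deleted elsewhere.
DELETE FROM bucket_entries be
WHERE NOT EXISTS (
    SELECT 1
    FROM business_reports br
    WHERE br.bucket_entry_id = be.id
);
```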