Psql generate series notes
Lets assume we want to evaluate the use of a new features. For this we store events of each feature. Later we want to generate a report of event counts per feature.
Generating the data
Create and connect to database
CREATE DATABASE testing_db;
\c testing_db
Create tables with some existing data
CREATE TABLE feature_a_events
AS SELECT
i as id,
now() - '1 week'::INTERVAL * random() AS timestamp,
floor(random()*(10-1+1))+1 AS time_costs
FROM
generate_series(1, 1000) i;
CREATE TABLE feature_b_events
AS SELECT
i as id,
now() - '1 week'::INTERVAL * random() AS timestamp,
floor(random()*(10-1+1))+1 AS time_costs
FROM
generate_series(1, 1000) i;
The commands above will generate data for all days on the week ending today.
Users will not use all feature every day. Lets remove data for certain days.
DELETE FROM feature_a_events
WHERE EXTRACT(ISODOW FROM timestamp) IN (6, 7);
DELETE FROM feature_b_events
WHERE EXTRACT(ISODOW FROM timestamp) IN (4, 5, 6);
Generating a time series
For the adoption rate report we would like to see how many events of each feature were found each day. If there was no event, we still want to see an entry for that day.
For this purpose we can generate a time series with generate_series.
WITH feature_a_aggregated_data AS (
SELECT
avg(time_costs) AS avg_time_costs,
count(time_costs) AS total_events_count,
(date_trunc('day', timestamp)) AS event_date
FROM feature_a_events
GROUP BY event_date
), feature_b_aggregated_data AS (
SELECT
avg(time_costs) AS avg_time_costs,
count(time_costs) AS total_events_count,
(date_trunc('day', timestamp)) AS event_date
FROM feature_b_events
GROUP BY event_date
)
SELECT
g.d AS day_spine,
extract(isodow from g.d) AS dow,
a.total_events_count AS a_events_count,
b.total_events_count AS b_events_count
FROM
generate_series(
current_date - interval '7 days',
current_date - interval '1 day',
interval '1 day'
) AS g(d)
LEFT JOIN
feature_a_aggregated_data a ON a.event_date = g.d
LEFT JOIN
feature_b_aggregated_data b ON b.event_date = g.d
ORDER BY g.d DESC;
Note that there is a record for each day in the results, event if there are no events:
day_spine | dow | a_events_count | b_events_count
---------------------+-----+----------------+----------------
2025-02-17 00:00:00 | 1 | 143 | 122
2025-02-16 00:00:00 | 7 | | 144
2025-02-15 00:00:00 | 6 | |
2025-02-14 00:00:00 | 5 | 166 |
2025-02-13 00:00:00 | 4 | 148 |
2025-02-12 00:00:00 | 3 | 158 | 148
2025-02-11 00:00:00 | 2 | 52 | 85
(7 rows)
From clause alias syntax
It seems that for previous versions of psql, aliases were needed for subqueries (link).
While looking up the usage of generate_series I frequently found the following syntax:
SELECT gs.d::date
FROM generate_series(
current_date,
current_date + interval '7 days',
'1 day'
) AS gs(d);
Here the alias is returning a table gs, with column d.
Note that more column aliases could be provided:
SELECT *
FROM (
VALUES (1, 2, 'James'), (4, 5, 'Luka')
) AS my_table(id, user_id, username);
Although, not all columns need an alias:
SELECT *
FROM (
VALUES (1, 2, 'James'), (4, 5, 'Luka')
) AS my_table(id, user_id);
For psql version 16 all the queries below return the same values:
SELECT gs.d::date
FROM generate_series(
current_date,
current_date + interval '7 days',
'1 day'
) AS gs(d);
SELECT d::date
FROM generate_series(
current_date,
current_date + interval '7 days',
'1 day'
) AS d;
SELECT *
FROM generate_series(
current_date,
current_date + interval '7 days',
'1 day'
);