Psql generate series notes

Lets assume we want to evaluate the use of a new features. For this we store events of each feature. Later we want to generate a report of event counts per feature.

Generating the data

Create and connect to database

CREATE DATABASE testing_db;

\c testing_db

Create tables with some existing data

CREATE TABLE feature_a_events 
AS SELECT
    i as id,
    now() - '1 week'::INTERVAL * random() AS timestamp,
    floor(random()*(10-1+1))+1 AS time_costs 
FROM
    generate_series(1, 1000) i;

CREATE TABLE feature_b_events 
AS SELECT
    i as id,
    now() - '1 week'::INTERVAL * random() AS timestamp,
    floor(random()*(10-1+1))+1 AS time_costs 
FROM
    generate_series(1, 1000) i;

The commands above will generate data for all days on the week ending today.

Users will not use all feature every day. Lets remove data for certain days.

DELETE FROM feature_a_events 
WHERE EXTRACT(ISODOW FROM timestamp) IN (6, 7);

DELETE FROM feature_b_events 
WHERE EXTRACT(ISODOW FROM timestamp) IN (4, 5, 6);

Generating a time series

For the adoption rate report we would like to see how many events of each feature were found each day. If there was no event, we still want to see an entry for that day.

For this purpose we can generate a time series with generate_series.

WITH feature_a_aggregated_data AS (
    SELECT 
        avg(time_costs) AS avg_time_costs,
        count(time_costs) AS total_events_count,
        (date_trunc('day', timestamp)) AS event_date
    FROM feature_a_events
    GROUP BY event_date
), feature_b_aggregated_data AS (
    SELECT 
        avg(time_costs) AS avg_time_costs, 
        count(time_costs) AS total_events_count, 
        (date_trunc('day', timestamp)) AS event_date
    FROM feature_b_events
    GROUP BY event_date
)

SELECT
    g.d AS day_spine,
    extract(isodow from g.d) AS dow,
    a.total_events_count AS a_events_count,
    b.total_events_count AS b_events_count 
FROM
    generate_series(
        current_date - interval '7 days',
        current_date - interval '1 day',
        interval '1 day'
    ) AS g(d)
LEFT JOIN
    feature_a_aggregated_data a ON a.event_date = g.d 
LEFT JOIN
    feature_b_aggregated_data b ON b.event_date = g.d 
ORDER BY g.d DESC;

Note that there is a record for each day in the results, event if there are no events:

      day_spine      | dow | a_events_count | b_events_count 
---------------------+-----+----------------+----------------
 2025-02-17 00:00:00 |   1 |            143 |            122
 2025-02-16 00:00:00 |   7 |                |            144
 2025-02-15 00:00:00 |   6 |                |               
 2025-02-14 00:00:00 |   5 |            166 |               
 2025-02-13 00:00:00 |   4 |            148 |               
 2025-02-12 00:00:00 |   3 |            158 |            148
 2025-02-11 00:00:00 |   2 |             52 |             85
(7 rows)

From clause alias syntax

It seems that for previous versions of psql, aliases were needed for subqueries (link).

While looking up the usage of generate_series I frequently found the following syntax:

SELECT gs.d::date
FROM generate_series(
  current_date,
  current_date + interval '7 days',
  '1 day'
) AS gs(d);

Here the alias is returning a table gs, with column d.

Note that more column aliases could be provided:

SELECT *
FROM (
    VALUES (1, 2, 'James'), (4, 5, 'Luka')
) AS my_table(id, user_id, username);

Although, not all columns need an alias:

SELECT *
FROM (
    VALUES (1, 2, 'James'), (4, 5, 'Luka')
) AS my_table(id, user_id);

For psql version 16 all the queries below return the same values:

SELECT gs.d::date
FROM generate_series(
  current_date,
  current_date + interval '7 days',
  '1 day'
) AS gs(d);

SELECT d::date
FROM generate_series(
  current_date,
  current_date + interval '7 days',
  '1 day'
) AS d;

SELECT *
FROM generate_series(
  current_date,
  current_date + interval '7 days',
  '1 day'
);