PostgreSQL – Row-level subquery caching

postgresql

I have a query that uses a CTE which contains millions of rows. I plan to call this query many times, and the CTE returned each time will have most of the same rows. Is it possible for me to cache the CTE somehow so that only new rows need to be calculated?

The (slightly simplified) query is:

WITH vals AS (
    SELECT '2013-08-01 0:00'::timestamp  AS frame_start,
           '2013-09-01 0:00'::timestamp  AS frame_end,
           '1 day'::interval             AS interval_length
),   intervals AS (
    SELECT tsrange(start_time,
                   lead(start_time, 1, frame_end) OVER (ORDER BY start_time NULLS FIRST)) AS time_range
    FROM (
        SELECT generate_series(frame_start, frame_end, interval_length) AS start_time,
               frame_end
        FROM vals
    ) _
    WHERE start_time < frame_end
), market_trades_ts AS (
    SELECT time_range, td.id
    FROM intervals i
    LEFT JOIN market_trades td
    ON  td.timestamp >= COALESCE(lower(i.time_range), '-infinity')
    AND td.timestamp <  COALESCE(upper(i.time_range), 'infinity')
)
SELECT time_range, count(*) AS agg
FROM market_trades_ts td
GROUP BY time_range
ORDER BY time_range;

It would be great if market_trades_ts could be cached so that, for any intervals it has seen before, it can pull the previous result set and then take its union with the new rows (for intervals it hasn't seen before).

Is this possible? It seems like it will speed up my query dramatically.

Best Answer

PostgreSQL materializes CTE results: it runs the CTE term once and caches the output for the duration of the query. So within a single statement it is already doing what you want. (Since PostgreSQL 12, a CTE that is referenced only once may instead be inlined into the outer query unless you mark it AS MATERIALIZED; CTEs referenced more than once are still materialized.)
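As a small self-contained illustration of that per-statement caching (the names expensive, n and h are made up):

-- Since PostgreSQL 12 the MATERIALIZED keyword forces the CTE to be
-- evaluated once and cached for this statement; in 11 and earlier every
-- CTE behaved that way (and the keyword is not accepted).
WITH expensive AS MATERIALIZED (
    -- stand-in for an expensive subquery
    SELECT g AS n, md5(g::text) AS h
    FROM generate_series(1, 100000) AS g
)
SELECT (SELECT count(*) FROM expensive WHERE h LIKE 'ab%') AS matches,
       (SELECT count(*) FROM expensive)                    AS total;

Both references to expensive read the same cached result; the CTE body is not executed twice.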

CTE results cannot be cached between queries, though. If you want that, use CREATE TEMPORARY TABLE ... AS SELECT ... instead (or a regular table, if the cache should outlive the session).
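A rough sketch of that approach, sticking to the table and column names from the question (the cache table name trade_counts_cache is made up). It assumes already-counted intervals never gain new trades, and it uses count(td.id) so that empty intervals store 0 rather than 1. Each call computes only the intervals missing from the cache and then reads everything from it:

-- Once per session: a cache keyed by interval.
-- (Make it a regular table instead if the cache should outlive the session.)
CREATE TEMPORARY TABLE IF NOT EXISTS trade_counts_cache (
    time_range tsrange PRIMARY KEY,
    agg        bigint  NOT NULL
);

-- Each call: aggregate only the intervals not yet in the cache ...
WITH vals AS (
    SELECT '2013-08-01 0:00'::timestamp AS frame_start,
           '2013-09-01 0:00'::timestamp AS frame_end,
           '1 day'::interval            AS interval_length
), intervals AS (
    SELECT tsrange(start_time,
                   lead(start_time, 1, frame_end) OVER (ORDER BY start_time NULLS FIRST)) AS time_range
    FROM (
        SELECT generate_series(frame_start, frame_end, interval_length) AS start_time,
               frame_end
        FROM vals
    ) _
    WHERE start_time < frame_end
)
INSERT INTO trade_counts_cache (time_range, agg)
SELECT i.time_range,
       count(td.id)
FROM intervals i
LEFT JOIN market_trades td
    ON  td.timestamp >= COALESCE(lower(i.time_range), '-infinity')
    AND td.timestamp <  COALESCE(upper(i.time_range), 'infinity')
WHERE NOT EXISTS (SELECT 1 FROM trade_counts_cache c
                  WHERE c.time_range = i.time_range)
GROUP BY i.time_range;

-- ... then serve the result from the cache.
SELECT time_range, agg
FROM trade_counts_cache
ORDER BY time_range;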