PostgreSQL Views – Writing a View that Caches to Another Table

aggregateinsertplpgsqlpostgresql

I have a table market_trades with columns timestamp, market_id, and amount:

api_production=# \d market_trades;
                                       Table "public.market_trades"
     Column     |            Type             |                         Modifiers                          
----------------+-----------------------------+------------------------------------------------------------
 id             | integer                     | not null default nextval('market_trades_id_seq'::regclass)
 market_id      | integer                     | not null
 timestamp      | timestamp without time zone | not null
 amount         | numeric(16,8)               | not null

And I just created a table market_trades_sum_cache:

api_production=# \d market_trades_sum_cache;
                                       Table "public.market_trades_sum_cache"
     Column     |            Type             |                         Modifiers                          
----------------+-----------------------------+------------------------------------------------------------
 id             | integer                     | not null default nextval('market_trades_sum_cache_id_seq'::regclass)
 market_id      | integer                     | not null
 start_time     | timestamp without time zone | not null
 end_time       | timestamp without time zone | not null
 sum            | numeric(16,8)               | not null

I want to write a view (or some other type of reusable function) called trade_volume that takes a market_id, start_time, and end_time as parameters and does the following:

  1. If there is a row in market_trades_sum_cache with the given market_id, start_time, and end_time, return the corresponding sum.
  2. Otherwise, add up the amounts of all rows in market_trades with the given market_id and whose timestamp is between start_time and end_time, store that sum as a new row in market_trades_sum_cache, and return the sum.

Is this possible in PostgreSQL 9.3?

BONUS: Use PostgreSQL's timestamp range types (tsrange in PostgreSQL 9.2+) instead of start_time and end_time in market_trades_sum_cache.

Best Answer

CREATE OR REPLACE FUNCTION trade_volume (_market_id  int
                                       , _start_time timestamp
                                       , _end_time   timestamp)
  RETURNS SETOF numeric(16,8) AS
$func$
BEGIN

RETURN QUERY
SELECT sum                          -- dubious identifier
FROM   market_trades_sum_cache
WHERE  market_id  = _market_id
AND    start_time = _start_time
AND    end_time   = _end_time;

IF NOT FOUND THEN
   RETURN QUERY
   INSERT INTO market_trades_sum_cache(market_id, start_time, end_time, sum)
   SELECT market_id
        , _start_time
        , _end_time
        , sum(amount)
   FROM   market_trades
   WHERE  market_id = _market_id
   AND    timestamp BETWEEN _start_time AND _end_time
   GROUP  BY 1
   RETURNING sum;
END IF;

END
$func$ LANGUAGE plpgsql VOLATILE;

A couple of warnings

  • It's a bad idea to use names of basic functions (sum) or data types (timestamp) as column names.

  • BETWEEN includes lower and upper bound. Often, you would rather include the lower, but exclude the upper bound.

  • There is a tiny chance for a race condition in the above function. To be prepared for heavy concurrent load, consider this related answer:

Alternative with tsrange

...
AND    ts_range = tsrange(_start_time, _end_time, '[]')
...
   INSERT INTO market_trades_sum_cache(market_id, ts_range, sum)
   SELECT market_id
        , tsrange(_start_time, _end_time, '[]')  -- incl. bounds
        , sum(amount)
   FROM   market_trades
   WHERE  market_id = _market_id
   AND    timestamp BETWEEN _start_time AND _end_time
   GROUP  BY 1
   RETURNING sum;
...

I would consider a multicolumn GIN or GiST index for the range type. Though, if you only check for equality, a btree index would do the job, too. But the latter isn't much use for anything else. Details in the manual.

To include market_id in the index (like I'd suggest), you also need the additional module btree_gin or btree_gist. Then create the index:

CREATE INDEX sum_mult_idx ON market_trades_sum_cache USING GIN (market_id, ts_range, sum);

The last column sum is optional and only makes sense if you get index-only scans out of it. Else don't include it in the index.

Or, if you want to disallow overlapping time ranges, use an exclusion constraint, based on a GiST index (GIN is not currently possibly). Details:

The last one also provides a clean solution for enforcing '[)' boundaries.