PostgreSQL – Join vs Virtual Table in Query Performance

foreign keyperformancepostgresqlquery-performancesqlite

Suppose I have two tables linked by a foreign-key

example DDL's

CREATE TABLE raw_collection (
    info_datetime DATETIME NOT NULL, 
    asset_id INTEGER NOT NULL, 
    price FLOAT, 
    PRIMARY KEY (info_datetime, asset_id), 
    FOREIGN KEY(asset_id) REFERENCES assets (id)
);

CREATE TABLE assets (
    id INTEGER NOT NULL, 
    asset_symbol VARCHAR, 
    PRIMARY KEY (id)
);

when filtering by asset_symbol on the raw_collection tables, what (and why) would be a faster solution. using a join or creating an in memory table from the asset_symbol table and filtering raw_collection according to it?


the queries

-- query 1
with needed_assets as (
  select id 
  from assets
  where asset_symbol in ('a', 'b'))
select *
from raw_collection
where asset_id in (select id from needed_assets) ;
-- where asset_id in (table needed_assets) ;  -- Postgres only syntax
-- where asset_id in needed_assets ;          -- SQLite only syntax

-- query 2
select r.*
from raw_collection r join assets a on r.asset_id = a.id 
where a.asset_symbol in  ('a', 'b') ;

I am either using PostgreSQL 11 or SQLite 3.28.0

Best Answer

It depends on a number of things. You list postgres in one of your tags, so it should be noted that the CTE will be materialised in full as CTEs are an optimisation fence that block predicate push-down. I'm not sure about sqlite, the other DB you list in the tags, but this is not the case for all databases, SQL Server for one can optimise across CTEs.

So for postgres currently, all other things being equal and appropriate indexes being present, I would expect query 2 to be faster if there is enough data in assets matching asset_symbol in ('a', 'b')) for the difference to be noticeable.

In either case what indexes you have defined will also make a difference. Two things to note in particular are that marking raw_collection.asset_id as a foreign key does not automatically create an index for that column which might limit the query planner's choices, and if you don't have one already then an index covering assets.asset_symbol is likely to be useful unless that table has very few rows.

Note: this is due to change in version 12, which is slated for release soon, though as you state you are using v11 this is not relevant to you yet