PostgreSQL – Efficient Search for Records with Associations

foreign keyperformancepostgresqlquery-performance

I'm using Postgres 9.5. I want to select some records, but only if they have at least one association in another table. I came up with this query:

select distinct m.name, m.id 
from machines m, parts rt 
where m.id = rt.machine_id 
  and m.name like '%Name%';

but I don't think this is particularly efficient, since if there are 50000 associations, I think they're all loaded and then "distinct"ed through, which seems a little inefficient. Is there a more efficient way to check load a record but only if it has one or more associations?

Best Answer

The way your query is structured, you can just write it using an EXISTS expression:

SELECT
    m.name, m.id 
FROM
    machines m
WHERE
    m.name LIKE '%Name%'
    AND EXISTS (SELECT * 
                  FROM parts rt
                 WHERE rt.machine_id = m.id
               ) ;

If you want the query to be efficient, take into account that the most difficult part will probably be checking the machine name. Consider indexing machines.name using a trigram index. Trigram indices allow for fast finding LIKE '%something%' conditions.

CREATE EXTENSION pg_trgm;  -- Only needed if extension not already installed
CREATE INDEX name_like_idx 
    ON machines 
    USING GIST (name gist_trgm_ops);

Most probably, the database planner will make an execution plan where it will first find the machines whose name is LIKE '%Name%' (using the name_like_idx index), then, for every machine in this subset, it will check if there are parts associated with them; if so, they will be part of the result.

For efficiency, it is also assumed that there is also an index on parts.machine_id, because this will then be how PostgreSQL will efficiently check for the existence of the part. If machine_id is a foreign key, it is always advised to index it.

This query doesn't need DISTINCT (assuming that the ids are already unique).