PostgreSQL – Efficient Search for Records with Associations

foreign keyperformancepostgresqlquery-performance

I'm using Postgres 9.5. I want to select some records, but only if they have at least one association in another table. I came up with this query:

select distinct m.name, m.id 
from machines m, parts rt 
where m.id = rt.machine_id 
  and m.name like '%Name%';

but I don't think this is particularly efficient, since if there are 50000 associations, I think they're all loaded and then "distinct"ed through, which seems a little inefficient. Is there a more efficient way to check load a record but only if it has one or more associations?

Best Answer

The way your query is structured, you can just write it using an EXISTS expression:

SELECT
    m.name, m.id 
FROM
    machines m
WHERE
    m.name LIKE '%Name%'
    AND EXISTS (SELECT * 
                  FROM parts rt
                 WHERE rt.machine_id = m.id
               ) ;

If you want the query to be efficient, take into account that the most difficult part will probably be checking the machine name. Consider indexing machines.name using a trigram index. Trigram indices allow for fast finding LIKE '%something%' conditions.

CREATE EXTENSION pg_trgm;  -- Only needed if extension not already installed
CREATE INDEX name_like_idx 
    ON machines 
    USING GIST (name gist_trgm_ops);

Most probably, the database planner will make an execution plan where it will first find the machines whose name is LIKE '%Name%' (using the name_like_idx index), then, for every machine in this subset, it will check if there are parts associated with them; if so, they will be part of the result.

For efficiency, it is also assumed that there is also an index on parts.machine_id, because this will then be how PostgreSQL will efficiently check for the existence of the part. If machine_id is a foreign key, it is always advised to index it.

This query doesn't need DISTINCT (assuming that the ids are already unique).

Related Solutions

Sql-server – Which is the most efficient way to run a particular select query

I'd probably use

SELECT TOP (1) WITH TIES PolicyNumber, 
                         decpageid, 
                         Risk 
FROM   StatRiskDecpages 
WHERE  PolicyNumber = 'AR-0000301132-04' 
ORDER  BY decpageid DESC

Assuming the covering index on (PolicyNumber, decpageid) INCLUDE(Risk) this will give you a plan like

enter image description here

SQL Server Query Performance – Efficient Ways to Run First/Max Transaction Query

A slight variation on your's and Thomas's that will do it with a single index scan.

WITH FlagFirstMax AS 
    (
    SELECT
        Account_ID,
        Tran_Date,
        Tran_Amount,
        ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Tran_Date ASC) isFirst,
        ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Tran_Amount DESC, Tran_Date ASC) isMax
    FROM dbo.trans
    WHERE Tran_Amount > 0
    )
SELECT Account_ID,
    MAX(CASE WHEN isFirst = 1 THEN Tran_Date END) First_Tran_Date,
    DATEDIFF(
        day,
        MAX(CASE WHEN isFirst = 1 THEN Tran_Date END),
        MAX(CASE WHEN isMax = 1 THEN Tran_Date END)
        ) Day_Duration,
    MAX(CASE WHEN isMax = 1 THEN Tran_Amount END) Max_Tran_Amount
FROM FlagFirstMax
GROUP BY Account_ID;

Best Answer

Related Solutions

Sql-server – Which is the most efficient way to run a particular select query

SQL Server Query Performance – Efficient Ways to Run First/Max Transaction Query

Related Question