Because the WHERE
condition of the query involves only equality checks:
WHERE "songs"."processed" = 't'
AND "songs"."working" = 't'
and then you have:
SELECT DISTINCT ON (songs.rank, songs.shared_id) ...
which is similar to GROUP BY songs.rank, songs.shared_id
I would first try adding a compound index on (first the columns in WHERE
, then the columns in DISTINCT ON
):
(processed, working, rank, shared_id)
The ordering: ORDER BY rank DESC
may be better optimized if you have the index as:
(processed, working, rank DESC, shared_id)
Not really sure if this would contribute to efficiency but you can test.
Addition by @Erwin
As per request in comment
In principal (default) b-tree indexes can be scanned forward and backward at the same speed. But sorting can make a difference in multi-column indexes where you combine the sort order of multiple columns. The query starts with:
SELECT DISTINCT ON (songs.rank, songs.shared_id)
In combination with ORDER BY rank DESC
this dictates that the result be ordered by rank DESC, shared_id
effectively. After the (simplified) WHERE clause WHERE processed AND working
has been applied and before LIMIT
can be applied.
I have my doubts if the DISTINCT
clause is actually useful. But while it is there, the optimal index for the query should be (just as @ypercube suspected):
CREATE INDEX songs_special_idx
ON songs (processed, working, rank DESC, shared_id);
Looks like one of the rare cases where explicit ordering of index columns would benefit the query. There is an excellent explanation in the chapter Indexes and ORDER BY of the manual.
If the WHERE condition is stable (always WHERE processed AND working
), a partial multi-column index would be smaller and faster, yet:
CREATE INDEX songs_special_idx
ON songs (rank DESC, shared_id)
WHERE processed AND working;
In a GiST index, the order of columns has a different significance than in a B-tree index. Per documentation:
A multicolumn GiST index can be used with query conditions that
involve any subset of the index's columns. Conditions on additional
columns restrict the entries returned by the index, but the condition
on the first column is the most important one for determining how much
of the index needs to be scanned. A GiST index will be relatively
ineffective if its first column has only a few distinct values, even
if there are many distinct values in additional columns.
In short: put the most selective columns first.
Your EXPLAIN
output shows that the condition on pid
is more selective (rows=7836
) than the one on outline
(rows=63112
). If that can be generalized (a single example may be misleading) I suggest this alternative:
CREATE INDEX inventory_compound_idx ON portal.inventory USING gist (pid, outline);
If most of your (important) queries include conditions on both columns, a multicolumn index may serve you well. Else, single columns may be better overall.
Table layout
This is an educated guess since I don't have complete information.
Don't use oid
as column name. It's easy to confuse with the OID
.
Don't use the name date
for a timestamp column. Or rather: don't use the name date
for any column, don't use names of base-types for identifiers at all. Can lead to confusing mistakes and error messages.
Create a lookup table for types and only put a small integer type_id
into the big table. Pack fixed-length types tightly so not to waste space to padding. Details.
I prefer the type text
(or varchar
without length limit) over varchar(n)
. Details.
For example:
CREATE TABLE portal.inventory (
inventory_id bigint PRIMARY KEY
,type_id integer NOT NULL REFERENCES inv_type(type_id)
,pid integer NOT NULL
,size bigint NOT NULL
,ts timestamp NOT NULL
,outline geography(Polygon,4326)
,product_name text
,path text
);
Best Answer
Given your information about the distribution of the data, the following execution plan will be the best:
Use an index scan to get the 1 to 10 rows from
other table
.Perform a nested loop join with
main_table
as inner table, which will be fast if the join condition is indexed.So the ideal index on
other_table
would beIf
deleted_at
is usuallyNULL
, you can omit theWHERE
clause without losing much.If you want to get an index only scan on
other_table
(which probably isn't necessary, since you only fetch very few rows), you could insteadThe ideal index on
main_table
would beThe order of the columns doesn't matter in this case, because you will scan for both columns with the
=
operator.In the real world, you will try to pick indexes that can be useful for as many queries as possible, because having too many indexes hurts performance and wastes space.