Also consider the first answer.
Query
This does what your current query currently does, just simpler and faster:
SELECT p.id, p.note, p.photo, p."createdAt",
u.id AS "user.id",
u.name AS "user.name",
h.id AS "hashtags.id",
h.count AS "hashtags.count",
h.name AS "hashtags.name",
h."createdAt" AS "hashtags.createdAt",
h."updatedAt" AS "hashtags.updatedAt",
h."objectId" AS "hashtags.objectId",
h._etl AS "hashtags._etl",
ph.id AS "hashtags.PostHashtag.id",
ph."createdAt" AS "hashtags.PostHashtag.createdAt",
ph."updatedAt" AS "hashtags.PostHashtag.updatedAt",
ph."postId" AS "hashtags.PostHashtag.postId",
ph."hashtagId" AS "hashtags.PostHashtag.hashtagId",
ph."objectId" AS "hashtags.PostHashtag.objectId",
ph._etl AS "hashtags.PostHashtag._etl"
FROM (
SELECT id, note, photo, "createdAt", "userId"
FROM "Posts" p
WHERE photo IS NOT NULL
AND EXISTS (
SELECT 1
FROM "PostHashtags" ph
WHERE ph."postId" = p.id
)
ORDER BY p."createdAt" DESC
LIMIT 10
) p
JOIN "PostHashtags" ph ON ph."postId" = p.id
JOIN "Hashtags" h ON h.id = ph."hashtagId"
JOIN "Users" u ON u.id = p."userId"
WHERE h.name = 'dead'
ORDER BY p."createdAt" DESC;
The EXISTS
semi-join should be faster than your subquery construct. I am assuming that the column "PostHashtags".id
is the PK and cannot be NULL by itself. Also, if referential integrity is enforced by a FK constraint, no need to join to "Hashtags"
in this test.
Indexes
Partial index on Posts
CREATE INDEX posts_foo_idx ON "Posts" ("createdAt", id)
WHERE photo IS NOT NULL;
Note the columns: ("createdAt", id)
. Postgres will the latest posts, I expect an index scan on posts_foo_idx
from the top, followed by a test for matching entries in PostHashtags
using id
with the next index.
UNIQUE index on PostHashtags
This time we need the index with "postId"
first.
The rest is mostly like in the first answer.
If you solve it by adding a secondary index then imho MyISAM or InnoDB won't matter much, as both use B-Tree indexes.
Append both additional columns to the index: (col3, col1, col2)
or (col3, col2, col1)
should behave the same for your query.
The reason to add all the columns is that such index covers your query - all data needed to resolve the query are contained inside the index, so it is not needed to read the actual rows from the table. That means the data you want to fetch are grouped together on disk (the index is ordered/clustered by col3
) and no random reads to the "main" table are needed.
If you had index on only (col3)
then each row may be stored in a different part of the table (depending on your primary key or order of inserts) and it would randomly jump all over the place, incurring IO penalty.
If you have no primary key, then MyISAM would be probably better, as InnoDB would create internal primary key and the table would take more space than needed.
But there is a possibility to solve it without using any secondary key. Use InnoDB table like
create table t (
col1 varchar(5),
col2 varchar(5),
col3 varchar(32) not null,
col4 smallint unsigned not null auto_increment,
primary key(col3, col4),
key(col4)
) engine = innodb;
That way the table will use InnoDB feature of clustering by primary key - effectively forcing the optimal order and grouping of rows described previously. And it will take just a small bit more space for the additional column (2 bytes only, if you are positive there will always be less than 60k rows) and much smaller secondary key to accomodate the auto_increment
behavior instead of effectively having 2 copies of the table as with the other version.
Example: http://sqlfiddle.com/#!9/c36cfc/2
Best Answer
PostgreSQL certainly can use an index for
IS NOT NULL
. I don't see any query planner assumptions about that condition, either.If the null fraction for the column (
pg_statistic.stanullfrac
) is low enough to suggest that the index is usefully selective for the query, PostgreSQL will use an index.I can't figure out what you're trying to say with:
Certainly an index won't get used for an
IS NOT NULL
condition on aNOT NULL
column. It'd always match 100% of rows, so a seqscan will almost always be much faster.PostgreSQL won't use an index if the index doesn't filter out a large proportion of rows for a query. The only likely exception is when you're asking for a set of columns covered by a single index, in an order matching that of the index. PostgreSQL might do an index-only scan then. E.g. if there's an index on
t(a, b, c)
and you:PostgreSQL might use your index, even though no rows are filtered out, because it only has to read the index and can skip reading the heap, avoid doing a sort, etc.