Query:
EXPLAIN (ANALYZE, BUFFERS) SELECT
COUNT (*) AS "count"
FROM
"Posts" AS "Post"
WHERE
"Post"."createdAt" > '2015-08-19 14:55:50.398'
AND "Post"."new" = TRUE;
Index:
CREATE INDEX posts_new_createdat_idx ON "Posts" ("createdAt")
WHERE
NEW = TRUE
Plan:
Aggregate (cost=389915.59..389915.60 rows=1 width=0) (actual time=4234.772..4234.773 rows=1 loops=1)
Buffers: shared hit=254427
-> Bitmap Heap Scan on "Posts" "Post" (cost=14415.81..387990.63 rows=769985 width=0) (actual time=123.805..3859.150 rows=1138854 loops=1)
Recheck Cond: (("createdAt" > '2015-08-19 14:55:50.398+00'::timestamp with time zone) AND new)
Rows Removed by Index Recheck: 8238790
Buffers: shared hit=254427
-> Bitmap Index Scan on posts_new_createdat_idx (cost=0.00..14223.32 rows=769985 width=0) (actual time=122.601..122.601 rows=1138854 loops=1)
Index Cond: ("createdAt" > '2015-08-19 14:55:50.398+00'::timestamp with time zone)
Buffers: shared hit=3114
Total runtime: 4234.989 ms
Schema:
CREATE TABLE "public"."Posts" (
"id" int4 NOT NULL DEFAULT nextval('"Posts_id_seq"'::regclass),
"actionId" int4,
"commentCount" int4 DEFAULT 0,
"facebook" bool,
"featurePostOnDate" timestamp(6) WITH TIME ZONE,
"forcedPrivate" bool,
"instagram" bool,
"isReported" bool,
"likeCount" int4 DEFAULT 0,
"note" text COLLATE "default",
"photo" varchar(255) COLLATE "default",
"private" bool,
"new" bool,
"popular" bool,
"twitter" bool,
"userId" int4,
"objectId" varchar(255) COLLATE "default",
"createdAt" timestamp(6) WITH TIME ZONE,
"updatedAt" timestamp(6) WITH TIME ZONE,
"activityLogId" int4,
"weightLogId" int4,
"workoutId" int4,
"workoutLogId" int4,
"thumbnail" varchar(255) COLLATE "default"
)
Data:
new = true
for 99% or records- Any posts older than 2 weeks can be ignored (from the count and the index)
Database Details:
Host = Amazon AWS
Engine = PostgreSQL 9.3.10
Instance Class = db.r3.8xlarge
Storage Type = SSD
IOPS = 3000
StorageAmount = 500 GB
As you can see, the partial index is very large. Is there a way to better index so that the recheck condition is not so heavy?
Best Answer
Add the second predicate of your query to the partial index as well:
Your timestamp is probably a moving target, but I am going to assume you have lots of old rows that are excluded in most of your queries and only few "younger" rows are of interest. A typical use case. You can cut off old rows in your partial index.
This only makes sense if you can reduce the size of the index to a fraction. Else, the effect is typically not worth the trouble.
Replace with an opportune timestamp to cut off as many rows as possible while still covering relevant rows for your queries.
Modern versions of Postgres are smart enough to understand the index is applicable to queries cutting off at a later timestamp. For older versions you have to add the verbatim index condition redundantly to make the query planner use this partial index.
This index will be used by your query immediately as is. (I just removed some more noise):
Related:
Add datetime constraint to a PostgreSQL multi-column partial index
"Recheck Cond:" line in query plans with a bitmap index scan
Aside 1:
"new" = TRUE
is just a noisy way of stating"new"
for a boolean value.Aside 2: Don't use reserved words or CaMeL case or other illegal identifiers so you don't have to double-quote all the time. Very error prone.