Postgresql – Optimize Postgres query where a time range as well as IN values

aws-auroraoptimizationperformancepostgresqlpostgresql-9.6postgresql-performance

I have a table that gets around 20 million new records a day for positions of vessels. I want to query for a couple of different vessels and get the records for any period. As I'm using this for online queries, I would like the average response time to be less than 1 second, but I can live with a couple of seconds. ?

When querying for a single vessel, I get a decent query response time, but as soon as I start adding in multiple vessels, the query becomes slow. I have a general timeout whenever a query takes more than 30 seconds, and it's around 4 minutes to respond.

I read Combining Multiple Indexes in the Postgres documentation and from what I understand it's not optimal to be doing a lookup over multiple values because it practically has to recheck each value against the index?

The system can form AND and OR conditions across several index scans. For example, a query like WHERE x = 42 OR x = 47 OR x = 53 OR x = 99 could be broken down into four separate scans of an index on x, each scan using one of the query clauses. The results of these scans are then ORed together to produce the result.

Which makes me wonder, am I ever likely to succeed in making this as fast as I would like? Should I be looking at a different DB or table layout to be able to field these kinds of requests?

The table layout with its indexes, this table is also partitioned which is why the query plan looks a bit odd:

create table mmsi_positions_archive
(
    id bigserial not null
        constraint mmsi_positions_archive_pkey
            primary key,
    position_id uuid,
    previous_id uuid,
    mmsi bigint not null,
    collection_type varchar not null,
    accuracy numeric,
    maneuver numeric,
    rate_of_turn numeric,
    status integer,
    speed numeric,
    course numeric,
    heading numeric,
    position geometry(Point,4326),
    timestamp timestamp with time zone not null,
    updated_at timestamp with time zone default now(),
    created_at timestamp with time zone default now()
);

create index ix_mmsi_positions_archive_position_id
    on mmsi_positions_archive (position_id);

create index ix_mmsi_positions_archive_timestamp_id
    on mmsi_positions_archive (timestamp, id);

create index ix_mmsi_positions_archive_mmsi_timestamp_id
    on mmsi_positions_archive (mmsi, timestamp, id);

Using this query to get a batch of positions for a single MMSI:

   SELECT mmsi_positions_archive.id              AS mmsi_positions_archive_id,
           mmsi_positions_archive.position_id     AS mmsi_positions_archive_position_id,
           mmsi_positions_archive.previous_id     AS mmsi_positions_archive_previous_id,
           mmsi_positions_archive.mmsi            AS mmsi_positions_archive_mmsi,
           mmsi_positions_archive.collection_type AS mmsi_positions_archive_collection_type,
           mmsi_positions_archive.accuracy        AS mmsi_positions_archive_accuracy,
           mmsi_positions_archive.course          AS mmsi_positions_archive_course,
           mmsi_positions_archive.heading         AS mmsi_positions_archive_heading,
           mmsi_positions_archive.maneuver        AS mmsi_positions_archive_maneuver,
           mmsi_positions_archive.rate_of_turn    AS mmsi_positions_archive_rate_of_turn,
           mmsi_positions_archive.speed           AS mmsi_positions_archive_speed,
           mmsi_positions_archive.status          AS mmsi_positions_archive_status,
           concat(
                   ST_X(mmsi_positions_archive.position),
                   ';',
                   ST_Y(mmsi_positions_archive.position)
               )                                  AS mmsi_positions_archive_position,
           mmsi_positions_archive.timestamp       AS mmsi_positions_archive_timestamp,
           mmsi_positions_archive.created_at      AS mmsi_positions_archive_created_at,
           vessels_v2.vessel_id                   AS vessel_id
    FROM (
             SELECT *
             FROM mmsi_positions_archive
             WHERE mmsi_positions_archive.timestamp >= '2019-01-01 00:00:00+00:00'
               AND mmsi IN (SELECT vessels_v2.mmsi
                            FROM vessels_v2
                            WHERE vessels_v2.vessel_id IN ('c2cba910-c7fa-4efe-9531-1cd30521e6e0'))
               AND mmsi_positions_archive.timestamp <= '2019-01-14 00:00:00+00:00'
             ORDER BY mmsi_positions_archive.timestamp ASC, mmsi_positions_archive.id ASC
             LIMIT 1000
         ) AS mmsi_positions_archive
             JOIN vessels_v2 ON vessels_v2.mmsi = mmsi_positions_archive.mmsi
    ORDER BY mmsi_positions_archive.timestamp ASC, mmsi_positions_archive.id ASC;

Gives the following query plan:

Nested Loop  (cost=2834.44..7350.62 rows=580 width=150) (actual time=1772.151..1777.071 rows=1000 loops=1)
  ->  Limit  (cost=2834.02..2835.47 rows=580 width=142) (actual time=1772.114..1772.286 rows=1000 loops=1)
        ->  Sort  (cost=2834.02..2835.47 rows=580 width=142) (actual time=1772.113..1772.218 rows=1000 loops=1)
"              Sort Key: mmsi_positions_archive.""timestamp"", mmsi_positions_archive.id"
              Sort Method: quicksort  Memory: 370kB
              ->  Nested Loop  (cost=0.42..2807.40 rows=580 width=142) (actual time=0.032..1769.592 rows=1209 loops=1)
                    ->  Index Scan using ix_vessels_v2_vessel_id on vessels_v2 vessels_v2_1  (cost=0.42..8.44 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)
                          Index Cond: (vessel_id = 'c2cba910-c7fa-4efe-9531-1cd30521e6e0'::uuid)
                    ->  Append  (cost=0.00..2792.05 rows=691 width=134) (actual time=0.020..1768.818 rows=1209 loops=1)
                          ->  Seq Scan on mmsi_positions_archive  (cost=0.00..0.00 rows=1 width=324) (actual time=0.001..0.001 rows=0 loops=1)
                                Filter: (("timestamp" >= '2019-01-01 00:00:00+00'::timestamp with time zone) AND ("timestamp" <= '2019-01-14 00:00:00+00'::timestamp with time zone) AND (vessels_v2_1.mmsi = mmsi))
                          ->  Index Scan using mmsi_positions_archive_p2019q1_mmsi_timestamp_id_idx on mmsi_positions_archive_p2019q1  (cost=0.70..2792.05 rows=690 width=134) (actual time=0.017..1768.526 rows=1209 loops=1)
                                Index Cond: ((mmsi = vessels_v2_1.mmsi) AND ("timestamp" >= '2019-01-01 00:00:00+00'::timestamp with time zone) AND ("timestamp" <= '2019-01-14 00:00:00+00'::timestamp with time zone))
  ->  Index Scan using ix_vessels_v2_mmsi on vessels_v2  (cost=0.42..7.76 rows=1 width=24) (actual time=0.001..0.001 rows=1 loops=1000)
        Index Cond: (mmsi = mmsi_positions_archive.mmsi)
Planning time: 1.740 ms
Execution time: 1777.251 ms

Not the fastest response, but I'm okay with it.

But if I add on 99 more vessel_id to lookup against (I also tried with both EXISTS and IN, and it's the same query plan) it runs a lot slower but looks pretty much the same. I guess it's roughly linear:

Nested Loop  (cost=283506.49..290868.07 rows=1000 width=150) (actual time=242910.968..242916.242 rows=1000 loops=1)
  ->  Limit  (cost=283506.07..283508.57 rows=1000 width=142) (actual time=242910.928..242911.198 rows=1000 loops=1)
        ->  Sort  (cost=283506.07..283651.18 rows=58045 width=142) (actual time=242910.927..242911.132 rows=1000 loops=1)
"              Sort Key: mmsi_positions_archive.""timestamp"", mmsi_positions_archive.id"
              Sort Method: top-N heapsort  Memory: 314kB
              ->  Nested Loop  (cost=810.44..280323.53 rows=58045 width=142) (actual time=1.466..242642.459 rows=198040 loops=1)
                    ->  HashAggregate  (cost=810.44..811.44 rows=100 width=8) (actual time=0.457..0.596 rows=100 loops=1)
                          Group Key: vessels_v2_1.mmsi
                          ->  Bitmap Heap Scan on vessels_v2 vessels_v2_1  (cost=439.02..810.19 rows=100 width=8) (actual time=0.274..0.443 rows=100 loops=1)
"                                Recheck Cond: (vessel_id = ANY ('{c2cba910-c7fa-4efe-9531-1cd30521e6e0,e7f63085-0c72-5bbb-ba3f-e8ea77ed8f9a,6d42935b-e709-4891-ad0d-9f7a5b44b6b9,88a29bc8-9068-47ab-8508-52a64c8f57f2,e0ae6213-289e-59a8-8f13-1463b83f6937,58ffb912-0d7a-54a4-b105-9b97dcc80363,c18c0067-059f-47d3-8ab2-39ce599b4442,e687fae4-795a-5f96-96fe-181084285a5d,e1c3b3c1-8a56-4d4e-96ac-55e459241538,85313219-ba1f-4d59-b852-46f681faddd0,41832283-52f9-4a62-a8c8-cc9241c6d463,008320d4-7560-47f2-97e1-6f558aa3425d,b9e289f4-5048-47db-9ad6-d856be1c13ef,bbf7dfb1-fe61-4587-a0ba-1c88505fd820,3bf79bc9-5707-4aeb-8225-c2ca58a6b2c1,20b658d2-cf7d-4055-84d0-b613141c2e50,7dbeecfc-99f1-4584-83f3-9436986c8df0,869cada5-c9b8-4a02-8390-acfd13f26ca3,7e27d598-a8d6-49fc-81d2-222dd4b86287,41442ada-e4e6-4886-9356-261f7e9af03f,7d594b52-bc55-472f-ba4c-aa15029e2997,238494ef-1059-585e-8a86-ff49a85e0245,af9752e8-1436-4f5b-91da-09d9a296c865,245dd437-4873-4c43-9c12-1f50198e9572,8b5d47ac-ada9-41b6-82d0-f57e9fafad36,4114283a-a00b-5b48-a764-13e15f5ffe0c,188b212e-0834-47ee-bb86-20bb7358891f,a56ba9c7-4215-4717-93c8-410d3bb9921f,e6f4680b-6cd8-4331-9025-8f8ecd35c77c,90027158-73aa-4a52-9201-fb314f59b812,ab1a1e1f-2c33-4e99-a0bc-d4957f13c36f,812e80d7-2d48-4f2d-be65-8bf73577712d,90a48902-e646-490a-8d75-3e2d94f64ffb,26447406-6057-49be-809c-3d11897c17e6,041126f0-c9a0-4b4d-8cf1-4b9a05fa1fbb,b1e0c45d-50ec-4435-bc29-5c428b445e8f,9add606d-68fd-57a0-b9e1-21139d0eef12,9af608f1-21c2-44b7-866a-496f9d707fb3,30b1a71e-949e-43ff-84b6-8457ed71b387,31385944-0c19-4e6f-afe9-dda7ad37a10b,9a9abe28-c5e1-4df2-8d90-7f4a137d4d50,f97ba534-847d-4f4f-a40b-fc0ea7a6291a,c86b0d7a-53ed-40b9-b15d-cd8b21a5b467,7ef2abef-0db5-47f1-a984-e96132401534,f4474469-f17b-414c-8b81-79e9623e66f1,85686cd5-86b4-58ce-9d63-ca8cdc74fdf0,4ac608f5-d23a-4fb2-8137-530d52be8613,38c59a1a-8c5d-5517-83e7-17010a624cc8,8217b553-0c3a-43dc-ac91-660acd350597,cfef6384-41af-47ec-8438-90bc2a2bac34,beafb43c-6af8-476a-b51c-7aa9c4a62d14,380bf1b1-4424-4606-a186-3cdb09693dc6,2c904fd8-68b8-4531-8dfb-8cd0b9dbb026,8e24703d-3f99-5a85-8489-f189aac7b4ef,8f4e90d0-fa33-4a74-a842-7922cde86edd,c4a77fa0-fc82-5335-93bb-2813c85c0ab7,128da36f-2494-412f-b669-0d70fb558065,eeed333b-6d71-4401-b5e4-a80e6224850e,b87a2d57-f1fc-4209-bacf-9dc1edffd121,89e55083-d6d1-4eab-801e-75b358a3beff,61850f9d-259f-40b5-a35b-64c3dfbd9e46,19806582-2966-41d0-b5ed-42849adc4ac7,ab0e9346-6c70-412f-aba7-cac2e6e8740b,83140b9d-54e7-4c6e-b1f2-30fedf0750ad,aaf239ef-09a0-4807-b889-72b6014e0f8b,682077b3-cc2c-5881-bab1-2249e72aa09b,4f1cd57c-470e-44a8-83f2-958f49d22be5,4c9d1a1a-b0e8-503b-b670-9c209bcc0cb7,7d050be6-b922-4a5c-88dc-4417e17f5da2,25addffa-0375-5065-ad7e-27461218f92b,f0e00d3c-37f7-4801-b384-a30931d64dee,861214ae-a0fb-5b57-a0ba-11bb29d674e5,357fb0ff-b21c-4c62-812d-2360ea326044,bf91f821-aa01-4f7e-9b46-62950dc1ef4b,442d6006-c407-454c-95ca-b23a9b2aaeea,18703f60-49d9-53fa-87bc-bd4e5e80a888,08511089-ce83-58bf-badd-4bd8dbb67213,5c311c46-9996-4bab-ad31-9d4f5a75be7d,5b1cacd9-e221-4892-bc6e-855840ed270a,58142628-6a6f-4f6d-8a19-bffc7430fe48,d7b6f3d9-87f9-484a-84e8-b4a3cfb3b450,01f09b54-deca-400d-820b-7b099ba07376,ec6690a2-c4fb-4dfe-8d1b-6e47341d893a,7473a9f5-a5df-4aea-b0dd-1942f4fab384,46015f3e-1ef3-4ec6-a801-19a3d136fab6,52014875-a316-469b-99dc-e8ec7a3812f1,2c467523-3623-48db-a791-0ee89d340c40,6fd426a1-dae0-48ca-bd80-d32c2feb94ca,a02739f1-62b5-4af4-8acc-78728cdb9ff8,ed20ffd0-dd17-4a7b-935a-446debf736a2,f8743fee-2129-4776-8857-3cc34feac9a2,514d19d9-8900-5021-b373-94e14f96b7f3,32f5295e-efe4-44f9-82e1-8fe9c643925a,e156963e-344f-4e56-ba7e-b40be6754660,a37b5dd4-2086-4f05-b297-c60b46d0ddc3,1eda7fb1-21dc-4a7e-82e1-aa8c5308d240,a3585721-8b2c-4e27-b077-64526eec1136,89a30ede-8b8e-5d62-9da0-365cd059fcda,688dc661-e54c-4585-bedf-8817a5aa3aad,5aa5f47f-379d-48ff-b122-b13561d6e1ea}'::uuid[]))"
                                Heap Blocks: exact=99
                                ->  Bitmap Index Scan on ix_vessels_v2_vessel_id  (cost=0.00..439.00 rows=100 width=0) (actual time=0.263..0.263 rows=102 loops=1)
"                                      Index Cond: (vessel_id = ANY ('{c2cba910-c7fa-4efe-9531-1cd30521e6e0,e7f63085-0c72-5bbb-ba3f-e8ea77ed8f9a,6d42935b-e709-4891-ad0d-9f7a5b44b6b9,88a29bc8-9068-47ab-8508-52a64c8f57f2,e0ae6213-289e-59a8-8f13-1463b83f6937,58ffb912-0d7a-54a4-b105-9b97dcc80363,c18c0067-059f-47d3-8ab2-39ce599b4442,e687fae4-795a-5f96-96fe-181084285a5d,e1c3b3c1-8a56-4d4e-96ac-55e459241538,85313219-ba1f-4d59-b852-46f681faddd0,41832283-52f9-4a62-a8c8-cc9241c6d463,008320d4-7560-47f2-97e1-6f558aa3425d,b9e289f4-5048-47db-9ad6-d856be1c13ef,bbf7dfb1-fe61-4587-a0ba-1c88505fd820,3bf79bc9-5707-4aeb-8225-c2ca58a6b2c1,20b658d2-cf7d-4055-84d0-b613141c2e50,7dbeecfc-99f1-4584-83f3-9436986c8df0,869cada5-c9b8-4a02-8390-acfd13f26ca3,7e27d598-a8d6-49fc-81d2-222dd4b86287,41442ada-e4e6-4886-9356-261f7e9af03f,7d594b52-bc55-472f-ba4c-aa15029e2997,238494ef-1059-585e-8a86-ff49a85e0245,af9752e8-1436-4f5b-91da-09d9a296c865,245dd437-4873-4c43-9c12-1f50198e9572,8b5d47ac-ada9-41b6-82d0-f57e9fafad36,4114283a-a00b-5b48-a764-13e15f5ffe0c,188b212e-0834-47ee-bb86-20bb7358891f,a56ba9c7-4215-4717-93c8-410d3bb9921f,e6f4680b-6cd8-4331-9025-8f8ecd35c77c,90027158-73aa-4a52-9201-fb314f59b812,ab1a1e1f-2c33-4e99-a0bc-d4957f13c36f,812e80d7-2d48-4f2d-be65-8bf73577712d,90a48902-e646-490a-8d75-3e2d94f64ffb,26447406-6057-49be-809c-3d11897c17e6,041126f0-c9a0-4b4d-8cf1-4b9a05fa1fbb,b1e0c45d-50ec-4435-bc29-5c428b445e8f,9add606d-68fd-57a0-b9e1-21139d0eef12,9af608f1-21c2-44b7-866a-496f9d707fb3,30b1a71e-949e-43ff-84b6-8457ed71b387,31385944-0c19-4e6f-afe9-dda7ad37a10b,9a9abe28-c5e1-4df2-8d90-7f4a137d4d50,f97ba534-847d-4f4f-a40b-fc0ea7a6291a,c86b0d7a-53ed-40b9-b15d-cd8b21a5b467,7ef2abef-0db5-47f1-a984-e96132401534,f4474469-f17b-414c-8b81-79e9623e66f1,85686cd5-86b4-58ce-9d63-ca8cdc74fdf0,4ac608f5-d23a-4fb2-8137-530d52be8613,38c59a1a-8c5d-5517-83e7-17010a624cc8,8217b553-0c3a-43dc-ac91-660acd350597,cfef6384-41af-47ec-8438-90bc2a2bac34,beafb43c-6af8-476a-b51c-7aa9c4a62d14,380bf1b1-4424-4606-a186-3cdb09693dc6,2c904fd8-68b8-4531-8dfb-8cd0b9dbb026,8e24703d-3f99-5a85-8489-f189aac7b4ef,8f4e90d0-fa33-4a74-a842-7922cde86edd,c4a77fa0-fc82-5335-93bb-2813c85c0ab7,128da36f-2494-412f-b669-0d70fb558065,eeed333b-6d71-4401-b5e4-a80e6224850e,b87a2d57-f1fc-4209-bacf-9dc1edffd121,89e55083-d6d1-4eab-801e-75b358a3beff,61850f9d-259f-40b5-a35b-64c3dfbd9e46,19806582-2966-41d0-b5ed-42849adc4ac7,ab0e9346-6c70-412f-aba7-cac2e6e8740b,83140b9d-54e7-4c6e-b1f2-30fedf0750ad,aaf239ef-09a0-4807-b889-72b6014e0f8b,682077b3-cc2c-5881-bab1-2249e72aa09b,4f1cd57c-470e-44a8-83f2-958f49d22be5,4c9d1a1a-b0e8-503b-b670-9c209bcc0cb7,7d050be6-b922-4a5c-88dc-4417e17f5da2,25addffa-0375-5065-ad7e-27461218f92b,f0e00d3c-37f7-4801-b384-a30931d64dee,861214ae-a0fb-5b57-a0ba-11bb29d674e5,357fb0ff-b21c-4c62-812d-2360ea326044,bf91f821-aa01-4f7e-9b46-62950dc1ef4b,442d6006-c407-454c-95ca-b23a9b2aaeea,18703f60-49d9-53fa-87bc-bd4e5e80a888,08511089-ce83-58bf-badd-4bd8dbb67213,5c311c46-9996-4bab-ad31-9d4f5a75be7d,5b1cacd9-e221-4892-bc6e-855840ed270a,58142628-6a6f-4f6d-8a19-bffc7430fe48,d7b6f3d9-87f9-484a-84e8-b4a3cfb3b450,01f09b54-deca-400d-820b-7b099ba07376,ec6690a2-c4fb-4dfe-8d1b-6e47341d893a,7473a9f5-a5df-4aea-b0dd-1942f4fab384,46015f3e-1ef3-4ec6-a801-19a3d136fab6,52014875-a316-469b-99dc-e8ec7a3812f1,2c467523-3623-48db-a791-0ee89d340c40,6fd426a1-dae0-48ca-bd80-d32c2feb94ca,a02739f1-62b5-4af4-8acc-78728cdb9ff8,ed20ffd0-dd17-4a7b-935a-446debf736a2,f8743fee-2129-4776-8857-3cc34feac9a2,514d19d9-8900-5021-b373-94e14f96b7f3,32f5295e-efe4-44f9-82e1-8fe9c643925a,e156963e-344f-4e56-ba7e-b40be6754660,a37b5dd4-2086-4f05-b297-c60b46d0ddc3,1eda7fb1-21dc-4a7e-82e1-aa8c5308d240,a3585721-8b2c-4e27-b077-64526eec1136,89a30ede-8b8e-5d62-9da0-365cd059fcda,688dc661-e54c-4585-bedf-8817a5aa3aad,5aa5f47f-379d-48ff-b122-b13561d6e1ea}'::uuid[]))"
                    ->  Append  (cost=0.00..2788.21 rows=691 width=134) (actual time=0.733..2425.501 rows=1980 loops=100)
                          ->  Seq Scan on mmsi_positions_archive  (cost=0.00..0.00 rows=1 width=324) (actual time=0.001..0.001 rows=0 loops=100)
                                Filter: (("timestamp" >= '2019-01-01 00:00:00+00'::timestamp with time zone) AND ("timestamp" <= '2019-01-14 00:00:00+00'::timestamp with time zone) AND (vessels_v2_1.mmsi = mmsi))
                          ->  Index Scan using mmsi_positions_archive_p2019q1_mmsi_timestamp_id_idx on mmsi_positions_archive_p2019q1  (cost=0.70..2788.21 rows=690 width=134) (actual time=0.729..2425.096 rows=1980 loops=100)
                                Index Cond: ((mmsi = vessels_v2_1.mmsi) AND ("timestamp" >= '2019-01-01 00:00:00+00'::timestamp with time zone) AND ("timestamp" <= '2019-01-14 00:00:00+00'::timestamp with time zone))
  ->  Index Scan using ix_vessels_v2_mmsi on vessels_v2  (cost=0.42..7.33 rows=1 width=24) (actual time=0.002..0.002 rows=1 loops=1000)
        Index Cond: (mmsi = mmsi_positions_archive.mmsi)
Planning time: 1.148 ms
Execution time: 242916.404 ms

Best Answer

The index scan also takes a long time with only one vessel_id:

->  Index Scan using mmsi_positions_archive_p2019q1_mmsi_timestamp_id_idx on mmsi_positions_archive_p2019q1  (cost=0.70..2792.05 rows=690 width=134) (actual time=0.017..1768.526 rows=1209 loops=1)
      Index Cond: ((mmsi = vessels_v2_1.mmsi) AND ("timestamp" >= '2019-01-01 00:00:00+00'::timestamp with time zone) AND ("timestamp" <= '2019-01-14 00:00:00+00'::timestamp with time zone))

1.8 seconds to scan for 690 rows! It would be interesting to see EXPLAIN (ANALYZE, BUFFERS) output so that we can tell if it is just low I/O or if there is another problem.

The way you wrote the query, the query with the 100 vessel_ids has to perform a nested loop, so it has to perform the slow index scan 100 times, which is where all the time is spent.

Rewrite the subquery to

SELECT *
FROM mmsi_positions_archive
WHERE mmsi_positions_archive.timestamp
      BETWEEN '2019-01-01 00:00:00+00:00' AND '2019-01-14 00:00:00+00:00'
  AND EXISTS (
         SELECT 1 FROM vessels_v2
         WHERE vessels_v2.mmsi = mmsi_positions_archive.mmsi
           AND vessels_v2.vessel_id IN (/* long list */)
      )

Different from IN, EXISTS clause can be transformed to a semi-join, which might result in a better execution plan.