PostgreSQL – Table with huge data takes a long time to query

performance, postgresql, postgresql-performance

I have a table in Postgres with over 50 million records in production. I have to query it often for debugging and data extraction because it is heavily used by the application.

However, due to the massive amount of data, every query against the table takes a long time to complete. Even simple queries that only sort on the updated_dt column, without any other conditions, take a long time, and queries with conditions can take 15-30 minutes! It doesn't help even when I add LIMIT 1000 to the results and index the updated_dt column.

This makes the whole table almost impossible to query with any kind of statement. My Postgres database is hosted on AWS RDS on an already fairly large instance.

Here is the EXPLAIN output:

SELECT "id", "store_id", ... FROM inventory ORDER BY "updated_dt" DESC, "id" DESC  LIMIT 1000;
Limit  (cost=9737754.79..9737757.29 rows=1000 width=1710) (actual time=328054.666..328054.790 rows=1000 loops=1)
  Buffers: shared hit=976936 read=6371799
  ->  Sort  (cost=9737754.79..9829882.64 rows=36851142 width=1710) (actual time=328054.665..328054.711 rows=1000 loops=1)
        Sort Key: updated_dt DESC, id DESC
        Sort Method: top-N heapsort  Memory: 3401kB
        Buffers: shared hit=976936 read=6371799
        ->  Seq Scan on "inventory"  (cost=0.00..7717246.42 rows=36851142 width=1710) (actual time=0.011..291962.272 rows=36813941 loops=1)
              Buffers: shared hit=976936 read=6371799
Planning time: 0.078 ms
Execution time: 328055.138 ms

Here are the indexes applied to the table:

CREATE UNIQUE INDEX "inventory_pkey" ON "inventory"(id text_ops);
CREATE INDEX inventory_store_id ON "inventory"(inventory_store_id text_ops);
CREATE INDEX inventory_item_data ON "inventory" USING GIN (item_data jsonb_path_ops);
CREATE INDEX inventory_created_dt ON "inventory"(created_dt timestamptz_ops);
CREATE INDEX inventory_updated_dt ON "inventory"(updated_dt timestamptz_ops);

What can I do to make queries against this table, which has a huge number of records, run faster?

Best Answer

Your complaint here is rather inconsistent. You talk about simple queries that just sort on updated_dt, but that is not the query you show. Also, the query you did show did not take 15-30 minutes. Is there another, even simpler query that takes even longer? If so, that would serve as a better example, particularly if you turn on track_io_timing first.
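
For example, a minimal sketch of gathering that evidence (on RDS, track_io_timing is a superuser-level setting, so you may need the rds_superuser role or a parameter-group change; the column list is truncated here as in your query):

-- Make EXPLAIN report time spent waiting on disk reads, not just buffer counts.
SET track_io_timing = on;

-- Re-run the slow query with full instrumentation and post the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "id", "store_id"   -- remaining columns omitted here
FROM inventory
ORDER BY "updated_dt" DESC, "id" DESC
LIMIT 1000;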

The query you show orders by both updated_dt and id. While conceptually an index on just updated_dt could speed that up by requiring a sort on id only within ties on updated_dt, this is not implemented in any current version of PostgreSQL. If you want an index to serve your multi-column ORDER BY, you must include both columns in the index.
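
A minimal sketch of such a two-column index (the index name is illustrative; CONCURRENTLY avoids blocking writes on a busy production table, but takes longer and cannot run inside a transaction block):

-- Matches ORDER BY updated_dt DESC, id DESC; a plain ascending (updated_dt, id)
-- index scanned backward would serve the same ordering.
CREATE INDEX CONCURRENTLY inventory_updated_dt_id
    ON inventory (updated_dt DESC, id DESC);

With that index in place, the LIMIT 1000 query can read the first 1000 rows straight off the index in order, instead of seq-scanning and top-N sorting roughly 37 million rows.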

If you want to make most things faster without building new indexes for all of them, you might want to either get an even larger instance or provision faster storage.