PostgreSQL: improve SELECT performance/parallelism in a huge table

performance, performance-tuning, postgresql, postgresql-9.5, query-performance

I have a huge table (over 3 billion rows, totaling 3 TB) on Postgres 9.5, defined as follows:

CREATE TABLE user_events
(
    user_id VARCHAR NOT NULL,
    datetime BIGINT NOT NULL,
    field1 VARCHAR,
    field2 VARCHAR,
    field3 VARCHAR,
    field4 VARCHAR,
    field5 VARCHAR,
    CONSTRAINT user_events_user_id_datetime_pk PRIMARY KEY (user_id, datetime)
);

ALTER TABLE user_events SET (autovacuum_vacuum_cost_delay = 10);

I need to query this table roughly 2000 times per second, but I'm not able to get enough parallelism (I'm reaching approximately half that rate).
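
For reference, throughput like this can be measured with a custom pgbench script, run with something like pgbench -n -f getact.sql -c 32 -j 8 -T 60, raising -c until throughput stops improving (the file name, client counts, and the hard-coded id are placeholders; a real run should vary the user_id):

-- getact.sql (placeholder name): one SELECT per transaction, mirroring the real query
SELECT * FROM user_events
WHERE user_id = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'  -- fixed value for illustration only
ORDER BY datetime
LIMIT 1000;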

Using the following prepared statement, I get a reasonable (I suppose) response time:

PREPARE getact AS SELECT * FROM user_events WHERE user_id = $1 ORDER BY datetime LIMIT 1000;

EXPLAIN ANALYZE EXECUTE getact('xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx');
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.70..1495.10 rows=367 width=878) (actual time=0.028..0.044 rows=4 loops=1)
   ->  Index Scan using user_events_user_id_datetime_pk on user_events  (cost=0.70..1495.10 rows=367 width=878) (actual time=0.026..0.034 rows=4 loops=1)
         Index Cond: ((user_id)::text = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'::text)
 Execution time: 0.073 ms

Some additional notes:

  • The existing content wasn't loaded in order, so rows are not physically
    sorted by datetime; from now on, inserts will arrive in increasing
    datetime order (see the pg_stats check after this list).
  • The table is mostly append-only.
  • We expect the table to grow by at most 50% more in the long term.
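
For what it's worth, the current physical ordering can be checked from the planner statistics; a correlation near 1 means a column's values are laid out on disk in nearly sorted order (this query is an illustration added here, not part of the original diagnosis):

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'user_events'
  AND attname IN ('user_id', 'datetime');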

At this point I'm looking for any configuration suggestion. I know I have scaling options such as a read-only replica or sharding, but I would like to avoid those as much as possible for now.
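
The kind of settings I've been looking at, for illustration only (the values below are placeholders rather than recommendations, and assume SSD storage):

-- Placeholder values; tune to the actual hardware.
ALTER SYSTEM SET effective_io_concurrency = 200;  -- only affects bitmap heap scans, but cheap to try
ALTER SYSTEM SET random_page_cost = 1.1;          -- random I/O is much cheaper on SSDs
ALTER SYSTEM SET effective_cache_size = '96GB';   -- approx. shared_buffers + OS page cache
SELECT pg_reload_conf();                          -- these three can be reloaded without a restart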

Would executing CLUSTER work in this case? It seems like most of the cost is in the index scan, so I'd guess it won't help, but I don't know (and testing it is really expensive).
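
For reference, the operation itself would be a single statement, though it takes an ACCESS EXCLUSIVE lock and rewrites the whole table while it runs, which is part of why testing it is expensive:

-- Rewrite the heap in primary-key order (locks out reads and writes for the duration).
CLUSTER user_events USING user_events_user_id_datetime_pk;
-- Refresh planner statistics afterwards.
ANALYZE user_events;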

Some more information:

  • There are multiple threads writing (mostly appending) at the same rate
    as the reads.
  • Though the times shown for the prepared statement are real, I've seen
    results ranging from under 1 ms to over 100 ms.

EXPLAIN (ANALYZE, BUFFERS) results for the slowest queries:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM user_events WHERE user_id = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX';
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using user_events_user_id_datetime_pk on user_events  (cost=0.70..1495.10 rows=367 width=878) (actual time=3.336..1162.936 rows=1518 loops=1)
   Index Cond: ((user_id)::text = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'::text)
   Buffers: shared hit=5 read=1532
 Planning time: 0.071 ms
 Execution time: 1164.926 ms
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM user_events WHERE user_id = 'yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy';
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using user_events_user_id_datetime_pk on user_events  (cost=0.70..1495.10 rows=367 width=878) (actual time=18.890..953.809 rows=1269 loops=1)
   Index Cond: ((user_id)::text = 'yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy'::text)
   Buffers: shared hit=5 read=1282 dirtied=107
 Planning time: 0.105 ms
 Execution time: 955.590 ms
(5 rows)

Best Answer

Although I didn't reach a solution to the issue (due to time constraints), I think the most useful takeaway is:

Use EXPLAIN (ANALYZE, BUFFERS) ... on many queries (with different values) to try to understand what is happening. In my case, this brought CLUSTER back into consideration as a possible optimization.

What I would do to optimize the table, if I had the time (both steps are sketched below):

  1. Shrink the indexed values as much as possible. In my case, that would
     mean changing user_id from VARCHAR to UUID.
  2. Run CLUSTER to reduce the number of heap pages fetched per query.
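
A minimal sketch of both steps (untested; it assumes every stored user_id is a valid UUID string, and both statements rewrite and exclusively lock the 3 TB table):

-- 1. Shrink the key: fixed 16-byte uuid instead of variable-length text.
--    Rewrites the table and rebuilds its indexes (assumes all values cast cleanly).
ALTER TABLE user_events
    ALTER COLUMN user_id TYPE uuid USING user_id::uuid;

-- 2. Rewrite the heap in primary-key order so each user's rows span few pages.
CLUSTER user_events USING user_events_user_id_datetime_pk;
ANALYZE user_events;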