PostgreSQL – What SQL execution time should I expect when querying 4k records in a 150M-record table

postgresql | postgresql-11 | query-performance

We currently have performance issues with a SQL query on RDS Postgres 11.8. Here is the query:

SELECT id FROM answer_submission WHERE student_id = 3983137;

The table "answer_submission" has about 150M records. The query returns about 4k ids.

The query takes ~2 s to execute, which is problematic for the user experience in our app. (Note that this query is for debugging purposes. The real query selects most of the columns, so adding all of those columns to the index would make the index heavy.)

An analysis of the query confirms that the B-tree index that we created is correctly used:

Index Scan using answer_submission_student_id on public.answer_submission (cost=0.57..2505.35 rows=842 width=16) (actual time=2.356..2346.005 rows=4604 loops=1)
Output: id
Index Cond: (answer_submission.student_id = 3983137)
Buffers: shared hit=3 read=4392 written=1118
I/O Timings: read=2298.117 write=16.805
Planning Time: 2.370 ms
Execution Time: 2349.484 ms

I understand from this analysis that the bottleneck is the time spent reading from disk.

I don't have a point of reference for this response time. Is it in the range one should expect when querying 4k records in a 150M-record table, or is there an issue somewhere? Do you have any clues on how to improve the execution time?

I already read the question Slow index scans in large table, but before implementing that kind of solution I would like to make sure I can't fix the root cause of this long execution time.

Our server has 64 GB of RAM. The size of the index is ~4 GB, and we have other tables with similarly sized indexes.
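Since the EXPLAIN output shows almost all buffers being read from disk rather than hit in cache, one thing I could check is how much of this table and its indexes is actually resident in shared_buffers. A sketch, assuming the pg_buffercache extension is available (it ships with RDS PostgreSQL 11) and that the index names all start with the table name:

```sql
-- Count the 8 kB shared buffers currently cached per relation
-- whose name starts with 'answer_submission'.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase = (SELECT oid FROM pg_database
                      WHERE datname = current_database())
WHERE c.relname LIKE 'answer_submission%'
GROUP BY c.relname
ORDER BY buffers DESC;
```

If the index shows only a small number of cached buffers relative to its ~4 GB size, that would explain why every run has to go to disk.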

This behavior is not occasional; it can be reproduced consistently at any time. Note also that new records are constantly being inserted into this table.

Thanks for your help

Best Answer

It looks like either you measured the query right after importing a lot of data into the table, so the query has to set hint bits, or shared_buffers is so small that the query finds no free buffers and has to write out dirty ones (written=1118).

Try an index that includes id so you can get an index-only scan, and VACUUM the table:

CREATE INDEX ON answer_submission (student_id) INCLUDE (id);
VACUUM answer_submission;

That should make the query fast.
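To verify this actually helped, you can re-run the query under EXPLAIN (ANALYZE, BUFFERS) after the VACUUM has finished; the expected plan shape below is a sketch of what to look for, not a guaranteed result:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM answer_submission WHERE student_id = 3983137;

-- The plan should now show an "Index Only Scan" on the new index.
-- "Heap Fetches: 0" (or close to it) confirms the visibility map is
-- up to date; a large heap-fetch count means the table still needs
-- vacuuming and the scan is falling back to the heap for visibility.
```

An index-only scan touches only the ~4k matching index entries instead of up to 4k scattered heap pages, which is where most of the 2.3 s of read I/O went.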