PostgreSQL – What SQL execution time should I expect when querying 4k records in a 150M-record table

postgresql | postgresql-11 | query-performance

We currently have performance issues with a SQL query on RDS Postgres 11.8. Here is the query:

SELECT id FROM answer_submission WHERE student_id = 3983137;

The table "answer_submission" has about 150M records. The query returns about 4k ids.

The query takes ~2 s to execute, which is problematic for the user experience in our app. (Note that this query is for debugging purposes. The real query selects most of the columns, so adding all of those columns to the index would make the index heavy.)

An analysis of the query confirms that the B-tree index that we created is correctly used:

Index Scan using answer_submission_student_id on public.answer_submission (cost=0.57..2505.35 rows=842 width=16) (actual time=2.356..2346.005 rows=4604 loops=1)
Output: id
Index Cond: (answer_submission.student_id = 3983137)
Buffers: shared hit=3 read=4392 written=1118
I/O Timings: read=2298.117 write=16.805
Planning Time: 2.370 ms
Execution Time: 2349.484 ms

I understand from this analysis that the bottleneck is the time spent reading from disk.

I don't have a point of reference for this response time. Is it in the range one should expect when querying 4k records in a 150M-record table, or is there an issue somewhere? Do you have any clues on how to improve the execution time?

I already read the question Slow index scans in large table, but before implementing that kind of solution I would like to make sure I can't fix the root cause of this long execution time.

Our server has 64 GB of RAM. The size of the index is ~4 GB, and we have other tables with similarly sized indexes.
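Since the EXPLAIN output shows almost all buffers being read from disk rather than hit in cache, one thing I could check is how much of this table and its indexes is actually resident in shared_buffers. A sketch, assuming the pg_buffercache extension is available (it ships with RDS PostgreSQL 11) and that the index names all start with the table name:

```sql
-- Count the 8 kB shared buffers currently cached per relation
-- whose name starts with 'answer_submission'.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase = (SELECT oid FROM pg_database
                      WHERE datname = current_database())
WHERE c.relname LIKE 'answer_submission%'
GROUP BY c.relname
ORDER BY buffers DESC;
```

If the index shows only a small number of cached buffers relative to its ~4 GB size, that would explain why every run has to go to disk.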

This behavior is not occasional; it can be reproduced consistently at any time. Note also that new records are constantly being inserted into this table.

Thanks for your help

Best Answer

It looks like either you measured the query right after importing a lot of data into the table, so the query has to set hint bits, or shared_buffers is so small that the query finds no free buffers and has to write out dirty ones (written=1118).

Try an index that includes id so you can get an index-only scan, and VACUUM the table:

CREATE INDEX ON answer_submission (student_id) INCLUDE (id);
VACUUM answer_submission;

That should make the query fast.
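To verify this actually helped, you can re-run the query under EXPLAIN (ANALYZE, BUFFERS) after the VACUUM has finished; the expected plan shape below is a sketch of what to look for, not a guaranteed result:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM answer_submission WHERE student_id = 3983137;

-- The plan should now show an "Index Only Scan" on the new index.
-- "Heap Fetches: 0" (or close to it) confirms the visibility map is
-- up to date; a large heap-fetch count means the table still needs
-- vacuuming and the scan is falling back to the heap for visibility.
```

An index-only scan touches only the ~4k matching index entries instead of up to 4k scattered heap pages, which is where most of the 2.3 s of read I/O went.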