PostgreSQL query is very slow on a big table

performance, postgresql, postgresql-9.5, query-performance, vmware

My database version is PostgreSQL 9.5.

create table if not exists request_log
(
    id               bigserial not null constraint app_requests_pkey  primary key,
    request_date     timestamp not null,
    ip               varchar(50),
    start_time       timestamp,
    application_name varchar(200),
    request_path     text,
    display_url      text,
    username         varchar(50)
);

I have a table that stores information about incoming HTTP requests. The id column is the primary key and is indexed. The table has no foreign-key relationships.

The table has 72,320,081 rows, and the query to count them, select count(id) from request_log;, takes 3-5 minutes.

The explain (analyze, buffers, format text) output for this query is:

Aggregate  (cost=3447214.71..3447214.72 rows=1 width=0) (actual time=135575.947..135575.947 rows=1 loops=1)
  Buffers: shared hit=96 read=2551303
  ->  Seq Scan on request_log  (cost=0.00..3268051.57 rows=71665257 width=0) (actual time=2.517..129032.408 rows=72320081 loops=1)
        Buffers: shared hit=96 read=2551303
Planning time: 0.067 ms
Execution time: 135575.988 ms

This performance is unacceptable for me; because of it, my web applications cannot generate reports from the table.

My server resources are:

  • OS: Ubuntu Server 16, on VMware
  • 4 CPU cores
  • 6 GB RAM
  • 120 GB HDD

I run the queries at night, when there are no users on the database, but they are still slow. How can I solve this problem?

Best Answer

Counting rows is slow, because all rows of the table have to be visited.
Counting id is even slower, because PostgreSQL first has to check if id is NULL or not (NULL values are not counted).
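
Since id is the primary key and can never be NULL, count(*) returns the same result without the per-row NULL check:

    -- count(*) just counts rows, with no per-row check
    SELECT count(*) FROM request_log;

    -- count(id) additionally has to test each id for NULL,
    -- even though a primary key column is never NULL
    SELECT count(id) FROM request_log;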

There are a few options to speed things up:

  • Use a more recent version of PostgreSQL.

    Then you can get parallel query, which makes the execution more expensive in total resources, but faster in elapsed time.

  • Use the index on id and keep the table well vacuumed.

    Then you can get an index-only scan.

  • Use an extra table with a counter that gets updated on every data modifying statement on the large table using a trigger.
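
For the first option, on PostgreSQL 9.6 or later parallel query can be enabled per session; a sketch (the worker count of 4 is just an example matching your 4 cores):

    -- allow up to 4 parallel workers for this session (9.6+)
    SET max_parallel_workers_per_gather = 4;

    SELECT count(*) FROM request_log;
    -- the plan should now show a Gather node over
    -- a Partial Aggregate run by the workers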
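
For the second option, an index-only scan depends on the visibility map being current, which vacuum maintains; a sketch:

    -- vacuum keeps the visibility map current, so an index-only
    -- scan can skip fetching the heap row for most entries
    VACUUM request_log;

    -- count via the primary key index; the plan should show
    -- "Index Only Scan using app_requests_pkey"
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM request_log;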
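
The third option could look like the following sketch; the names row_counts and request_log_count are my own inventions:

    CREATE TABLE row_counts (
        table_name text PRIMARY KEY,
        row_count  bigint NOT NULL
    );

    -- seed with the current (slow, one-time) count
    INSERT INTO row_counts
    SELECT 'request_log', count(*) FROM request_log;

    CREATE FUNCTION request_log_count() RETURNS trigger
        LANGUAGE plpgsql AS
    $$BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE row_counts SET row_count = row_count + 1
            WHERE table_name = 'request_log';
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE row_counts SET row_count = row_count - 1
            WHERE table_name = 'request_log';
        END IF;
        RETURN NULL;
    END;$$;

    CREATE TRIGGER request_log_count
        AFTER INSERT OR DELETE ON request_log
        FOR EACH ROW EXECUTE PROCEDURE request_log_count();

    -- now the count is a single-row lookup:
    SELECT row_count FROM row_counts WHERE table_name = 'request_log';

Note that a row-level trigger like this serializes all writers on the single counter row, so it trades write concurrency for a fast count; for an insert-mostly log table that is usually acceptable.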

Have a look at my blog post for an in-depth discussion.