PostgreSQL Performance – Retrieving Data from Database with Millions of Rows

database-design, index, index-tuning, postgresql, postgresql-performance

I've been working on a system where users can register, create a book club, and invite other people (members) to join. Users and members can all add books to the club, and can also vote for books that other members have added. I recently added a lot of data to check whether the database performs well, and found out that it takes a long time to get the data I want. I would like to get all books in a club, including their votes and the names of the members who voted for them.

My database diagram (created via dbdiagram.io, check it out):

[Database diagram]
In order to query the database freely without too much hassle, I decided to use Hasura, an open-source service that can create a GraphQL backend just by looking at the data structure (I am using PostgreSQL). I use the following query to get the data I want:

query GetBooksOfClubIncludingVotesAndMemberName {
  books(
    where: {
      club_id: {_eq: "3"}, 
      state:{_eq: 0 }
    }, 
    order_by: [
      { fallback : asc },
      { id: asc }
    ]
  ) {
    id
    isbn
    state
    votes {
      member {
        id
        name
      }
    }
  }    
}

This query, of course, gets converted to a SQL statement:

SELECT
  coalesce(
    json_agg(
      "root"
      ORDER BY
        "root.pg.fallback" ASC NULLS LAST,
        "root.pg.id" ASC NULLS LAST
    ),
    '[]'
  ) AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_8_e"
          FROM
            (
              SELECT
                "_0_root.base"."id" AS "id",
                "_0_root.base"."isbn" AS "isbn",
                "_7_root.ar.root.votes"."votes" AS "votes"
            ) AS "_8_e"
        )
      ) AS "root",
      "_0_root.base"."id" AS "root.pg.id",
      "_0_root.base"."fallback" AS "root.pg.fallback"
    FROM
      (
        SELECT
          *
        FROM
          "public"."books"
        WHERE
          (
            (("public"."books"."club_id") = (('3') :: bigint))
            AND (("public"."books"."state") = (('0') :: smallint))
          )
      ) AS "_0_root.base"
      LEFT OUTER JOIN LATERAL (
        SELECT
          coalesce(json_agg("votes"), '[]') AS "votes"
        FROM
          (
            SELECT
              row_to_json(
                (
                  SELECT
                    "_5_e"
                  FROM
                    (
                      SELECT
                        "_4_root.ar.root.votes.or.member"."member" AS "member"
                    ) AS "_5_e"
                )
              ) AS "votes"
            FROM
              (
                SELECT
                  *
                FROM
                  "public"."votes"
                WHERE
                  (("_0_root.base"."id") = ("book_id"))
              ) AS "_1_root.ar.root.votes.base"
              LEFT OUTER JOIN LATERAL (
                SELECT
                  row_to_json(
                    (
                      SELECT
                        "_3_e"
                      FROM
                        (
                          SELECT
                            "_2_root.ar.root.votes.or.member.base"."id" AS "id",
                            "_2_root.ar.root.votes.or.member.base"."name" AS "name"
                        ) AS "_3_e"
                    )
                  ) AS "member"
                FROM
                  (
                    SELECT
                      *
                    FROM
                      "public"."members"
                    WHERE
                      (
                        ("_1_root.ar.root.votes.base"."member_id") = ("id")
                      )
                  ) AS "_2_root.ar.root.votes.or.member.base"
              ) AS "_4_root.ar.root.votes.or.member" ON ('true')
          ) AS "_6_root.ar.root.votes"
      ) AS "_7_root.ar.root.votes" ON ('true')
    ORDER BY
      "root.pg.fallback" ASC NULLS LAST,
      "root.pg.id" ASC NULLS LAST
  ) AS "_9_root";

Executing this statement with EXPLAIN ANALYZE in front of it shows that it takes approximately 9217 milliseconds to finish; see the output below:

                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12057321.11..12057321.15 rows=1 width=32) (actual time=9151.967..9151.967 rows=1 loops=1)
   ->  Sort  (cost=12057312.92..12057313.38 rows=182 width=37) (actual time=9151.856..9151.865 rows=180 loops=1)
         Sort Key: books.fallback, books.id
         Sort Method: quicksort  Memory: 72kB
         ->  Nested Loop Left Join  (cost=66041.02..12057306.09 rows=182 width=37) (actual time=301.721..9151.490 rows=180 loops=1)
               ->  Index Scan using book_club on books  (cost=0.43..37888.11 rows=182 width=42) (actual time=249.506..304.469 rows=180 loops=1)
                     Index Cond: (club_id = '3'::bigint)
                     Filter: (state = '0'::smallint)
               ->  Aggregate  (cost=66040.60..66040.64 rows=1 width=32) (actual time=49.134..49.134 rows=1 loops=180)
                     ->  Nested Loop Left Join  (cost=0.72..66040.46 rows=3 width=32) (actual time=0.037..49.124 rows=3 loops=180)
                           ->  Index Only Scan using member_book on votes  (cost=0.43..66021.32 rows=3 width=8) (actual time=0.024..49.104 rows=3 loops=180)
                                 Index Cond: (book_id = books.id)
                                 Heap Fetches: 540
                           ->  Index Scan using members_pkey on members  (cost=0.29..6.38 rows=1 width=36) (actual time=0.005..0.005 rows=1 loops=540)
                                 Index Cond: (id = votes.member_id)
                                 SubPlan 2
                                   ->  Result  (cost=0.00..0.04 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=540)
                     SubPlan 3
                       ->  Result  (cost=0.00..0.04 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=540)
               SubPlan 1
                 ->  Result  (cost=0.00..0.04 rows=1 width=32) (actual time=0.001..0.002 rows=1 loops=180)
 Planning Time: 0.788 ms
 JIT:
   Functions: 32
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 4.614 ms, Inlining 52.818 ms, Optimization 113.442 ms, Emission 81.939 ms, Total 252.813 ms
 Execution Time: 9217.899 ms
(27 rows)

With table sizes of:

   relname    | rowcount
--------------+----------
 books        |  1153800
 members      |    19230
 votes        |  3461400
 clubs        |     6410
 users        |        3
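
One quick way to get such counts, if an estimate is acceptable (a sketch; n_live_tup comes from the statistics collector and is not exact):

SELECT relname, n_live_tup AS rowcount
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;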

This takes way too long. In my previous design I did not have any indexes, which made it even slower. I have added indexes, but I'm still not happy that I have to wait this long. Is there anything I can improve in the data structure, or anywhere else?
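
For reference, the indexes named in the plan would look roughly like this (a sketch: the column lists are inferred from the plan above, and the column order of member_book in particular is my assumption based on its name):

CREATE INDEX book_club ON books (club_id);               -- matches the club_id = 3 index condition
CREATE INDEX member_book ON votes (member_id, book_id);  -- column order assumed from the name
-- members_pkey is simply the primary key on members (id)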

EDIT
Same select statement, but now using EXPLAIN (ANALYZE, BUFFERS) as suggested:

                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12057321.11..12057321.15 rows=1 width=32) (actual time=8896.202..8896.202 rows=1 loops=1)
   Buffers: shared hit=2392279 read=9470
   ->  Sort  (cost=12057312.92..12057313.38 rows=182 width=37) (actual time=8896.097..8896.106 rows=180 loops=1)
         Sort Key: books.fallback, books.id
         Sort Method: quicksort  Memory: 72kB
         Buffers: shared hit=2392279 read=9470
         ->  Nested Loop Left Join  (cost=66041.02..12057306.09 rows=182 width=37) (actual time=222.978..8895.801 rows=180 loops=1)
               Buffers: shared hit=2392279 read=9470
               ->  Index Scan using book_club on books  (cost=0.43..37888.11 rows=182 width=42) (actual time=174.471..214.000 rows=180 loops=1)
                     Index Cond: (club_id = '3'::bigint)
                     Filter: (state = '0'::smallint)
                     Buffers: shared hit=113 read=9470
               ->  Aggregate  (cost=66040.60..66040.64 rows=1 width=32) (actual time=48.211..48.211 rows=1 loops=180)
                     Buffers: shared hit=2392166
                     ->  Nested Loop Left Join  (cost=0.72..66040.46 rows=3 width=32) (actual time=0.028..48.202 rows=3 loops=180)
                           Buffers: shared hit=2392166
                           ->  Index Only Scan using member_book on votes  (cost=0.43..66021.32 rows=3 width=8) (actual time=0.018..48.187 rows=3 loops=180)
                                 Index Cond: (book_id = books.id)
                                 Heap Fetches: 540
                                 Buffers: shared hit=2390546
                           ->  Index Scan using members_pkey on members  (cost=0.29..6.38 rows=1 width=36) (actual time=0.004..0.004 rows=1 loops=540)
                                 Index Cond: (id = votes.member_id)
                                 Buffers: shared hit=1620
                                 SubPlan 2
                                   ->  Result  (cost=0.00..0.04 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=540)
                     SubPlan 3
                       ->  Result  (cost=0.00..0.04 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=540)
               SubPlan 1
                 ->  Result  (cost=0.00..0.04 rows=1 width=32) (actual time=0.008..0.008 rows=1 loops=180)
 Planning Time: 0.400 ms
 JIT:
   Functions: 32
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 2.060 ms, Inlining 9.923 ms, Optimization 94.927 ms, Emission 68.793 ms, Total 175.702 ms
 Execution Time: 8898.360 ms
(35 rows)

EDIT 2:
Using select * from pg_prepared_xacts; and select * from pg_stat_activity; as suggested in an answer: the first statement does not return any rows, and in the second one I did not notice any old xact_start times. This was done after running VACUUM FULL votes yesterday; unfortunately, running VACUUM FULL votes does not fix the problem.

Output of statements:

booky=# select * from pg_prepared_xacts;
 transaction | gid | prepared | owner | database
-------------+-----+----------+-------+----------
(0 rows)

booky=# select * from pg_stat_activity;
 datid  | datname | pid | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type |     wait_event      | state  | backend_xid | backend_xmin |              query              |         backend_type
--------+---------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+---------------------------------+------------------------------
        |         |  31 |          |          |                  |             |                 |             | 2020-04-05 08:41:47.959657+00 |                               |                               |                               | Activity        | AutoVacuumMain      |        |             |              |                                 | autovacuum launcher
        |         |  33 |       10 | postgres |                  |             |                 |             | 2020-04-05 08:41:47.959964+00 |                               |                               |                               | Activity        | LogicalLauncherMain |        |             |              |                                 | logical replication launcher
 169575 | booky   |  48 |       10 | postgres | psql             |             |                 |          -1 | 2020-04-05 10:05:20.847798+00 | 2020-04-05 10:07:47.534381+00 | 2020-04-05 10:07:47.534381+00 | 2020-04-05 10:07:47.534382+00 |                 |                     | active |             |     15265333 | select * from pg_stat_activity; | client backend
        |         |  29 |          |          |                  |             |                 |             | 2020-04-05 08:41:47.959346+00 |                               |                               |                               | Activity        | BgWriterHibernate   |        |             |              |                                 | background writer
        |         |  28 |          |          |                  |             |                 |             | 2020-04-05 08:41:47.959688+00 |                               |                               |                               | Activity        | CheckpointerMain    |        |             |              |                                 | checkpointer
        |         |  30 |          |          |                  |             |                 |             | 2020-04-05 08:41:47.959501+00 |                               |                               |                               | Activity        | WalWriterMain       |        |             |              |                                 | walwriter
(6 rows)

Best Answer

  ->  Index Only Scan using member_book on votes  (cost=0.43..66021.32 rows=3 width=8) (actual time=0.024..49.104 rows=3 loops=180)
         Index Cond: (book_id = books.id)
         Heap Fetches: 540

49.104 * 180 = 8839 ms, which is substantially all of your time. Most likely most of this time is going to IO to read random pages from the table (if you turn on track_io_timing and then do EXPLAIN (ANALYZE, BUFFERS), we would have a definitive answer to that).
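
A minimal sketch of that diagnostic (track_io_timing is a standard PostgreSQL setting; changing it requires superuser privileges):

-- with this on, plan nodes report "I/O Timings: read=..." for disk reads
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;  -- the full statement from the question goes here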

If you vacuum "votes", and so get rid of the heap fetches, it would almost surely solve the problem.
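
For example (a sketch; when relallvisible is close to relpages, index-only scans can skip almost all heap fetches):

VACUUM (VERBOSE, ANALYZE) votes;

-- check how much of the table the visibility map now covers
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'votes';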

   ->  Index Only Scan using member_book on votes  (cost=0.43..66021.32 rows=3 width=8) (actual time=0.018..48.187 rows=3 loops=180)
           Index Cond: (book_id = books.id)
           Heap Fetches: 540
           Buffers: shared hit=2390546

If this was done after the VACUUM was done, then you probably have some kind of long-running transaction being held open, which is preventing VACUUM from doing its job effectively. Also, hitting 2,390,546 buffers to fetch 540 rows seems incredibly outlandish. Again, that could be due to some long-open transaction causing massive bloat in your index and/or table.
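
One way to check for that kind of bloat, as a sketch against the standard statistics views (n_dead_tup counts dead tuples that vacuum has not yet removed):

SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'votes';

-- compare physical size against the ~3.5M rows reported above
SELECT pg_size_pretty(pg_table_size('votes'))   AS table_size,
       pg_size_pretty(pg_indexes_size('votes')) AS index_size;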

Does select * from pg_prepared_xacts; show any rows? Does select * from pg_stat_activity; show any old times for xact_start? If neither does, can you run VACUUM FULL votes; and see if that fixes the problem?
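
A sketch of the xact_start check, listing the backends whose transactions have been open longest:

SELECT pid, xact_start, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;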