Postgresql – Designing a High Score/Leaderboard table

database-designperformancepostgresql

I'm looking for input on how to design a highscore table for a game I'm creating.

The database I'm using is postgres. That's not really something I can or want to change (unless there's very good reason for it).

                                                          version
-----------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.4 on x86_64-apple-darwin13.1.0, compiled by Apple LLVM version 5.1 (clang-503.0.38) (based on LLVM 3.4svn), 64-bit

My goal is to allow for up to 10 000 000 rows. I'm not sure if this is realistic.

In my initial attempt, the table is on the simplest of forms (note keys on alias and score).

 Column  |          Type          |                        Modifiers
---------+------------------------+---------------------------------------------------------
id       | integer                | not null default nextval('highscores_id_seq'::regclass)
traceid  | integer                |
alias    | character varying(256) |
score    | integer                |
rows     | integer                |
level    | integer                |
duration | integer                |
Indexes:
    "highscores_pkey" PRIMARY KEY, btree (id)
    "highscores_alias_idx" btree (alias)
    "highscores_score_idx" btree (score)

Most interesting columns for my problem are alias and score (I included them all on the off chance that they have some impact I'm unaware of).

Queries

Specifically I need to perform two queries. One which selects highscores, sorted by score with OFFSET and LIMIT (at most 1000 rows per query). I've written this:

SELECT *, RANK() over(ORDER BY score DESC) AS rank FROM highscores OFFSET 0 LIMIT 100

This works well for the topmost results (which are by far most common) and decreases in performance the larger I make OFFSET. Not ideal, but acceptable (I'm guessing caches are at play).

The second query is more complicated; Given an alias (e.g. a name of a user), it should give the highscores achieved by that user, and also each highscore's position in the global table (ordered by score).

I wrote this:

SELECT * FROM
  (SELECT *, RANK() OVER (ORDER BY score DESC) AS rank FROM highscores) AS tbl
WHERE alias = 'somealias'

It gives the correct result (e.g. rank is set to the global position in the table). It is horribly slow however. It takes up to 30 seconds, which is really not acceptable.

Explaining (ANALYZE, BUFFERS) these queries give the following (respectively)

link do depesz:

Limit  (cost=0.42..303.61 rows=100 width=35) (actual time=0.029..0.490 rows=100 loops=1)
  Buffers: shared hit=110
  ->  WindowAgg  (cost=0.42..606388.37 rows=200002 width=35) (actual time=0.028..0.469 rows=100 loops=1)
        Buffers: shared hit=110
        ->  Index Scan Backward using highscores_score_idx on highscores  (cost=0.42..603388.34 rows=200002 width=35) (actual time=0.015..0.264 rows=100 loops=1)
              Buffers: shared hit=110

and

link to depesz

Subquery Scan on tbl  (cost=67440.85..73440.91 rows=1 width=43) (actual time=960.290..960.290 rows=0 loops=1)
    Filter: ((tbl.alias)::text = 'somealias'::text)
    Rows Removed by Filter: 200002
    Buffers: shared hit=3102 read=33788, temp read=1829 written=1829
    ->  WindowAgg  (cost=67440.85..70940.89 rows=200002 width=35) (actual time=560.130..920.911 rows=200002 loops=1)
          Buffers: shared hit=3102 read=33788, temp read=1829 written=1829
          ->  Sort  (cost=67440.85..67940.86 rows=200002 width=35) (actual time=560.117..652.521 rows=200002 loops=1)
                Sort Key: highscores.score
                Sort Method: external merge  Disk: 8208kB
                Buffers: shared hit=3102 read=33788, temp read=1829 written=1829
                ->  Seq Scan on highscores  (cost=0.00..38890.02 rows=200002 width=35) (actual time=111.099..215.351 rows=200002 loops=1)
                    Buffers: shared hit=3102 read=33788width=35)

(from a table with 200 000 randomly generated entries).

Help

I'm not sure if my table design is optimal. I can change it if it would help with performance in any way. I'm guessing queries could be written differently as well to improve performance. I can't see how though. Any pointers would be greatly appreciated!

Best Answer

Assuming that you are prepared to compromise on the absolute correctness of the answer in order to obtain practical performance, you could do the following:

  • Create an index on score (assuming descending, but either way is workable)
  • Create a maintenance job to rebuild this index periodically (how often depends on your willingness to have your rank results be out of date)

I take it that you are OK with these assumptions, since you have noted an index on score in your question.

Then your user's global ranking for their scores would be something like this:

select 
  S.*
, count (R.id from highscores R
         where R.score > S.score) as rank
from highscores S
where S.alias = 'somealias'

Using correlated subqueries is not a recipe for blinding speed, but the advantage of this is that you don't have to rank everything, you just have to count entries in an index relative to a user's particular scores. Of course, only your query optimizer will know for sure.