MySQL: Assign rankings for a large table of results

Tags: MySQL, optimization, performance, query-performance

In our games we have a quarterly league system. Every 3 months, the current season ends and all scores of the current quarter are archived into a table called archive_league. Each entry represents a user and their score for a given league quarter. It has the following fields:

id - the id of an entry
uid - the id of a user
rounds - the number of games the user played in the given quarter
score - the score of the user for the given quarter
rank - the position of the player, descending by score, in the given quarter
date - the date of the quarter this entry represents
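
For illustration, a table with these fields might be declared roughly as follows. This is only a sketch: the column types, defaults, and the primary key are assumptions, not the actual schema.

```sql
-- Hypothetical DDL for archive_league; types and keys are assumed.
CREATE TABLE archive_league (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- id of the entry
  uid    INT UNSIGNED NOT NULL,                 -- id of the user
  rounds INT UNSIGNED NOT NULL DEFAULT 0,       -- games played in the quarter
  score  INT NOT NULL DEFAULT 0,                -- score for the quarter
  `rank` INT UNSIGNED NULL,                     -- filled in after the quarter ends
  `date` DATE NOT NULL,                         -- quarter this entry represents
  PRIMARY KEY (id)
);
```

The backticks around rank are only required on MySQL 8.0.2 and later, where RANK is a reserved word; they are harmless on older versions.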

My goal is to go through all entries for a given date and fill in the "rank" field. For example, the player with the highest score should receive rank = 1, the player with the second-highest score rank = 2, and so on. If two players share the same score, they should receive the same rank (Olympic scoring).

Example:

player x with score 528 receives rank 7
player y with score 528 receives rank 7
player z with score 527 receives rank 9

I am currently achieving this goal with this query:

UPDATE archive_league
        LEFT JOIN
    (SELECT 
        t.id,
            (SELECT 
                    COUNT(id) + 1
                FROM
                    archive_league x
                WHERE
                    x.score > t.score
                        AND x.date = :quarter) AS new_rank
    FROM
        archive_league t) AS temp USING (id) 
SET 
    rank = new_rank
WHERE
    date = :quarter;

The problem is that with a few hundred thousand entries this query runs for a couple of days, even though I have created indexes for the WHERE conditions. How can I optimize this query to run faster?

Best Answer

Materialize it

It may be more performant to make a real (materialized) temp table instead of the inline subquery.

-- Cleanup any old temporary table structure
DROP TEMPORARY TABLE IF EXISTS temp_ranks;

-- Initialize a new temporary table. This will copy your same data types.
-- Structure only, no data.
CREATE TEMPORARY TABLE temp_ranks
  AS SELECT id, rank AS new_rank
     FROM archive_league WHERE 0=1;

-- Apply a PK index to the temp table for performance.
ALTER TABLE temp_ranks ADD PRIMARY KEY (id);

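-- Compute ranks, store in temp table.
-- Only rows of the target quarter are ranked; without this filter the
-- subquery would run once for every row in the table.
INSERT INTO temp_ranks
SELECT t.id,
      (SELECT COUNT(*) + 1
          FROM archive_league
          WHERE score > t.score
            AND date = :quarter) AS new_rank
    FROM archive_league t
    WHERE t.date = :quarter;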

-- Apply to original table the materialized (temp) ranks.
UPDATE archive_league
LEFT JOIN temp_ranks USING (id) 
SET rank = new_rank
WHERE date = :quarter;

If we were using Oracle, I wouldn't expect this technique to make a difference, as Oracle's optimizer is pretty smart. MySQL's optimizer, however, has some weak spots. One of them is that SELECT ... JOIN is fairly well optimized, yet UPDATE ... JOIN and DELETE ... JOIN lack the same optimizations. Adding them is not impossible, but nobody has written the code to make it so. If you're brilliant at database programming in C, you are welcome to write the optimization and submit it as a patch to MySQL (or MariaDB).

Reference from the manual https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization.html:

A limitation on UPDATE and DELETE statements that use a subquery to modify a single table is that the optimizer does not use semi-join or materialization subquery optimizations. As a workaround, try rewriting them as multiple-table UPDATE and DELETE statements that use a join rather than a subquery.
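
If upgrading is an option, the correlated COUNT(*) can probably be avoided altogether. The following is a minimal sketch, assuming MySQL 8.0 or later (window functions are not available in 5.7) and the same :quarter placeholder as above. RANK() gives tied scores the same rank and skips the following positions, which matches the COUNT(*) + 1 logic. The index name is made up, and whether the index helps should be checked with EXPLAIN.

```sql
-- Optional and assumed: a composite index so the derived table below
-- can read just the rows of one quarter.
CREATE INDEX idx_archive_league_date_score
    ON archive_league (`date`, score);

-- Rank every row of the quarter in a single pass, then join the result
-- back by primary key. `rank` is reserved from MySQL 8.0.2 onward, so
-- it is quoted with backticks.
UPDATE archive_league AS a
JOIN (
    SELECT id,
           RANK() OVER (ORDER BY score DESC) AS new_rank
    FROM archive_league
    WHERE `date` = :quarter
) AS r ON r.id = a.id
SET a.`rank` = r.new_rank;
```

The derived table contains a window function, so MySQL has to materialize it rather than merge it into the outer statement. The whole ranking is therefore computed once instead of once per row.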

Related Solutions

Sql-server – Query to normalize table/combine row text

This should work. I will clean it up later so it's more efficient.

DECLARE @Old TABLE ( 
  id         INT, 
  rank       INT, 
  linenumber INT, 
  sometext   VARCHAR(1000)) 
DECLARE @New TABLE ( 
  id           INT, 
  rank         INT, 
  combinedtext VARCHAR(1000)) 


;WITH combinedresults(ctid, id, rank, linenumber, combinedtext) 
     AS (SELECT 0, 
                id, 
                rank, 
                linenumber, 
                CAST (sometext AS VARCHAR(8000)) 
         FROM   @Old o 
         WHERE  NOT EXISTS (SELECT TOP 1 1 
                            FROM   @Old 
                            WHERE  id = o.id 
                                   AND rank = o.rank 
                                   AND linenumber < o.linenumber) 
         UNION ALL 
         SELECT ctid + 1, 
                o.id, 
                o.rank, 
                o.linenumber, 
                ct.combinedtext + o.sometext 
         FROM   @Old o 
                INNER JOIN combinedresults ct 
                  ON ct.id = o.id 
                     AND ct.rank = o.rank 
         WHERE  o.linenumber > ct.linenumber) 

UPDATE n 
SET    combinedtext = ct.combinedtext 
FROM   @New n 
       INNER JOIN (SELECT n.id, 
                          n.rank, 
                          MAX(o.rank) orank 
                   FROM   @new n 
                          INNER JOIN @Old o 
                            ON n.id = o.id 
                               AND o.rank <= n.rank 
                   GROUP  BY n.id, 
                             n.rank) r 
         ON n.id = r.id 
            AND n.rank = r.rank 
       INNER JOIN (SELECT id, 
                          ct.rank, 
                          MAX(ctid) ctid 
                   FROM   combinedresults ct 
                   GROUP  BY ct.id, 
                             ct.rank) r2 
         ON r2.id = r.id 
            AND r2.rank = r.orank 
       INNER JOIN combinedresults ct 
         ON r.id = ct.id 
            AND ct.rank = r.orank 
            AND ct.ctid = r2.ctid 

SELECT * 
FROM   @New

Mysql – Storing huge amount of user game data

First of all, you need to stop thinking in matrices.

You have 2 options. 1) Store each game & who won. 2) Store aggregates of who won.

Option 1

Store every game and its result. This also allows more data about each game to be recorded in its row of the games table.

create table players
(
  player_id integer,
  player_username varchar(50),
  other_player_related_column datatype,
  ...
  ...
  ...
);

create table games
(
  player1_id  integer, -- references players table
  player2_id  integer, -- references players table
  winner_id integer, -- the player_id of the winner 
  interesting_game_fact datatype -- other game info 
);
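
With this layout, totals are derived at query time. As a rough sketch against the games table above (the column aliases are mine), the number of wins per player could be computed like this:

```sql
-- Wins per player, derived from the individual game rows.
SELECT winner_id AS player_id,
       COUNT(*)  AS wins
FROM games
GROUP BY winner_id
ORDER BY wins DESC;
```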

Option 2

Just store aggregates of the results of games played between two players. Losses can be derived from the number of games the opponent won. There should only be 1 row for each player_id pair - I suggest storing the lower of the two player_ids as player1_id, just to make life easier.

create table players
(
  player_id integer, 
  player_username varchar(50),
  other_player_related_column datatype,
  ...
  ...
  ...
);

create table results
(
  player1_id  integer, -- references players table
  player2_id  integer, -- references players table
  player1_wins integer, -- number of times player 1 has won
  player2_wins integer -- number of times player 2 has won
);
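
To keep one row per pair, each finished game can be recorded with an upsert. This is only a sketch under a few assumptions: a unique key on the pair (the key name is made up), and the user variables @a, @b and @winner standing in for the two player ids and the winner's id.

```sql
-- Assumed: a unique key on the pair, so a second game between the same
-- two players hits the existing row instead of inserting a new one.
ALTER TABLE results
  ADD UNIQUE KEY uq_results_pair (player1_id, player2_id);

-- Hypothetical input: players 7 and 3 played, and player 3 won.
SET @a = 7, @b = 3, @winner = 3;

-- LEAST/GREATEST store the lower id as player1_id, as suggested above.
INSERT INTO results (player1_id, player2_id, player1_wins, player2_wins)
VALUES (LEAST(@a, @b), GREATEST(@a, @b),
        IF(@winner = LEAST(@a, @b), 1, 0),
        IF(@winner = GREATEST(@a, @b), 1, 0))
ON DUPLICATE KEY UPDATE
  player1_wins = player1_wins + IF(@winner = player1_id, 1, 0),
  player2_wins = player2_wins + IF(@winner = player2_id, 1, 0);
```

Because the pair is always stored in the same order, looking up the record between two players is a single-row read on the unique key.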