MySQL: Assign rankings for a large table of results

MySQLoptimizationperformancequery-performance

In our games we have a quartely league system. Every 3 months, the current season ends and all scorings of the current quarter are archived into a table called archive_league. Each entry represents a user and his score for a given league quarter. It comes with the following fields:

id - the id of an entry
uid - the id of a user
rounds - the number of games the user played in the given quarter
score - the score of the user for the given quarter
rank - the position of the player, descending by score, in the given quarter
date - the date of the quarter this entry represents

My goal is to go through all entries for a given date and assign the field "rank" for results. For example the player with the highest score should receive rank = 1, 2nd highest score rank = 2 and so on. If 2 players share the same score, they should receive the same rank (olympic scoring).

Example:

player x with score 528 receives rank 7
player y with score 528 receives rank 7
player z with score 529 receives rank 9

I am currently achieving this goal with this query:

UPDATE archive_league
        LEFT JOIN
    (SELECT 
        t.id,
            (SELECT 
                    COUNT(id) + 1
                FROM
                    archive_league x
                WHERE
                    x.score > t.score
                        AND x.date = :quarter) AS new_rank
    FROM
        archive_league t) AS temp USING (id) 
SET 
    rank = new_rank
WHERE
    date = :quarter;

The problem is, however, with some hundreds of thousands of entries, this query runs for a couple of days, even though I have created indexes for the WHERE conditions. How can I optimize this query to run faster?

Best Answer

Materialize it

It may be more performant to make a real (materialized) temp table instead of the inline subquery.

-- Cleanup any old temporary table structure
DROP TEMPORARY TABLE IF EXISTS temp_ranks;

-- Initialize a new temporary table. This will copy your same data types.
-- Structure only, no data.
CREATE TEMPORARY TABLE temp_ranks
  AS SELECT id, rank AS new_rank
     FROM archive_league WHERE 0=1;

-- Apply a PK index to the temp table for performance.
ALTER TABLE temp_ranks ADD PRIMARY KEY (id);

-- Compute ranks, store in temp table.
INSERT INTO temp_ranks
SELECT t.id,
      (SELECT COUNT(*) + 1
          FROM archive_league
          WHERE score > t.score
            AND date = :quarter) AS new_rank
    FROM archive_league t;

-- Apply to original table the materialized (temp) ranks.
UPDATE archive_league
LEFT JOIN temp_ranks USING (id) 
SET rank = new_rank
WHERE date = :quarter;

If we were using Oracle, I wouldn't expect this technique to make a difference (as Oracle's optimizer is pretty smart). However, MySQL's optimizer has some weak spots, one of which is that SELECT ... JOIN is pretty optimized, will choose best algorithm at run time (merge, hash, nested loop), yet UPDATE ... JOIN and DELETE ... JOIN lack the same optimization. Such an optimization is not impossible, but nobody has written the C code to make it so. If you're brilliant with programming databases in C, you are welcome to write an optimization and submit as a patch to MySQL (or MariaDB).

Reference from the manual https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization.html:

A limitation on UPDATE and DELETE statements that use a subquery to modify a single table is that the optimizer does not use semi-join or materialization subquery optimizations. As a workaround, try rewriting them as multiple-table UPDATE and DELETE statements that use a join rather than a subquery.

Related Question