Mysql – How to speed up queries on a large 220 million rows table (9 gig data)

MySQLperformancequery-performance

The issue:

We have a social site where members can rate each other for compatibility or matching. This user_match_ratings table contains over 220 million rows (9 gig data or almost 20 gig in indexes). Queries against this table routinely show up in slow.log (threshold > 2 seconds) and is the most frequently logged slow query in the system:

Query_time: 3  Lock_time: 0  Rows_sent: 3  Rows_examined: 1051
"select rating, count(*) as tally from user_match_ratings where rated_user_id = 395357 group by rating;"

Query_time: 4  Lock_time: 0  Rows_sent: 3  Rows_examined: 1294
"select rating, count(*) as tally from user_match_ratings where rated_user_id = 4182969 group by rating;"

Query_time: 3  Lock_time: 0  Rows_sent: 3  Rows_examined: 446
"select rating, count(*) as tally from user_match_ratings where rated_user_id = 630148 group by rating;"

Query_time: 5  Lock_time: 0  Rows_sent: 3  Rows_examined: 3788
"select rating, count(*) as tally from user_match_ratings where rated_user_id = 1835698 group by rating;"

Query_time: 17  Lock_time: 0  Rows_sent: 3  Rows_examined: 4311
"select rating, count(*) as tally from user_match_ratings where rated_user_id = 1269322 group by rating;"

MySQL version:

  • protocol version: 10
  • version: 5.0.77-log
  • version bdb: Sleepycat Software: Berkeley DB 4.1.24: (January 29, 2009)
  • version compile machine: x86_64 version_compile_os:redhat-linux-gnu

Table info:

SHOW COLUMNS FROM user_match_ratings;

Gives:

╔═══════════════╦════════════╦════╦═════╦════════╦════════════════╗
║ id            ║ int(11)    ║ NO ║ PRI ║ NULL   ║ auto_increment ║
║ rater_user_id ║ int(11)    ║ NO ║ MUL ║ NULL   ║                ║
║ rated_user_id ║ int(11)    ║ NO ║ MUL ║ NULL   ║                ║
║ rating        ║ varchar(1) ║ NO ║     ║ NULL   ║                ║
║ created_at    ║ datetime   ║ NO ║     ║ NULL   ║                ║
╚═══════════════╩════════════╩════╩═════╩════════╩════════════════╝

Sample query:

select * from mutual_match_ratings where id=221673540;

gives:

╔═══════════╦═══════════════╦═══════════════╦════════╦══════════════════════╗
║ id        ║ rater_user_id ║ rated_user_id ║ rating ║ created_at           ║
╠═══════════╬═══════════════╬═══════════════╬════════╬══════════════════════╣
║ 221673540 ║ 5699713       ║ 3890950       ║ N      ║ 2013-04-09 13:00:38  ║
╚═══════════╩═══════════════╩═══════════════╩════════╩══════════════════════╝

Indexes

The table has 3 indexes set up:

  1. single index on rated_user_id
  2. composite index on rater_user_id and created_at
  3. composite index on rated_user_id and rater_user_id
show index from user_match_ratings;

gives:

╔════════════════════╦════════════╦═══════════════════════════╦══════════════╦═══════════════╦═══════════╦═════════════╦══════════╦════════╦═════════════════════════╦════════════╦══════════════════╗
║ Table              ║ Non_unique ║ Key_name                  ║ Seq_in_index ║ Column_name   ║ Collation ║ Cardinality ║ Sub_part ║ Packed ║ Null                    ║ Index_type ║ Comment          ║
╠════════════════════╬════════════╬═══════════════════════════╬══════════════╬═══════════════╬═══════════╬═════════════╬══════════╬════════╬═════════════════════════╬════════════╬══════════════════╣
║ user_match_ratings ║ 0          ║ PRIMARY                   ║ 1            ║ id            ║ A         ║ 220781193   ║ NULL     ║ NULL   ║ BTREE                   ║            ║                  ║
║ user_match_ratings ║ 1          ║ user_match_ratings_index1 ║ 1            ║ rater_user_id ║ A         ║ 11039059    ║ NULL     ║ NULL   ║ BTREE                   ║            ║                  ║
║ user_match_ratings ║ 1          ║ user_match_ratings_index1 ║ 2            ║ created_at    ║ A         ║ 220781193   ║ NULL     ║ NULL   ║ BTREE                   ║            ║                  ║
║ user_match_ratings ║ 1          ║ user_match_ratings_index2 ║ 1            ║ rated_user_id ║ A         ║ 4014203     ║ NULL     ║ NULL   ║ BTREE                   ║            ║                  ║
║ user_match_ratings ║ 1          ║ user_match_ratings_index2 ║ 2            ║ rater_user_id ║ A         ║ 220781193   ║ NULL     ║ NULL   ║ BTREE                   ║            ║                  ║
║ user_match_ratings ║ 1          ║ user_match_ratings_index3 ║ 1            ║ rated_user_id ║ A         ║ 2480687     ║ NULL     ║ NULL   ║ BTREE                   ║            ║                  ║
╚════════════════════╩════════════╩═══════════════════════════╩══════════════╩═══════════════╩═══════════╩═════════════╩══════════╩════════╩═════════════════════════╩════════════╩══════════════════╝

Even with the indexes these queries are slow.

My Question:

Would separating this table/data unto another database on a server that has enough ram to store this data in memory would this speed up these queries? Is there anything in anyway that the tables/indexes are set up that we can improve upon to make these queries faster?

Currently we have 16GB of memory; however we are looking into either upgrading the existing machine to 32GB or adding a new machine with at least that much, maybe solid state drives as well.

Best Answer

Thoughts on the issue, thrown in random order:

  • The obvious index for this query is: (rated_user_id, rating). A query that gets data for only one of the million users and needs 17 seconds is doing something wrong: reading from the (rated_user_id, rater_user_id) index and then reading from the table the (hundreds to thousands) values for the rating column, as rating is not in any index. So, the query has to read many rows of the table which are located in many different disk locations.

  • Before starting adding numerous indexes in the tables, try to analyze the performance of the whole database, the whole set of slow queries, examine again the choices of the datatypes, the engine you use and the configuration settings.

  • Consider moving to a newer version of MySQL, 5.1, 5.5 or even 5.6 (also: Percona and MariaDB versions.) Several benefits as bugs have been corrected, the optimizer improved and you can set the low threshold for slow queries to less than 1 second (like 10 milliseconds). This will give you far better info about slow queries.

  • The choice for the datatype of rating is weird. VARCHAR(1)? Why not CHAR(1)? Why not TINYINT? This will save you some space, both tin the table and in the indexes that (will) include that column. A varchar(1) column needs one more byte over char(1) and if they are utf8, the (var)char columns will need 3 (or 4) bytes, instead of 1 (tinyint).