Mysql – Improving join query performance with limit and index

indexMySQLperformancequery-performance

I have a query across two large tables. The first records the latest user activity at a location, the second is a dimension table with a natural primary key of location.

Table sizes here are roughly 100 millions rows in user_location_rating and 10 million rows in dim_location.
Most users have < 1000 records in user_location_rating and for those users query performance is adequate.

For users with a lot of activity data, this query, even though it is two simple selects, can still be slow.
I would like to improve the query performance. Can this by done by adding an additional index?
As an alternative, is there a way to exploit an index to make a more limited query (such as below) more efficient than the full query?

SELECT d.create_time
FROM user_location_rating f
JOIN dim_location d using(location_id)
WHERE f.user_id=?
  AND f.platform=?
  AND d.category=?;

SELECT d.create_time
FROM user_location_rating f
JOIN dim_location d using(location_id)
WHERE f.user_id=?
  AND f.platform=?
  AND d.category=?
ORDER BY d.create_time DESC
LIMIT 1000;

EXPLAIN SELECT yields the following on these queries (for a user with 999 events, say)

+----+-------------+----------------------+--------+-------------------------------+-----------+---------+----------------------------------+------+-------------+
| id | select_type | table                | type   | possible_keys                 | key       | key_len | ref                              | rows | Extra       |
+----+-------------+----------------------+--------+-------------------------------+-----------+---------+----------------------------------+------+-------------+
|  1 | SIMPLE      | user_location_rating | ref    | k_userloc,k_locplat,k_usrplat | k_usrplat | 8       | const,const                      |  999 | Using index |
|  1 | SIMPLE      | dim_location         | eq_ref | PRIMARY                       | PRIMARY   | 4       | user_location_rating.location_id |    1 | Using where |
+----+-------------+----------------------+--------+-------------------------------+-----------+---------+----------------------------------+------+-------------+


+----+-------------+----------------------+--------+-------------------------------+-----------+---------+----------------------------------+------+----------------------------------------------+
| id | select_type | table                | type   | possible_keys                 | key       | key_len | ref                              | rows | Extra                                        |
+----+-------------+----------------------+--------+-------------------------------+-----------+---------+----------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | user_location_rating | ref    | k_userloc,k_locplat,k_usrplat | k_usrplat | 8       | const,const                      |  999 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | dim_location         | eq_ref | PRIMARY                       | PRIMARY   | 4       | user_location_rating.location_id |    1 | Using where                                  |
+----+-------------+----------------------+--------+-------------------------------+-----------+---------+----------------------------------+------+----------------------------------------------+

Table definitions

CREATE TABLE `user_location_rating` (
  `user_activity_id` int(16) NOT NULL AUTO_INCREMENT,
  `user_id` int(16) NOT NULL DEFAULT '0',
  `location_id` int(16) NOT NULL DEFAULT '0',
  `platform` int(2) NOT NULL DEFAULT '-1',
  `rating` int(2) NOT NULL DEFAULT '-1'
  PRIMARY KEY (`location_id`,`user_activity_id`),
  UNIQUE KEY `k_userloc` (`user_id`,`location_id`),
  KEY `k_locplat` (`location_id`,`platform`),
  KEY `k_usrplat` (`user_id`,`platform`)
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC

CREATE TABLE `dim_location` (
  `location_id` int(16) NOT NULL AUTO_INCREMENT,
  `category` int(2) NOT NULL DEFAULT '0',
  `create_time` datetime DEFAULT NULL,
  PRIMARY KEY (`location_id`)
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC

(I've made some effort to clean up the actual definitions here to be close to the minimal relevant example so apologies if any typos have crept in.)

Best Answer

For the first table, an index on (platform, user_id, location_id) or (user_id, platform, location_id) would be best, for both queries. Good for you, the existing index k_usrplat is equivalent to the second index (InnoDB indexes silently include the PK columns). You can see that the index is indeed used by both queries.

For the second table, it's more complex. You might get adequate performance with the existing index of the primary key, at least for the first query. A possible improvement is a (category, location_id, create_time) index.

You could experiment with a (category, create_time, location_id) index as well, which might be useful for the second query. A lot will depend on the distribution of data and efficiency may vary for different parameters.


The definitions of integer columns look weird. Why do you have int (16) and int (2)? If you think that that means any restrictions on the possible values in the columns, you are mistaken. The columns are of the same type, the numbers in parentheses is just a directive to user interfaces which is mostly ignored. If some of these columns, for example platform can only hold small values (say 0-100 or 0-2000), then use an appropriately smaller type:

tiny int   (-128 .. +127)      : 1 byte 
small int  (-32768 .. + 32767) : 2 bytes
medium int (-2^23 .. + 2^23-1) : 3 bytes
tiny int unsigned   (0 .. +255)    : 1 byte 
small int unsigned  (0 .. +65535)  : 2 bytes
medium int unsigned (0 .. +2^24-1) : 3 bytes

That will save space in the table and their indexes, both in disk and in memory usage.
Less space -> Less I/O -> faster queries