Mysql – Why isn’t this index helping the InnoDB MySQL query

indexMySQLperformancetuning

I'm having trouble understanding why an index I have on one of my tables is doing nothing to help the performance of a JOIN, even when forcing the index. Here is the query:

EXPLAIN SELECT
    *
FROM
    tusers PARTITION (p362) tu FORCE INDEX (index_tu_group_id_email)
    JOIN users PARTITION (p362) u FORCE INDEX (cc_group_email_index)
      ON u.group_id=tu.group_id 
      AND tu.email_address=u.email
      AND u.group_id = 362 
WHERE
    tu.appliation_id=253555;

+----+-------------+-------+------+-------------------------+-------------------------+---------+-----------------------------+---------+-------------+
| id | select_type | table | type | possible_keys           | key                     | key_len | ref                         | rows    | Extra       |
+----+-------------+-------+------+-------------------------+-------------------------+---------+-----------------------------+---------+-------------+
|  1 | SIMPLE      | u     | ref  | cc_group_email_index    | cc_group_email_index    | 4       | const                       | 8282617 | Using where |
|  1 | SIMPLE      | tu    | ref  | index_tu_group_id_email | index_tu_group_id_email | 247     | const,db.u.email            |       8 | Using where |
+----+-------------+-------+------+-------------------------+-------------------------+---------+-----------------------------+---------+-------------+

I included the FORCE INDEX on tusers here just to simplify the output, including or excluding that makes no difference on the query plan.

When I run the same query on my local database (with significantly less data) the query uses the index, and it provides the performance benefit you'd expect from doing so. One other difference I noticed when running locally: the explain output starts with the tusers filter, and then uses the index on the users table — I suspect that is related to my issue, but I'm not seeing why/how.

Production database (where the query is failing) has ~300,000,000 rows in the tusers table, and the users table has ~175,000,000 rows. Local database only has a few hundred thousand rows in each. MySQL versions are the 5.6.19 on production, 5.6.10 locally. The partitions in question on production have ~17,000,000 rows (users) and 7,600,000 rows (tusers).

Running a query directly against the email column in the users table makes use of the index:

EXPLAIN SELECT
    *
FROM users FORCE INDEX (cc_group_email_index)
WHERE group_id=362 and email='emailaddress@gmail.com'

+----+-------------+-------+------+----------------------+----------------------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys        | key                  | key_len | ref         | rows | Extra       |
+----+-------------+-------+------+----------------------+----------------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | users | ref  | cc_group_email_index | cc_group_email_index | 407     | const,const |    1 | Using where |
+----+-------------+-------+------+----------------------+----------------------+---------+-------------+------+-------------+

I don't think this is related, given I'm forcing the index, but I noticed it so including just in case: cc_group_email_index is composed of two keys, each of which has a cardinality listed of 9, even though the cardinality of both columns is significantly higher. I also have another nearly identical index on a varchar column "raw_id" which when replaced in the above example works perfectly. The only difference between the two columns / indexes is:
the raw_id column is varchar(255), and the index is defined to only use the first 60 characters:

KEY `cc_group_email_index` (`group_id`,`email`),
KEY `cc_group_raw_id_index` (`group_id`,`raw_id`(60))

The email address column on tusers is 255 characters (which doesn't match users), though I wouldn't think this matters (and in fact, it does not impact things locally).

Any help/suggestions are HUGELY appreciated.

UPDATE

As requested, here is the table schema:

users:

+---------------------------------+--------------+------+-----+------------+-------+
| Field                           | Type         | Null | Key | Default    | Extra |
+---------------------------------+--------------+------+-----+------------+-------+
| group_id                        | int(11)      | NO   | PRI | NULL       |       |
| internal_user_id                | int(11)      | NO   | PRI | NULL       |       |
| create_date                     | date         | NO   |     | 1000-01-01 |       |
| first_trans_date                | date         | NO   |     | 1000-01-01 |       |
| first_trans_year                | smallint(6)  | NO   |     | 0          |       |
| first_trans_month               | tinyint(4)   | NO   |     | 0          |       |
| last_trans_date                 | date         | NO   |     | 1000-01-01 |       |
| first_purchase_revenue_in_cents | bigint(20)   | NO   |     | 0          |       |
| first_purchase_profit_in_cents  | bigint(20)   | NO   |     | 0          |       |
| first_name                      | varchar(50)  | YES  |     | NULL       |       |
| last_name                       | varchar(50)  | YES  |     | NULL       |       |
| raw_id                          | varchar(255) | YES  |     | NULL       |       |
| address_1                       | varchar(200) | YES  |     | NULL       |       |
| city                            | varchar(75)  | YES  |     | NULL       |       |
| state                           | varchar(50)  | YES  |     | NULL       |       |
| zip                             | varchar(20)  | YES  |     | NULL       |       |
| phone                           | varchar(20)  | YES  |     | NULL       |       |
| email                           | varchar(100) | YES  |     | NULL       |       |
| is_opt_in                       | tinyint(1)   | YES  |     | NULL       |       |
| mail_merge_1                    | varchar(255) | YES  |     | NULL       |       |
+---------------------------------+--------------+------+-----+------------+-------+

tusers:

+--------------------------+--------------+------+-----+---------+----------------+
| Field                    | Type         | Null | Key | Default | Extra          |
+--------------------------+--------------+------+-----+---------+----------------+
| id                       | int(11)      | NO   | PRI | NULL    | auto_increment |
| group_id                 | int(11)      | NO   | PRI | NULL    |                |
| application_id           | int(11)      | YES  | MUL | NULL    |                |
| group_user_id            | int(11)      | YES  |     | NULL    |                |
| raw_id                   | varchar(255) | YES  |     | NULL    |                |
| email_address            | varchar(255) | YES  |     | NULL    |                |
| first_name               | varchar(255) | YES  |     | NULL    |                |
| last_name                | varchar(255) | YES  |     | NULL    |                |
| mail_merge_1             | varchar(255) | YES  |     | NULL    |                |
+--------------------------+--------------+------+-----+---------+----------------+

Cardinality:

tusers: index_tu_group_id_email – 2,500,000
users: 17,000,000

Best Answer

This is just a guess, as I do not have all info, but you probably would be better by doing:

EXPLAIN SELECT STRAIGHT_JOIN
    *
FROM
    tusers PARTITION (p362) tu
    JOIN users PARTITION (p362) u
      ON u.group_id=tu.group_id 
      AND tu.email_address=u.email
      AND tu.group_id = 362 
WHERE
    tu.application_id=253555;

Note the STRAIGHT_JOIN, that may not be needed -if it is needed, then I may have assumed wrongly- and the tu.group_id comparison (that, again, shouldn't be needed).

Then using the following keys:

(tu.application_id, tu.group_id, tu.email_address)
(u.group_id, u.email)

However, if the number of records to be returned is 2.5M, as your cardinality suggests, then do not expect this to be fast... this is a pure IO math.

There are many other things that clicks me as problems, but I cannot say for sure without access.

Those could be even more effective if you didn't do a SELECT *.

Another thing is that varchar(255) is usually a bad idea.