MySQL – single heavily indexed table queries runs fast until the first join with another table

indexMySQLperformance

We have a single table that has around 20 – 30 columns describing our customers details (id, name, loginName, dateOfBirth, lastLoginDate, country …) table size is around 15 million rows.

We have many ways on our Backoffice to query \ filter \ sort this table and it works perfectly fast since this table is heavily indexed. (more than 20 indexes).

a new feature request was introduced last week that actually broke the performance in a devastating way:

the requirement was to add teams to our Backoffice. and assign this teams to our customers. (basically team #1 can be mapped to customers 1-100,000 and team #2 to customers 100,001 – 200,000 etc..)

To accommodate the teams feature we had to change the way we query the customers table since now we need to add the logic :
".. And the customer is mapped to MY_TEAM"

This is very easy to do if a customer can be mapped into a single team – we simply extend the above table with a new column called teamId and that it.

Problem is that customers might have more than one team (many to many relation) and this is where normalizing these mapping is harder in a single table.

We tried to add to the query this:

...  AND customerTable.customerId IN
        ((SELECT t2.customerId
        FROM TeamCustomerMapping t2  WHERE t2.teamId = 7)  

This works in its default form but when the filter \ sorts are applied to the customer table this is where it takes forever to run (I believe its because MySQL must use one index and without sorts and filters it uses the primary customerId index which helps it with the join but when you want to sort it should use a different index but then the join is impacted).

Any ideas how to handle this?

UPDATE – the actual query

SELECT
    tbl.customerId,
    tbl.country,
    tbl.language,
    tbl.lastLoginDate
FROM    
   CustomerTable tbl  
   INNER JOIN TeamCustomerMapping tbl2 on tbl.customerId= tbl2.customerId
WHERE
    1 = 1                       
        and tbl2.teamId = 9
        order by tbl.lastLoginDate asc

when teamId = 9 the query is slow, when teamId = 7 its fast.
the difference is how big teamId = 7 vs 9 is (one includes millions of customers the other only 10 customers)

adding the force index (lastLoginDate index) works fast only when teamId = 7 and slow when teamId = 9.

The TeamCustomerMapping Table has the following Primary index: (teamId,customerId)

UPDATE #2: Show create tables:

CREATE TABLE `TeamCustomerMapping ` (
  `customerId` bigint(20) NOT NULL,
  `teamId` bigint(16) NOT NULL,
  PRIMARY KEY (`teamId`,`customerId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

CREATE TABLE `CustomerTable` (
  `customerId` bigint(20) NOT NULL,
  `timeCreated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `timeUpdated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `country` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  `language` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
  `lastLoginDate` datetime DEFAULT NULL
  PRIMARY KEY (`customerId`),
  KEY `countryIndex` (`country`),
  KEY `lastLoginIndex` (`lastLoginDate`),
  KEY `languageIndex` (`language`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

UPDATE #3 adding explain of the queries:

id , select_type,      table,     partitions,   type,   possible_keys,   key,   key_len,    ref,   rows,   filtered,  extra


// SLOW QUERY  (when team includes 7811365 players,  execution time 80 seconds. returns 7811365 rows)

1,   simple,   tbl2,   null,   ref,    PRIMARY,  PRIMARY,   8,  const,   3448866,  100.00, Using index; Using temporary; Using filesort

1,   simple,   tbl,   null,   eq_ref,    PRIMARY,  PRIMARY,   8,  db.tbl2.customerId,   1,  100.00, "  " 


// FAST QUERY  (when team includes 10 players,  execution time 0.1 seconds. returns 10 rows)

1,   simple,   tbl2,   null,   ref,    PRIMARY,  PRIMARY,   8,  const,   10,  100.00, Using index; Using temporary; Using filesort

1,   simple,   tbl,   null,   eq_ref,    PRIMARY,  PRIMARY,   8,  db.tbl2.customerId,   1,  100.00, "  " 


// SLOW QUERY  with force index (when team includes 7811365 players,  . returns 7811365 rows)

1,   simple,   tbl,   null,   index,    null,  lastLoginIndex,   6,  null,   7587368,  100.00, null

1,   simple,   tbl2,   null,   eq_ref,    PRIMARY,  PRIMARY,   16,  db.tbl2.customerId,   1,  100.00, Using Index


UPDATE 4 – queries with paginations:

id , select_type,      table,     partitions,   type,   possible_keys,   key,   key_len,    ref,   rows,   filtered,  extra


// SLOW QUERY without force index (using limit 10K like we do in production, when team includes 7811365 players  execution time 80 seconds. returns 10K rows)

1,   simple,   tbl2,   null,   ref,    PRIMARY,  PRIMARY,   8,  const,   3448866,  100.00, Using index; Using temporary; Using filesort

1,   simple,   tbl,   null,   eq_ref,    PRIMARY,  PRIMARY,   8,  db.tbl2.customerId,   1,  100.00, "  " 


// FAST QUERY without force index (using limit 10K like we do in production, when team includes 10 players  execution time 0.1 seconds. returns 10 rows)

1,   simple,   tbl2,   null,   ref,    PRIMARY,  PRIMARY,   8,  const,   10,  100.00, Using index; Using temporary; Using filesort

1,   simple,   tbl,   null,   eq_ref,    PRIMARY,  PRIMARY,   8,  db.tbl2.customerId,   1,  100.00, "  " 


// SLOW QUERY  with force index (using limit 10K like we do in production, when team includes 7811365 players  execution time 0.1 seconds. returns 10K rows)

1,   simple,   tbl,   null,   index,    null,  lastLoginIndex,   6,  null,   7587368,  100.00, null

1,   simple,   tbl2,   null,   eq_ref,    PRIMARY,  PRIMARY,   16,  db.tbl2.customerId,   1,  100.00, Using Index


UPDATE 5 – MySQL 8.0

Iv'e exported the data to a 8.0 instance and both the fast-query and slow-query are working fast.

But this is not enough since combining two of them together (which is actually a business requirement) is slow, probably because mysql should choose a single index and each team size dictates a different optimal index but both does not have an optimal index but multiple (2).

The query that doesnt work on 8.0:

SELECT
    tbl.customerId,
    tbl.country,
    tbl.language,
    tbl.lastLoginDate
FROM    
   CustomerTable tbl  
   INNER JOIN TeamCustomerMapping tbl2 on tbl.customerId= tbl2.customerId
WHERE
    1 = 1                       
        and tbl2.teamId in (7,9)
        order by tbl.lastLoginDate asc

UPDATE 6 – SQL-Server experiment

Iv'e exported the tables into a sql server instance.
all queries executes fast.
explaining the query plan for the and tbl2.teamId in (7,9) ... shows that 3 different indexes are used – one for the mapping table, and two for the main customer table.

Is this the obvious use case where sql server is preferable?

Best Answer

The problem is that for the large team it may be more efficient to start read index on CustomerTable and then check records in TeamCustomerMapping. As you see it works when you use limit for the amount of rows. It may be a bit tricky for MySQL server to estimate the amount of rows which it needs to read before it will find 10000 of necessary ones. But you can try to help it with hint. For example, you can run "select count(1) from TeamCustomerMapping where teamId = 9" first and if it's greater than 100000 or 300000 then run query with corresponding hint. Things may become more complicated when you add more filters. What MySQL server version do you use?