Mysql – status table join slows down query by 10 seconds

indexMySQLmysql-5.6optimization

Got an issue with a fair few queries in our system that all rely on a status table (contains 3 rows)

CREATE TABLE `assessment_status` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `Title` varchar(30) DEFAULT NULL,
  `Completed` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `Started` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `Sequence` smallint(6) DEFAULT '1',
  `Archived` tinyint(1) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`id`),
  KEY `Archived` (`Archived`),
  KEY `Completed` (`Completed`),
  KEY `idAndcompleted` (`id`,`Completed`),
  KEY `idCompletedArchived` (`id`,`Completed`,`Archived`),
  KEY `idCompletedTitle` (`id`,`Title`,`Completed`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;

Data

id      Title           Completed   Started     Sequence    Archived
'1'     'Not Started'   '0'         '0'         '1'         '0'
'2'     'Started'       '0'         '1'         '2'         '0'
'3'     'Completed'     '1'         '1'         '3'         '0'

When running the query which has lots of other table joins but to this is:

select 
...
ass_s.Title AS `Assessment Section`
... 
from au
INNER JOIN assessment_status as_s ON au.idStatus = as_s.id
....
where as_s.Completed = 1

explain extended is:

id  select_type table   type    possible_keys                                                               key         key_len     ref         rows    filtered    Extra
'1' 'PRIMARY'   'as_s'  'ref'   PRIMARY,id,Completed,idAndcompleted,idCompletedArchived,idCompletedTitle'   'Completed' '1'         'const'     '1'     '100.00'    'Using temporary; Using filesort'

and the query takes about 12+seconds to return

Yet if I switch it round to a subquery for title and where exists it runs in 0.6seconds

select 
(select as_s.title from assessment_status as_s where au.idStatus = as_s.id and as_s.Completed = 1) AS `Status`
... 
from au
....
where exists (select as_s.Completed from assessment_status as_s where au.idStatus = as_s.id and as_s.Completed = 1)

explain extended is:

id  select_type         table   type    possible_keys                                                                   key         key_len     ref                 rows    filtered    Extra
5   DEPENDENT SUBQUERY  as_s    eq_ref  PRIMARY,id,Completed,idAndcompleted,idCompletedArchived,idCompletedTitle        PRIMARY     4           fslcms.au.idStatus  1       100         Using where
3   DEPENDENT SUBQUERY  as_s    eq_ref  PRIMARY,id,Completed,idAndcompleted,idCompletedArchived,idCompletedTitle        PRIMARY     4           fslcms.au.idStatus  1       100         Using where

It's obviously still selecting on the same values, yet runs a hell of a lot faster as it's using where instead on const. The thing is, reading MySQL docs on ref const states it's because it's only using one value it treats it as a const, which is the fastest way!?

I'm quite new with MySQL, come from MSSQL and never come across such a small table that could cause such a big performance issue before… As you can see from the table create, I've even tried varied versions of the index to see if it helps.

Regards

Liam

Best Answer

A PRIMARY KEY is a UNIQUE KEY and is an INDEX. Don't re-declare id as UNIQUE.

Don't start an index with the PRIMARY KEY; it is likely never to be used.

Don't index flags; they are unlikely to be used.

So, I have just argued against all 6 secondary indexes, leaving only PRIMARY KEY(id)

There are only 3 rows in assessment_status, correct? It is actually faster to do a table scan than to use an index.

Your first query seems to need to find the one row with completed = 1, then find the rows in au with au.idStatus=3, correct?

The EXPLAIN that you provide does not match the SELECT. There are 2 tables in the select, but only one row in the explain??

Ditto for the second EXPLAIN. But, there I see id=5 and id=3, implying that there are even more tables! It could be that the order of tackling the tables was drastically changed; this could be a cause or clue of the timing differences.

How big is au? The missing line in the EXPLAINs probably says that it will be a table scan, and gives an estimate of the number of rows.

Was the 12-second run tried twice? I ask because the first run may have spent a lot of time fetching data from disk and the second would be running entirely from cached data in RAM. This could also explain why the other query took only 0.6 sec.

If you are going to have the Question focus on a portion of the real query, then provide a shrunken test case that demonstrates it. Else it may not be a valid shrinking, and we cannot necessarily help your real query.

Still, I will guess at what is going on. The first query (I guess) had only one thing in the WHERE clause, a reference to as_s. This tricked the Optimizer into deciding to start with that table. But because of stuff not in evidence, some other table would be better to start with. The second query seems to have taken that path; it could not start with as_s anyway. A guess: there is no WHERE in the second query?