Mysql – Optimizing MYSQL statements

MySQLoptimization

Recently, I started working on a project, where I came across many MySQL statements like this:

SELECT s.Important, c.Interesting
FROM table1 s, table2 c 
WHERE s.IdForTableC = c.id 

or

SELECT q.Important, c.Interesting 
FROM table1 q, table2 p, table3 c 
WHERE q.IdInTableP = p.id 
AND p.IdInTableC = c.id 
AND c.Measure = 1

I am completely new to optimizing MySQL and read a few guides that advised me to use EXPLAIN SELECT. I did and get type all for the above query and type index for the second one. I guess this means, that both queries should be optimized, but I don't know how. Is it even possible without changing the whole structure of the database?

EDIT: the EXPLAIN output of the first one is:

'1', 'SIMPLE', 'c', 'ALL', 'PRIMARY', NULL, NULL, NULL, '467', ''
'1', 'SIMPLE', 's', 'ref', 'IdInTableC', 'IdInTableC', '5', 'table2.c.id', '2', 'Using where'

and INDEXES are:

'table2', '0', 'PRIMARY', '1', 'id', 'A', '14184441', NULL, NULL, '', 'BTREE', '', ''
'table2', '1', 'otherID', '1', 'otherID', 'A', '92708', NULL, NULL, '', 'BTREE', '', ''
'table2', '1', 'IdInTableC', '1', 'IdInTableC', 'A', '3048', NULL, NULL, '', 'BTREE', '', ''

Best Answer

Query 1

EXPLAIN is not simple to follow. First, let me explain how to optimize the two queries.

SELECT s.Important, c.Interesting
FROM table1 s, table2 c 
WHERE s.IdForTableC = c.id 

That's the old way to write a JOIN; please learn the new way:

SELECT s.Important, c.Interesting
    FROM table1 s
    JOIN table2 c  ON s.IdForTableC = c.id 

The ON explains how the two table relate to each other.

Now there is no filtering via a WHERE clause, so all that can be done is

  1. Go through all the rows of one of the tables. (No index is useful)
  2. Foreach row, reach into the other table.

If s is used for step 1, then an index on id is needed in c for step 2.
If c is used for step 1, then an index on IdForTableC is needed in s for step 2.

The convention is to name the PRIMARY KEY "id". A table must have a PRIMARY KEY. I'll guess that c has PRIMARY KEY(id). This makes starting with s a slam-dunk. But, instead, the Optimizer decided to start with c -- as seen by it being row 1 in the EXPLAIN.

"ALL" means that the entire table is scanned. Some of the NULLs go along with that. "PRIMARY" does not mean much in this context. c has about 467 rows.

To get into s, it used table2.c.id. Something is wrong there. That syntax says "database table2, table c, column id". Is your database called "table2". Ugh.

The index named IdInTableC was used. Presumably, you had

INDEX(IdInTableC)

so that name is the same as the column. 5 probably means it is a NULLable INT. Use NOT NULL unless you have a purpose for NULL. The JOIN will find an estimated 2 rows in s.

I guessed wrong as to which order c and s would be used. But the symmetry of the query implies that there might not be a lot of difference in which to start with.

Query 2

Again, let's use the new JOIN syntax:

SELECT q.Important, c.Interesting 
    FROM table1 q
    JOIN table2 p  ON q.IdInTableP = p.id
    JOIN table3 c  ON p.IdInTableC = c.id
    WHERE  c.Measure = 1

It is usually best to "filter" first. The only filter is c.Measure = 1, so c will probably be the first table, especially if you have

INDEX(Measure)

Then, the chain must be: c -> p -> q. So these are needed:

p:  INDEX(IdInTableC)
q:  INDEX(IdInTableP)

With those indexes, all three tables should use INDEX, perhaps with ref or eq_ref.

More Tips

In the Extras column:

  • "Using index" means that the index is "covering". That is, all the columns needed are found in the index. This is a minor performance boost.
  • "Using index condition" is different and mostly out of your control; it also implies a perf boost.
  • "Using join buffer" -- the second table is fully loaded into memory; this may be a big perf boost.
  • "Using temporary, Using filesort" -- People wrongly fear these. There is some performance loss, but usually one or both are required. After all, ORDER BY implies a sort. The absence of "filesort" implies an optimization that is sometimes available.

For some basic index-creation tips, see http://mysql.rjweb.org/doc.php/index_cookbook_mysql

I don't use EXPLAIN that much; I try to go straight from the SELECT to the optimal index.