MySQL – Do Composite or Unique Indexes Improve Query Performance?

MySQLmysql-5.5performancequery-performance

I need help with identifying indexes to increase performance of the following query. I'm currently using v5.5 with InnoDB. I cannot modify the query. However, I can add indexes as needed. The following indexes are currently available:

t1.id = Primary key
t1.status
t2.owner
t2.user

t2.total

select t1.`id` 
from (tbl1 t1 inner join tbl2 t2 on t1.`id` = t2.`id`) 
where (t2.`owner` = `ABC123` or t2.`user` = `ABC123`) 
  and t1.status = 1 
order by t2.`total` 
limit 0, 100;

Best Answer

(t2.`owner` = 'ABC123' or t2.`user` = 'ABC123')

This search term will force a table-scan. Composite indexes won't help.

Suppose you had a composite index on (owner, user). It could do a quick lookup for owner = 'ABC123' but then the desired values for user could be anywhere in the table, not just in the subset matching the desired owner value. So it has to scan the whole table anyway.

Likewise if you had a composite index on (user, owner). It could do a quick lookup for user = 'ABC123' but the same problem exists, finding the rows for owner will force a table-scan.

The one remaining possibility is to use index merge union optimization, but in my experience this doesn't work as well as you'd think.

Related Solutions

Mysql – Help optimizing MySQL slow query

I would like to get rid of "Using temporary; Using filesort"

One of the problems I see is that you're using different GROUP BY and ORDER BY clauses. From the manual on how MySQL uses temporary tables:

If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.

As soon as you create a temporary table, it will need to be sorted according to your ORDER BY clause, indicated by 'using filesort'.

This execution plan at leasts uses the indexes to appropriately limit the number of rows found.

I would also look through the docs on ORDER BY optimization.

Sql-server – Query Performance Issue

You say

Above plan is for subid = 11 or 7 in @t table variable

I think you may be under a misapprehension here. SQL Server does not look at the contents of the table variable and choose a plan based upon the values it contains.

The statement is compiled before the table variable contains any rows at all and you will get the same plan (that assumes a single row) regardless of whether it eventually contains 2 (and would match 95.5% of the rows) or 1 (and would match only 0.0008%).

The table variable may of course also contain multiple rows but SQL Server will not take account of that except if you use the OPTION (RECOMPILE) hint and even then there are no statistics on table variables so it cannot take any account of actual values.

Some alternate plans are below

plan 1

plan 2

These require finding all matching rows and sorting them.

Because NCx_1 is not declared as a unique index the include(QueueItemID) is ignored (as explained in More About Nonclustered Index Keys) and QueueItemID gets added as an index key column instead. This means that SQL Server can seek on IsProcessed, QCode and the matching rows will be ordered by QueueItemID.

The plan in your question therefore avoids a sort operation but performance is entirely reliant upon how many rows in practice need to be evaluated before the first one matching the SubID IN (SELECT SubID FROM @t) predicate is found and the range seek can stop.

Of course this can vary wildly depending on how common the SubID values contained in @t are and whether there is any skew in the distribution of these values with respect to QueueItemID (You say that both around 350k rows match the seek predicate and that around 350k end up being seeked so for SubID = 7 it sounds like these are all at the end or perhaps no rows match at all - which would be the worse case for this plan).

It would be interesting to know what the estimated number of rows coming out of the seek is. Presumably this is much less than 350,000 and thus SQL Server chooses the plan you see based on this estimated cost.

If the table variable will always just have few rows you might find this rewrite works better for you.

SELECT TOP 1 QueueItemID
FROM   @t
       CROSS APPLY (SELECT TOP 1 t.QueueItemID
                    FROM   QueueTable t
                    WHERE  t.IsProcessed = 0
                           AND t.QCode = 'USA'
                           AND SubID = [@t].SubID
                    ORDER  BY t.QueueItemID) CA 
ORDER BY QueueItemID

For me it gives the plan below where it seeks into the index on subid,isprocessed,qcode,queueitemid as many times as you have rows in the table variable. It is similar to the first plan shown but may be slightly more efficient as each seek stops after the first row is returned.

plan

Best Answer

Related Solutions

Mysql – Help optimizing MySQL slow query

Sql-server – Query Performance Issue

Related Question