Mysql – Are these two statements the same performance wise

index-tuningMySQLoptimization

Setup

I have a combination index in table SubmissionStatus as show below. And running a simpl query 2 different ways.

 KEY `Combo_Status_Type_BidID` (`fk_ActivityStatus`,`ActivityType`,`fk_BidIDAssigned`),

Query 1

SELECT *
FROM SubmissionStatus SS
WHERE fk_ActivityStatus IN (5,12)
AND ActivityType = 61

Query 2

SELECT *
FROM SubmissionStatus SS
WHERE ActivityType = 61
AND fk_ActivityStatus IN (5,12)

My understanding

From my understanding the index I have in place here should benefit Query 1 because it has fk_ActivityStatus first, and ActivityType second, just as the Where/And conditions are listed in the statement.

Question

But, Is query 2 also getting to take full advantage of this index? Or does the fact that it has ActivityType listed first prevent it from being able to use the combo Index? (Making it having to do a full table scan, or look for an index that has ActivityType as the first.)

P.S.

The data table are currently to small to be able to get much reliable feedback from the EXPLAIN, and execution plan.

Best Answer

The order of ANDs in a WHERE clause does not matter.
The order of columns in an INDEX does matter.

See here for more discussion of indexes.

For small tests, use the Handler trick:

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';

When comparing two possible selects and/or indexes, the sizes of the handler numbers give a pretty reliable clue, even for small tables.

Related Solutions

Sql-server – Should these two indexes suggested by the SSMS missing index feature be combined

Well, you could consider a filtered index - if you're always looking for rows where IsSynchronized = 0 and this number should be relatively small, then instead of those two indexes, consider this instead:

CREATE NONCLUSTERED INDEX [IX_NotSynchronized] 
  ON [dbo].[PackageEvents] ([PackageID])  
  INCLUDE ([EventDate], [EventDescription], [EventID], 
    [LastSyncDate], [Notes], [UserName], [Version]) 
  WHERE IsSynchronized = 0;

Of course you may want to make that even smaller and test to see the difference in impact if the query has to look up the data (should be pretty efficient if the number of rows is small), so - assuming PackageID is the clustering key:

CREATE NONCLUSTERED INDEX [IX_NotSynchronized] 
  ON [dbo].[PackageEvents] ([PackageID])
  WHERE IsSynchronized = 0;

The overhead of maintaining this index may very well be worth the space savings compared to a full index, especially if it's only being used to optimize this query (or query pattern, at least).

Filtered indexes are not magic, though; JNK brought up some limitations below:

Caveats with filtered indexes - stats may not stay up to date without maintenance, and you need to use "standard" values for some settings like QUOTED IDENTIFIER and ANSI NULLS. These are small issues but if you have the settings wrong in a session that inserts into the index, the insert will fail.

Also you'll want to read these posts:

If you don't want to use a filtered index, you can probably test variations of these:

CREATE NONCLUSTERED INDEX [IX_Covering_try1] ON [dbo].[PackageEvents] 
  ([PackageID], IsSynchronized)  
INCLUDE ([EventDate], [EventDescription], [EventID], 
  [LastSyncDate], [Notes], [UserName], [Version]);

CREATE NONCLUSTERED INDEX [IX_Covering_try2] ON [dbo].[PackageEvents] 
  (IsSynchronized, [PackageID])  
INCLUDE ([EventDate], [EventDescription], [EventID], 
  [LastSyncDate], [Notes], [UserName], [Version]);

(For a long time I thought that including BIT columns in the key was wasteful but Martin Smith demonstrated a case where it worked quite well - worth a try. I can't find the post now.)

Without your full schema, data, query patterns etc. we can only guide you and have you test our suggestions in your environment. We can't say, "Ding! This is the one that will work for you!"

Sql-server – Why is the optimizer doing a table scan vs Index Seek

The optimizer is convinced that if it's going to have to go back to the disk for retrieving column data anyway, it might as well scan the table in the first place, since that'll be less work for it to do. It'll use the seek with the CHAR( 7 ) scalar because the statistics for the index know it's not going to find anything, but if data needs to be returned, it has to consider both CPU and I/O weights.

USE tempdb;
GO

IF NOT EXISTS ( SELECT  1
                FROM    sys.objects
                WHERE   name = 'tst'
                    AND type = 'U' )
BEGIN
    --DROP TABLE dbo.tst;
    CREATE TABLE [dbo].[tst] 
    (
        Mon                     [char](6) NULL,
        COL1                    [varchar](50) NULL,
        COL2                    [varchar](50) NULL,
        COL3                    [varchar](50) NULL,
        COL4                    [varchar](50) NULL,
        COL5                    [varchar](50) NULL
    );

    INSERT INTO dbo.tst ( [Mon] )
    SELECT  TOP 100000000
            CONVERT( CHAR( 6 ), DATEADD( DAY, ( ABS( CHECKSUM( NEWID() ) ) % 10000 + 1 ),
                '20000101' ), 112 )         
    FROM    sys.all_objects so
    CROSS APPLY sys.all_objects sp;

    CREATE NONCLUSTERED INDEX IX__tst__Mon
        ON dbo.tst ( Mon )
    WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100 );
END;

SELECT  Mon, COUNT( 1 )
FROM    dbo.tst
GROUP BY Mon
ORDER BY Mon;

SET STATISTICS IO, TIME ON;

SELECT  Mon, COL1
FROM    dbo.tst
WHERE   Mon = '201509'

SELECT  Mon, COL1
FROM    dbo.tst WITH ( INDEX = IX__tst__Mon )
WHERE   Mon = '201509'

SELECT  *
FROM    dbo.tst
WHERE   Mon = '201509'

SELECT  *
FROM    dbo.tst WITH ( INDEX = IX__tst__Mon )
WHERE   Mon = '201509'

SET STATISTICS IO, TIME OFF;

Specifying the hint, in both cases, does reduce the time required for the query to resolve, but the index seek + RID lookup actually results in a significant increase in the number of reads necessary ( my test indicated a 60% increase ). Obviously it's not a 1:1 trade off, since the time difference is about 6x, but regardless, the optimizer is choosing the scan instead.

If you can INCLUDE the columns you need in the index, you'll get the best of both worlds, eliminating the RID lookup and the additional reads.

--DROP INDEX dbo.tst.IX__tst__Mon
CREATE NONCLUSTERED INDEX IX__tst__Mon
    ON dbo.tst ( Mon )
INCLUDE ( COL1, COL2, COL3, COL4, COL5 )
WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100 );

SET STATISTICS IO, TIME ON;

SELECT  *
FROM    dbo.tst
WHERE   Mon = '201509'

SET STATISTICS IO, TIME OFF;

Best Answer

Related Solutions

Sql-server – Should these two indexes suggested by the SSMS missing index feature be combined

Sql-server – Why is the optimizer doing a table scan vs Index Seek

Related Question