SQL Server Performance – Why Simple SELECTs Are Faster Than LEFT JOINs

Tags: execution-plan, performance, query-performance, sql server, sql-server-2012

I need to find a single value from a table A containing three foreign keys to two other tables B and C.

For the sake of experiment, I tested two ways to query the value:

Multiple queries:

declare @start int = (select top 1 [Id] from [B] where [Day] = '2015-01-01')
declare @end int = (select top 1 [Id] from [B] where [Day] = '2017-06-14')
declare @category int = (select top 1 [Id] from [C] where [Title] = 'Hello, World!')

select top 1 [Name]
from [A]
where [StartId] = @start
    and [EndId] = @end
    and [CategoryId] = @category
    and [Day] = '2016-05-27'

Single query:

select top 1 [Name]
from [A]
left join [B] as [BStart] on [BStart].[Id] = [A].[StartId]
left join [B] as [BEnd] on [BEnd].[Id] = [A].[EndId]
left join [C] on [C].[Id] = [A].[CategoryId]
where [BStart].[Day] = '2015-01-01'
    and [BEnd].[Day] = '2017-06-14'
    and [C].[Title] = 'Hello, World!'
    and [A].[Day] = '2016-05-27'

I was surprised that the execution plan shows the single query as more expensive than the multiple queries. When I run all five selects together in one batch, the one with left joins is shown at 53% of the batch cost, and the other four queries at 12% each.

These are the execution plans:

declare @start int = (select top 1 [Id] from [B] where [Day] = '2015-01-01')

declare @end int = (select top 1 [Id] from [B] where [Day] = '2017-06-14')

(Same as below)

declare @category int = (select top 1 [Id] from [C] where [Title] = 'Hello, World!')

select top 1 [Name]
from [A]
where [StartId] = @start
    and [EndId] = @end
    and [CategoryId] = @category
    and [Day] = '2016-05-27'

select top 1 [Name]
from [A]
left join [B] as [BStart] on [BStart].[Id] = [A].[StartId]
left join [B] as [BEnd] on [BEnd].[Id] = [A].[EndId]
left join [C] on [C].[Id] = [A].[CategoryId]
where [BStart].[Day] = '2015-01-01'
    and [BEnd].[Day] = '2017-06-14'
    and [C].[Title] = 'Hello, World!'
    and [A].[Day] = '2016-05-27'

Why is the single query with left joins slower than the first approach?

Best Answer

Your execution plan for the "individual queries" shows that pre-calculating the StartId, EndId and CategoryId values lets SQL Server use an index on A efficiently: once the values to search for are known, it can "seek" straight to the record(s) you want (you have a Non-Clustered Index Seek in your query plan).

The "single query" with JOINs, on the other hand, has to carry out the join between tables A, B and C "on the fly" before it can return the "top 1" result matching your criteria. That join touches every record of those tables that matches a row in the related table (a full clustered index scan).
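As a sketch of the difference (not something taken from the plans above), the lookups on B and C can be folded into one statement as scalar subqueries, so that the filter on A still compares against single values, as in the multi-statement version. Whether SQL Server then chooses the same seek on A depends on the indexes and statistics, so compare the plans before relying on it. Note that it is not strictly equivalent to the JOIN version: if several rows in B share the same Day, the subquery picks only one of them.

```sql
-- Same tables and columns as in the question; only the shape of the query changes.
select top 1 [Name]
from [A]
where [StartId] = (select top 1 [Id] from [B] where [Day] = '2015-01-01')
    and [EndId] = (select top 1 [Id] from [B] where [Day] = '2017-06-14')
    and [CategoryId] = (select top 1 [Id] from [C] where [Title] = 'Hello, World!')
    and [Day] = '2016-05-27'
```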

If you frequently query by Day and Title like this, it looks like an index that would allow tables B and C to be searched by those criteria is missing (or the statistics are out of date).
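A minimal sketch of what such indexes could look like. The index names are made up for illustration, and the sketch assumes that [Id] is the clustered key of B and C (so it is carried in each nonclustered index automatically) and that [Title] has a type that can be used as an index key (for example, not a max-length type).

```sql
-- Hypothetical index names; table and column names come from the question.
create nonclustered index [IX_B_Day] on [B] ([Day]);
create nonclustered index [IX_C_Title] on [C] ([Title]);

-- If suitable indexes already exist, refreshing the statistics may be enough.
update statistics [B];
update statistics [C];
```

After creating the indexes, rerun both versions and compare the plans again; the scans on B and C should be candidates to turn into seeks.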

By the way, TOP 1 only gives a deterministic result if you "order by" some criteria (or if your search condition matches a unique row). Otherwise you get the first row the query happens to reach, which may or may not be the same from one run to the next.
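For example, adding an ORDER BY makes the choice of row explicit. Ordering by [Name] is only an illustration; pick whichever column expresses the row you actually want.

```sql
select top 1 [Name]
from [A]
where [StartId] = @start
    and [EndId] = @end
    and [CategoryId] = @category
    and [Day] = '2016-05-27'
order by [Name]  -- illustrative choice: returns the alphabetically first matching name
```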

YOUR QUERY

SELECT post.postid, post.attach FROM newbb_innopost AS post WHERE post.threadid = 51506;

At first glance, that query should only touch 1.1597% (62510 out of 5390146 rows) of the table. It should be fast given the key distribution of threadid 51506.
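If you want to reproduce that figure and see what the optimizer plans to do, something along these lines should work (a sketch; the first statement reads the whole table or an index on threadid, so expect it to take a while on a table this size):

```sql
-- Share of the table matched by the filter.
-- In MySQL/MariaDB, (threadid = 51506) evaluates to 1 or 0, so SUM() counts the matches.
SELECT SUM(threadid = 51506) AS matching_rows,
       COUNT(*) AS total_rows,
       100 * SUM(threadid = 51506) / COUNT(*) AS pct_of_table
FROM newbb_innopost;

-- Which index the optimizer picks for the original query, and its row estimate.
EXPLAIN SELECT post.postid, post.attach
FROM newbb_innopost AS post
WHERE post.threadid = 51506;
```

With the row counts quoted above, pct_of_table should come out at about 1.16.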

REALITY CHECK

No matter which flavor of MySQL (Oracle, Percona, MariaDB) you use, none of them can escape the one enemy they all have in common: the InnoDB architecture.

InnoDB Architecture

CLUSTERED INDEX

Please keep in mind that each threadid index entry has the primary key attached. This means that when you read from the index, it must then do a primary key lookup within the ClusteredIndex (which InnoDB names gen_clust_index internally when it has to generate one because the table has no primary key). In the ClusteredIndex, each InnoDB page contains both data and PRIMARY KEY index info. See my post Best of MyISAM and InnoDB for more info.

REDUNDANT INDEXES

You have a lot of clutter in the table because some indexes have the same leading columns. MySQL and InnoDB have to navigate through the index clutter to get to the needed BTREE nodes. You should reduce that clutter by running the following:

ALTER TABLE newbb_innopost
    DROP INDEX threadid,
    DROP INDEX threadid_2,
    DROP INDEX threadid_visible_dateline,
    ADD INDEX threadid_visible_dateline_index (`threadid`,`visible`,`dateline`,`userid`)
;

Why strip down these indexes?

The first three indexes start with threadid
threadid_2 and threadid_visible_dateline start with the same three columns
threadid_visible_dateline does not need postid since it's the PRIMARY KEY and it's embedded
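Before and after running the ALTER TABLE, it may be worth listing the indexes to confirm which ones share leading columns and that only the intended ones remain. Both statements below are standard MySQL/MariaDB:

```sql
-- One row per index column; compare Key_name, Seq_in_index and Column_name
-- to spot indexes that start with the same columns.
SHOW INDEX FROM newbb_innopost;

-- The full table definition, including every index, in one piece.
SHOW CREATE TABLE newbb_innopost;
```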

BUFFER CACHING

The InnoDB Buffer Pool caches data and index pages. MyISAM only caches index pages.

Just in this area alone, MyISAM does not waste time caching data. That's because it's not designed to cache data. InnoDB caches every data page and index page (and its grandmother) it touches. If your InnoDB Buffer Pool is too small, you could be caching pages, invalidating pages, and removing pages all in one query.
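A quick way to get a feel for whether the Buffer Pool is too small is to compare logical read requests with the reads that had to go to disk. This is only a rough sketch; the counters are cumulative since server start, so look at how they change while the slow query runs.

```sql
-- Configured size of the Buffer Pool, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Logical read requests against the Buffer Pool.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';

-- Reads that could not be served from the Buffer Pool and went to disk.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
```

If Innodb_buffer_pool_reads grows quickly relative to Innodb_buffer_pool_read_requests during the query, pages are being read in and evicted rather than served from memory.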

TABLE LAYOUT

You could shave off some space from each row by reconsidering the types of importthreadid and importpostid. You have them as BIGINTs. Together they take up 16 bytes (8 bytes each) in the ClusteredIndex per row.

You should run this:

SELECT importthreadid,importpostid FROM newbb_innopost PROCEDURE ANALYSE();

This will recommend what data types these columns should be for the given dataset.
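PROCEDURE ANALYSE() is available in the MariaDB version mentioned here, but it was removed in MySQL 8.0. A plain range check gives similar information and works everywhere. The ALTER TABLE below is only a hypothetical follow-up: it is left commented out because the nullability and defaults of these columns are not shown in the question, and INT UNSIGNED (4 bytes, maximum 4294967295) only fits if the ranges allow it.

```sql
-- Actual value ranges of the two BIGINT columns.
SELECT MIN(importthreadid), MAX(importthreadid),
       MIN(importpostid),   MAX(importpostid)
FROM newbb_innopost;

-- Hypothetical follow-up, assuming both ranges fit in INT UNSIGNED
-- and the columns are NOT NULL with a default of 0 (not confirmed by the question):
-- ALTER TABLE newbb_innopost
--     MODIFY importthreadid INT UNSIGNED NOT NULL DEFAULT 0,
--     MODIFY importpostid   INT UNSIGNED NOT NULL DEFAULT 0;
```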

CONCLUSION

MyISAM has a lot less to contend with than InnoDB, especially in the area of caching.

While you revealed the amount of RAM (32GB) and the version of MySQL (Server version: 10.0.12-MariaDB-1~trusty-wsrep-log mariadb.org binary distribution, wsrep_25.10.r4002), there are still other pieces to this puzzle you have not revealed:

The InnoDB settings
The Number of Cores
Other settings from my.cnf

If you can add these things to the question, I can further elaborate.

UPDATE 2014-08-28 11:27 EDT

You should increase threading

innodb_read_io_threads = 64
innodb_write_io_threads = 16
innodb_log_buffer_size = 256M

I would consider disabling the query cache (See my recent post Why query_cache_type is disabled by default start from MySQL 5.6?)

query_cache_size = 0

I would preserve the Buffer Pool

innodb_buffer_pool_dump_at_shutdown=1
innodb_buffer_pool_load_at_startup=1

Increase purge threads (if you do DML on multiple tables)

innodb_purge_threads = 4
