Coming at this from the DB point of view, rather than WordPress.
Executive Summary
For each recommendation your optimiser script has generated, check the appropriate MySQL documentation to see whether it will impact your setup. Then tweak your my.cnf little by little.
Your overall objective is for MySQL to load as much as it can into memory and avoid hitting the disk, so increasing the various caches will probably help, as long as you don't exceed the physical memory in your server (you need to take other processes running on the server into account). Then, make sure your tables are indexed appropriately.
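For a sense of scale, the two main caches live in my.cnf; the sizes below are purely illustrative and should be checked against the RAM your server actually has free:

# my.cnf - illustrative sizes only, check against free RAM
key_buffer_size = 256M          # MyISAM index cache
innodb_buffer_pool_size = 512M  # InnoDB data and index cache (if you use InnoDB)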
Your Specific Issues
But to address each of the items you've listed:
Query cache is supported but not enabled; Perhaps you should set the query_cache_size
The query cache is like a big hash table of Select-Statement -> Result-Table: MySQL checks whether the same query already exists in the cache and, if so, returns the cached result without re-running the query. If you enable it, run SHOW STATUS LIKE 'Qcache%' to see how well utilised it is (having a huge cache which isn't used is just a waste of memory). My experience of the query cache is that it seems really good in theory, but didn't provide much help in practice.
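If you do want to try it, a sketch: enable it in my.cnf and then check the hit counters after some traffic (the 32M size is only a starting guess, not a tuned value):

# my.cnf
query_cache_type = 1
query_cache_size = 32M

-- compare Qcache_hits against Qcache_inserts to judge utilisation
SHOW STATUS LIKE 'Qcache%';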
You have had 11025 queries where a join could not use an index properly; You should enable "log-queries-not-using-indexes" then look for non indexed joins in the slow query log.
Indexes are what make or break a database. Enable the slow query log and use EXPLAIN to see which queries are not using indexes. Fix the slowest ones first.
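For example (the log path and the wp_ table prefix are assumptions; adjust for your system), turn the logging on in my.cnf and then prefix a suspect query with EXPLAIN:

# my.cnf
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 2
log-queries-not-using-indexes

-- no 'key' and a 'rows' figure near the table size means no index is used
EXPLAIN SELECT * FROM wp_posts WHERE post_status = 'publish';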
You have a total of 834 tables; You have 528 open tables; Current table_cache hit rate is 8%, while 132% of your table cache is in use; You should probably increase your table_cache
MyISAM tables exist as sets of files on disk (a .frm, .MYD and .MYI file each), but MySQL doesn't keep them all open all the time, only the ones used recently. The table_cache setting controls how many it will keep open. This interacts with the number of connections, so be careful about setting it too high, but it seems you have stacks of memory available, so increase it until all the tables are cached.
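You can see how the cache is doing by comparing the open-table counters; a steadily climbing Opened_tables means tables are being evicted and re-opened (the 1024 below is only an illustrative value):

SHOW STATUS LIKE 'Open%tables';

# my.cnf
table_cache = 1024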
Current Lock Wait ratio = 1 : 529; You may benefit from selective use of InnoDB.
MyISAM tables are great for reading, but whenever you UPDATE, INSERT or DELETE from them, MySQL locks the whole table. So if, for example, you have a Page table with a HitCount field which is incremented whenever the page is loaded, the entire Page table is locked and no other connections can read from it. I've seen some particularly nasty combinations of read/write queries which would lock tables for minutes or even hours, effectively killing the site.
InnoDB isn't as fast at reading, but it supports more granular write operations (locking only the record being updated), so it is a better fit for tables which are written to more frequently. Converting tables with large numbers of UPDATE, INSERT and DELETE operations to InnoDB may decrease locking and increase performance. Many apps which use MySQL default to InnoDB across the board for this very reason.
I thought you would need to drop the old MyISAM tables and re-create them as InnoDB, with the downtime that implies, but apparently an ALTER TABLE statement is all you require to change the engine type. Full table locks are taken while it runs, though, so there will be some time when you can't run queries against the table.
I don't know whether WordPress assumes InnoDB or MyISAM tables, so please check WordPress before altering the tables' engine.
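The conversion itself is one statement per table; wp_comments below is just an example name, assuming the standard WordPress wp_ table prefix:

ALTER TABLE wp_comments ENGINE = InnoDB;

-- confirm which engine each table is now using
SHOW TABLE STATUS;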
Current max_heap_table_size = 16 M; Current tmp_table_size = 16 M; Of 107071 temp tables, 25% were created on disk; Perhaps you should increase your tmp_table_size and/or max_heap_table_size to reduce the number of disk-based temporary tables
MySQL requires memory to do sorting (ORDER BY), JOINs and other operations involving large chunks of data, but only up to a certain limit; beyond that limit the operation spills out onto disk (so that one giant ORDER BY doesn't use up gigabytes of memory which could better be spent doing other things). Increasing max_heap_table_size and tmp_table_size means fewer operations run on the slow disk and more in fast memory (a discussion about these variables). 64M should be large enough for most cases, and making these too big just wastes memory.
You have 1290 out of 1145245 that take longer than 2.000000 sec. to complete
Use the slow query log to figure out what these slow queries are. Although that ratio (0.11%) is pretty low, there may be a few really slow queries which are causing bigger problems.
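The Slow_queries counter confirms the log threshold is catching them, and the bundled mysqldumpslow tool will aggregate the log so the worst offenders stand out:

SHOW GLOBAL STATUS LIKE 'Slow_queries';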
Current max_connections = 500; Current threads_connected = 501; You should raise max_connections
Once threads_connected reaches max_connections, new connections are rejected, so increase max_connections (as you have already done). Note that threads_connected can read one higher than the limit because MySQL reserves one extra connection for a SUPER user, which is why you see 501 against a limit of 500.
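The limit can be raised at runtime without a restart while you update my.cnf; 750 below is only an illustrative bump, since each connection costs memory:

SHOW STATUS LIKE 'Threads_connected';
SET GLOBAL max_connections = 750;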
As you can see, it's not always obvious what to do without knowing what sorts of queries WordPress is generating.
How does LIKE '%123456789%' benefit from indexing?
Only a little bit. The query processor can scan the whole nonclustered index looking for matches instead of the entire table (the clustered index). Nonclustered indexes are generally smaller than the table they are built on, so scanning the nonclustered index may be faster.
The downside is that any columns needed by the query that are not included in the nonclustered index definition must be looked up in the base table, per row.
The optimizer makes a decision between scanning the table (clustered index) and scanning the nonclustered index with lookups, based on cost estimates. The estimated costs depend to a great extent on how many rows the optimizer expects your LIKE or CHARINDEX predicate to select.
Why do the listed articles state that it will not improve performance?
For a LIKE condition that does not start with a wildcard, SQL Server can perform a partial scan of the index instead of scanning the whole thing. For example, LIKE 'A%' can be correctly evaluated by testing only index records >= 'A' and < 'B' (the exact boundary values depend on collation).
This sort of query can use the seeking ability of b-tree indexes: we can go straight to the first record >= 'A' using the b-tree, then scan forward in index key order until we reach a record that fails the < 'B' test. Since we only need to apply the LIKE test to a smaller number of rows, performance is generally better.
By contrast, LIKE '%A' cannot be turned into a partial scan because we don't know where to start or end; any record could end in 'A', so we cannot improve on scanning the whole index and testing every row individually.
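A minimal repro sketch of the two cases (the table, column and index names are all assumptions): the first query's plan can seek on the index, the second is forced to scan it:

-- assumed schema for illustration
CREATE TABLE dbo.Customer
(
    Id int IDENTITY PRIMARY KEY,
    [Customer name] nvarchar(100) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_Customer_Name
ON dbo.Customer ([Customer name]);

-- leading characters known: a partial scan (index seek) is possible
SELECT Id FROM dbo.Customer WHERE [Customer name] LIKE N'A%';

-- leading wildcard: the whole index must be scanned
SELECT Id FROM dbo.Customer WHERE [Customer name] LIKE N'%A';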
I tried rewriting the query to use CHARINDEX, but performance is still slow. Why does CHARINDEX not benefit from the indexing as it appears the LIKE query does?
The query optimizer has the same choice between scanning the table (clustered index) and scanning the nonclustered index (with lookups) in both cases.
The choice is made between the two based on cost estimation. It so happens that SQL Server may produce a different estimate for the two methods. For the LIKE form of the query, the estimate may be able to use special string statistics to produce a reasonably accurate estimate. The CHARINDEX > 0 form produces an estimate based on a guess.
The different estimates are enough to make the optimizer choose a Clustered Index Scan for CHARINDEX and a Nonclustered Index Scan with Lookups for the LIKE. If you force the CHARINDEX query to use the nonclustered index with a hint, you will get the same plan as for LIKE, and performance will be about the same:
SELECT
    [Customer name],
    [Sl_No],
    [Id]
FROM dbo.customer WITH (INDEX (f))
WHERE
    CHARINDEX('9000413237', [Phone no]) > 0;
The number of rows processed at runtime will be the same for both methods; it's just that the LIKE form produces a more accurate estimate in this case, so the query optimizer chooses a better plan.
If you find yourself needing LIKE '%thing%' searches often, you might want to consider a technique I wrote about in Trigram Wildcard String Search in SQL Server.
Best Answer
Neither.
Index: ColB INCLUDE (bKey)
Since you aren't filtering on bKey, you don't need it in all the nodes of the index, just at the leaf level. If you only index on ColB then you will still have to pay for a key lookup to get the value of bKey from the clustered index.
Also, is there a reason this isn't using a JOIN instead of a correlated subquery?
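As a sketch, with the table name dbo.TableB and an index name assumed since the schema isn't shown:

CREATE NONCLUSTERED INDEX IX_TableB_ColB
ON dbo.TableB (ColB)
INCLUDE (bKey);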