I was experimenting with indexes to speed up things, but in case of a join, the index is not improving the query execution time and in some cases it is slowing things down.
The query to create test table and fill it with data is:
CREATE TABLE [dbo].[IndexTestTable](
[id] [int] IDENTITY(1,1) PRIMARY KEY,
[Name] [nvarchar](20) NULL,
[val1] [bigint] NULL,
[val2] [bigint] NULL)
DECLARE @counter INT;
SET @counter = 1;
WHILE @counter < 500000
BEGIN
INSERT INTO IndexTestTable
(
-- id -- this column value is auto-generated
NAME,
val1,
val2
)
VALUES
(
'Name' + CAST((@counter % 100) AS NVARCHAR),
RAND() * 10000,
RAND() * 20000
);
SET @counter = @counter + 1;
END
-- Index in question
CREATE NONCLUSTERED INDEX [IndexA] ON [dbo].[IndexTestTable]
(
[Name] ASC
)
INCLUDE ( [id],
[val1],
[val2])
Now query 1, which is improved (only slightly but the improvement is consistent) is:
SELECT *
FROM IndexTestTable I1
JOIN IndexTestTable I2
ON I1.ID = I2.ID
WHERE I1.Name = 'Name1'
Stats and execution plan without Index (in this case the table uses the default clustered index):
(5000 row(s) affected)
Table 'IndexTestTable'. Scan count 2, logical reads 5580, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 109 ms, elapsed time = 294 ms.
Now with Index enabled:
(5000 row(s) affected)
Table 'IndexTestTable'. Scan count 2, logical reads 2819, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 231 ms.
Now the query which slows down due to the index (the query is meaningless since it is created for testing only):
SELECT I1.Name,
SUM(I1.val1),
SUM(I1.val2),
MIN(I2.Name),
SUM(I2.val1),
SUM(I2.val2)
FROM IndexTestTable I1
JOIN IndexTestTable I2
ON I1.Name = I2.Name
WHERE
I2.Name = 'Name1'
GROUP BY
I1.Name
With clustered index enabled:
(1 row(s) affected)
Table 'IndexTestTable'. Scan count 4, logical reads 60, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 155106, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 17207 ms, elapsed time = 17337 ms.
Now with Index disabled:
(1 row(s) affected)
Table 'IndexTestTable'. Scan count 5, logical reads 8642, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 165212, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 17691 ms, elapsed time = 9073 ms.
The questions are:
- Even though the index is suggested by the SQL Server, why does it
slow things down by a significant difference? - What is the Nested Loop join which is taking most of the time and
how to improve its execution time? - Is there something that I am doing wrong or have missed?
- With the default index (on primary key only) why does it take less
time, and with the non clustered index present, for each row in the
joining table, the joined table row should be found quicker, because
join is on Name column on which the index has been created. This is
reflected in the query execution plan and Index Seek cost is less
when IndexA is active, but why still slower? Also what is in the
Nested Loop left outer join that is causing the slowdown?
Using SQL Server 2012
Best Answer
Index suggestions are made by the query optimizer. If it comes across a logical selection from a table which is not well served by an existing index, it may add a "missing index" suggestion to its output. These suggestions are opportunistic; they are not based on a full analysis of the query, and do not take account of wider considerations. At best, they are an indication that more helpful indexing may be possible, and a skilled DBA should take a look.
The other thing to say about missing index suggestions is that they are based on the optimizer's costing model, and the optimizer estimates by how much the suggested index might reduce the estimated cost of the query. The key words here are "model" and "estimates". The query optimizer knows little about your hardware configuration or other system configuration options - its model is largely based on fixed numbers that happen to produce reasonable plan outcomes for most people on most systems most of the time. Aside from issues with the exact cost numbers used, the results are always estimates - and estimates can be wrong.
There is little to be done to improve the performance of the cross join operation itself; nested loops is the only physical implementation possible for a cross join. The table spool on the inner side of the join is an optimization to avoid rescanning the inner side for each outer row. Whether this is a useful performance optimization depends on various factors, but in my tests the query is better off without it. Again, this is a consequence of using a cost model - my CPU and memory system likely has different performance characteristics than yours. There is no specific query hint to avoid the table spool, but there is an undocumented trace flag (8690) that you can use to test execution performance with and without the spool. If this were a real production system problem, the plan without the spool could be forced using a plan guide based on the plan produced with TF 8690 enabled. Using undocumented trace flags in production is not advised because the installation becomes technically unsupported and trace flags can have undesirable side-effects.
The main thing you are missing is that although the plan using the nonclustered index has a lower estimated cost according to the optimizer's model, it has a significant execution-time problem. If you look at the distribution of rows across threads in the plan using the Clustered Index, you will likely see a reasonably good distribution:
In the plan using the Nonclustered Index Seek, the work ends up being performed entirely by one thread:
This is a consequence of the way work is distributed among threads by parallel scan/seek operations. It is not always the case that a parallel scan will distribute work better than an index seek - but it does in this case. More complex plans might include repartitioning exchanges to redistribute work across threads. This plan has no such exchanges, so once rows are assigned to a thread, all related work is performed on that same thread. If you look at the work distribution for the other operators in the execution plan, you will see that all work is performed by the same thread as shown for the index seek.
There are no query hints to affect row distribution among threads, the important thing is to be aware of the possibility and to be able to read enough detail in the execution plan to determine when it is causing a problem.
It should now be clear that the nonclustered index plan is potentially more efficient, as you would expect; it is just poor distribution of work across threads at execution time that accounts for the performance issue.
For the sake of completing the example and illustrating some of the things I have mentioned, one way to get a better work distribution is to use a temporary table to drive parallel execution:
This results in a plan that uses the more efficient index seeks, does not feature a table spool, and distributes work across threads well:
On my system, this plan executes significantly faster than the Clustered Index Scan version.
If you're interested in learning more about the internals of parallel query execution, you might like to watch my PASS Summit 2013 session recording.