Sql-server – Help on deciding how to pick Correct Index

index, index-tuning, sql server, sql-server-2012

I would like to ask for some help to create correct indexes for this query:

SELECT DISTINCT S.TypeID
FROM dbo.Sizes AS S
WHERE EXISTS (
        SELECT 1
        FROM #PermissionsTable AS PT
        WHERE PT.ProductID = S.ProductID
            AND PT.CountryID = S.CountryID
        );

This query is used in a stored procedure, and #PermissionsTable is created there based on the passed-in criteria.

I've tried creating various indexes on my table, but I always get Index Scans only. I'd like to get Seeks, of course. For instance:

CREATE NONCLUSTERED INDEX idx_Sizes_ProductID_CountryID_TypeID
    ON dbo.Sizes (ProductID, CountryID, TypeID);

CREATE NONCLUSTERED INDEX idx_Sizes_CountryID_ProductID_TypeID
    ON dbo.Sizes (CountryID, ProductID, TypeID);

-- I've added TypeID into INCLUDE part, because I'm not using it in any clause except SELECT statement.
CREATE NONCLUSTERED INDEX idx_Sizes_ProductID_CountryID
    ON dbo.Sizes (ProductID, CountryID) INCLUDE(TypeID);

CREATE NONCLUSTERED INDEX idx_Sizes_CountryID_ProductID
    ON dbo.Sizes (CountryID, ProductID) INCLUDE(TypeID);

And on #PermissionsTable I've tried creating both clustered and nonclustered indexes on (ProductID, CountryID) or (CountryID, ProductID).

But I'm always ending up with Scans.

The Sizes table has hundreds of millions of rows. The permissions table has around 400,000.

At the moment I've added a WITH (FORCESEEK) hint next to FROM dbo.Sizes AS S, which forces a seek, but I'd like the SQL Server engine to choose this on its own.
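For reference, the hinted version looks roughly like this. It is only a sketch of where the hint goes; the select list uses S.TypeID on the assumption that TypeID comes from dbo.Sizes.

```sql
-- Sketch of the hint placement only; table and column names are from the question.
-- Assumption: TypeID is a column of dbo.Sizes (alias S).
SELECT DISTINCT S.TypeID
FROM dbo.Sizes AS S WITH (FORCESEEK)  -- table hint forces an index seek on Sizes
WHERE EXISTS (
        SELECT 1
        FROM #PermissionsTable AS PT
        WHERE PT.ProductID = S.ProductID
            AND PT.CountryID = S.CountryID
        );
```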

Any literature, tips, anything would be helpful.

Thanks!

Update: added execution plan

Best Answer

First of all, I have to ask: Why are you doing this? If you have a performance issue, pursue it. However, you only mention that you are seeing scans instead of seeks. Scans are not always bad - they can be the most efficient method of pulling large amounts of data since sequential file access is less costly in I/O terms than random access.
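One way to check whether the scan is actually the problem is to measure it. The following is a minimal sketch, assuming the query is run in a session where #PermissionsTable already exists; it uses the session options SET STATISTICS IO and SET STATISTICS TIME, and the select list assumes TypeID belongs to dbo.Sizes.

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The query from the question (assumption: TypeID is a column of dbo.Sizes).
SELECT DISTINCT S.TypeID
FROM dbo.Sizes AS S
WHERE EXISTS (
        SELECT 1
        FROM #PermissionsTable AS PT
        WHERE PT.ProductID = S.ProductID
            AND PT.CountryID = S.CountryID
        );

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the same batch with and without the FORCESEEK hint and comparing the logical reads and elapsed time in the Messages output gives a more reliable picture than the operator names in the plan.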

The commenters are correct - the query as written will always produce a scan. The outer query scans because it has neither a join nor a sargable condition in the WHERE clause to limit it. The inner query is likely using a scan because of the number of times it is executed and the probability of returning a large share of the rows over the course of the query. (Roughly 30% is a commonly quoted rule of thumb for the point at which a scan is chosen over a seek; the actual decision is cost-based.) A join may produce a better plan, but it really depends on the distribution of values in the temporary table.

I can think of a couple of things you might try:

If you're running this query repeatedly to test it, make sure you've added OPTION(RECOMPILE) to the end while you're testing. This will force it to re-evaluate the query instead of using a cached plan. If you don't, the optimizer may not see that an option other than a scan is available.
Try using a CROSS APPLY instead of EXISTS. You can use a subquery as the target of the APPLY; I've used that technique before with stubborn queries with good results. Your query would then resemble SELECT DISTINCT S.TypeID FROM dbo.Sizes S CROSS APPLY (SELECT TOP(1) 1 AS Found FROM #PermissionsTable PT WHERE PT.ProductID = S.ProductID AND PT.CountryID = S.CountryID) P. Keep in mind that the APPLY will still be run as many times as you have rows in Sizes, so even though you're using a seek, it might still produce a poor plan.
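Putting both suggestions together, the test query might look something like the sketch below. The column alias Found is a made-up name, added because a derived table needs a name for every column; the select list again assumes TypeID comes from dbo.Sizes.

```sql
-- Sketch only: CROSS APPLY variant of the original EXISTS query.
SELECT DISTINCT S.TypeID
FROM dbo.Sizes AS S
CROSS APPLY (
        -- TOP (1) stops at the first matching permission row, like EXISTS.
        SELECT TOP (1) 1 AS Found   -- "Found" is a hypothetical alias
        FROM #PermissionsTable AS PT
        WHERE PT.ProductID = S.ProductID
            AND PT.CountryID = S.CountryID
        ) AS P
OPTION (RECOMPILE);  -- compile a fresh plan on every run while testing
```

Because CROSS APPLY drops outer rows for which the subquery returns nothing, this should return the same TypeID values as the EXISTS form.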

If neither of those has an effect, you may be able to improve performance by changing the way you think about the query. How many distinct ProductID/CountryID combinations are there in #PermissionsTable? Instead of searching on all 400,000 rows, can you select only those distinct combinations into another temp table, then join that to Sizes? Can you select only the distinct combinations of ProductID/CountryID/TypeID from Sizes? Can you reverse the query so that Sizes is in the inner query? The goal of all of these would be to reduce the number of rows that the query must retrieve.
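As a rough illustration of the first idea, the distinct combinations could be staged and indexed before joining. This is a sketch under assumptions: #DistinctPermissions and the index name are hypothetical, and whether it helps depends on how many duplicate ProductID/CountryID pairs #PermissionsTable really contains.

```sql
-- Stage only the distinct ProductID/CountryID pairs.
-- #DistinctPermissions is a hypothetical name, not from the question.
SELECT DISTINCT PT.ProductID, PT.CountryID
INTO #DistinctPermissions
FROM #PermissionsTable AS PT;

-- The pairs are distinct, so the index can be declared unique.
CREATE UNIQUE CLUSTERED INDEX cx_DistinctPermissions
    ON #DistinctPermissions (ProductID, CountryID);

-- Join the smaller set to Sizes. Each Sizes row matches at most one pair,
-- so the result is the same set of TypeID values as the EXISTS query.
SELECT DISTINCT S.TypeID
FROM #DistinctPermissions AS DP
JOIN dbo.Sizes AS S
    ON S.ProductID = DP.ProductID
    AND S.CountryID = DP.CountryID
OPTION (RECOMPILE);
```

With the small table driving the join, one of the indexes on dbo.Sizes that leads with (ProductID, CountryID) and covers TypeID gives the optimizer a natural seek target, though it may still choose a scan if it estimates that to be cheaper.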

Related Solutions

Sql-server – Index Strategies on Text or NVARCHAR(MAX) Fields

Why do you think anything but a scan should be used to pull back all the data? A full-text index won't really help - that helps you search those columns, but if you're just returning all the data (for any variety of WHERE clauses) then there's no shortcut to reading all of the data. Can I ask why a to_addr, which is presumably limited to ~320 characters by the SMTP standards (depending on which standard you believe), contains data > 4000 characters?

A lot of people think that a scan is bad. If you need to return a large amount of data, then often a clustered index scan will be used. Your where clause may lead to seeks being used to locate the rows to return, but a seek isn't going to work where the data in that column is that large. Are you just seeing a scan in the execution plan and assuming that must be the problem?

Sql-server – Using sys.dm_db_index_usage_stats for unhelpful or unused indexes

Just throwing this out there, because I remember the bad old days of having a different script for every index ailment you read about on the internet.

I co-author a free stored procedure called sp_BlitzIndex that will tell you about a whole bunch of stuff going on with your indexes all at once.

Some examples:

Aggressively locked indexes
Duplicates (based on keys)
Borderline duplicates (based on first key column)
Unused (with differentiation based on number of writes)
High value missing indexes
HEAPs
And more!

The simplest example run is probably like this. Just change the database name.

EXEC sp_BlitzIndex @DatabaseName = 'StackOverflow', @Mode = 4

There are a ton more ways to run it, just check out the docs at the GitHub link.

Hope this helps!
