Sql-server – Help on deciding how to pick Correct Index

indexindex-tuningsql serversql-server-2012

I would like to ask for some help to create correct indexes for this query:

SELECT DISTINCT D.TypeID
FROM dbo.Sizes AS S
WHERE EXISTS (
        SELECT 1
        FROM #PermissionsTable AS PT
        WHERE PT.ProductID = S.ProductID
            AND PT.CountryID = S.CountryID
        );

This query is used in procedure and #PermissionsTable is created in there based on passed in criteria.

I've tried creating various indexes on my table, but I'm always getting only Index Scans, I'd like to get Seeks of course. For instance:

CREATE NONCLUSTERED INDEX idx_Sizes_ProductID_CountryID_TypeID
    ON dbo.Sizes (ProductID, CountryID, TypeID);

CREATE NONCLUSTERED INDEX idx_Sizes_CountryID_ProductID_TypeID
    ON dbo.Sizes (CountryID, ProductID, TypeID);

-- I've added TypeID into INCLUDE part, because I'm not using it in any clause except SELECT statement.
CREATE NONCLUSTERED INDEX idx_Sizes_ProductID_CountryID
    ON dbo.Sizes (ProductID, CountryID) INCLUDE(TypeID);

CREATE NONCLUSTERED INDEX idx_Sizes_CountryID_ProductID
    ON dbo.Sizes (CountryID, ProductID) INCLUDE(TypeID);

And on #PermissionTable I've tried creating both Clustered and Non Clustered indexes on ProductID, CountryID or CountryID, ProductID.

But I'm always ending up with Scans.

Sizes table has hundreds of millions of rows. Permissions table has around 400.000.

At the moment I've added WITH (FORCESEEK) hint next to FROM dbo.Sizes AS S, which forces seek, but I'd like this to be done by SQL Server engine.

Any literatur, tips, anything would be helpful.

Thanks!

Update: added execution plan
Execution plan

Best Answer

First of all, I have to ask: Why are you doing this? If you have a performance issue, pursue it. However, you only mention that you are seeing scans instead of seeks. Scans are not always bad - they can be the most efficient method of pulling large amounts of data since sequential file access is less costly in I/O terms than random access.

The commenters are correct - the query as written will always produce a scan. The outer query will always produce a scan since the query has neither a join nor a sargable condition in the WHERE clause to limit it. The inner query is likely using a scan because of the number of times it is executed and the probability of returning more than 30% of the rows over the course of the query. (30% is roughly the threshold for choosing a scan over a seek.) A join may produce a better plan, but it really depends on the distribution of values in the temporary table.

I can think of a couple of things you might try:

  • If you're running this query repeatedly to test it, make sure you've added OPTION(RECOMPILE) to the end while you're testing. This will force it to re-evaluate the query instead of using a cached plan. If you don't, the optimizer may not see that an option other than a scan is available.
  • Try using a CROSS APPLY instead of a join. You can use a subquery as the target of the APPLY; I've used that technique before with stubborn queries with good results. Your query would then resemble SELECT DISTINCT D.TypeID FROM dbo.Sizes S CROSS APPLY (SELECT TOP(1) 1 FROM #PermissionsTable PT WHERE PT.ProductID = S.ProductID AND PT.CountryID = S.CountryID) P. Keep in mind that the APPLY will still be run as many times as you have rows in Sizes, so even though you're using a seek, it might still produce a poor plan.

If neither of those has an effect, you may be able to improve performance by changing the way you think about the query. How many distinct ProductID/CountryID combinations are there in #PermissionsTable? Instead of searching on all 400,000 rows, can you select only those distinct combinations into another temp table, then join that to Sizes? Can you select only the distinct combinations of ProductID/CountryID/TypeID from Sizes? Can you reverse the query so that Sizes is in the inner query? The goal of all of these would be to reduce the number of rows that the query must retrieve.