Sql-server – OrderBy int slow for large table join

execution-planperformancequery-performancesql server

I have roughly 1.5 million records. This query works fine for smaller data, but it gets a second slower for every few hundred thousand records. For 1.5 million, the query is noticeably struggling.

I mocked up a whole new Code First project to verify the problem is independent of my implementation, and to provide this example query:

    SELECT TOP 1 Widgets.*
    FROM [WidgetSandbox].[dbo].[Widgets] Widgets
    INNER JOIN [WidgetSandbox].[dbo].[Status] Statuses ON Widgets.StatusId = Statuses.Id
    INNER JOIN [WidgetSandbox].[dbo].[Colors] Colors ON Widgets.ColorCode = Colors.ColorCode
    INNER JOIN [WidgetSandbox].[dbo].[Sizes] Sizes ON Widgets.SizeId = Sizes.Id
    WHERE Statuses.Name = 'Available'
    AND Colors.Name = 'Red'
    ORDER BY Sizes.DiameterInches

DiameterInches is an int, as a metaphor for "PriorityLevel" in my actual code.

If I comment out ORDER BY Sizes.DiameterInches, it returns immediately, but if I want to find "the smallest available red widget", then it crawls.

Is there a better way?

enter image description here

Execution plan XML: https://gist.github.com/RobertBaldini/57c8b61d135cc5c84c38b2da243611ad

DDL: https://gist.github.com/RobertBaldini/3740c7bb85eea47d7fe63cb8602ac2d6

Repo (data loader takes several minutes): https://github.com/RobertBaldini/WidgetSandbox

Best Answer

You should retype the various Name columns from nvarchar(max) to nvarchar([right size]). It's unlikely that names will be up to 2GB in length, and making them max sized prevents them being used as a key in an index. A good general rule of thumb is to avoid large object data types wherever possible.

You'll probably need to make that change to the EF code, but in T-SQL:

-- Guessing at 100 characters maximum
ALTER TABLE dbo.[Status]
ALTER COLUMN Name nvarchar(100) NOT NULL;

ALTER TABLE dbo.Colors
ALTER COLUMN Name nvarchar(100) NOT NULL;

ALTER TABLE dbo.Sizes
ALTER COLUMN Name nvarchar(100) NOT NULL;

ALTER TABLE dbo.Widgets
ALTER COLUMN Name nvarchar(100) NOT NULL;

I have also changed the column definition to NOT NULL there.

Anyway, given indexes:

CREATE NONCLUSTERED INDEX IX_dbo_Sizes__DiameterInches
ON dbo.Sizes (DiameterInches);

CREATE NONCLUSTERED INDEX IX_dbo_Widgets__SizeId_ColorCode_StatusId__Name
ON dbo.Widgets (SizeId, ColorCode, StatusId)
INCLUDE (Name);

...you should end up with a plan that avoids lookups and sorts, something like:

Expected plan shape

With the small number of rows in Status and Colors, it is probably not worth indexing their Name columns right now, but that could change over time. In any case, if the name columns should be unique, you should constrain them to be so using a unique constraint or unique nonclustered index, for example:

CREATE UNIQUE NONCLUSTERED INDEX UQ_dbo_Colors__Name
ON dbo.Colors (Name);

CREATE UNIQUE NONCLUSTERED INDEX UQ_dbo_Status__Name
ON dbo.[Status] (Name);

CREATE UNIQUE NONCLUSTERED INDEX UQ_dbo_Sizes__Name
ON dbo.Sizes (Name);

CREATE UNIQUE NONCLUSTERED INDEX UQ_dbo_Widgets__Name
ON dbo.Widgets (Name);

Second approach

With the same indexes, if you are able to change the SQL, you could also separate out the Color and Status lookups:

DECLARE 
    @StatusId integer,
    @ColorCode nvarchar(6);

SELECT @StatusId = ST.Id 
FROM dbo.[Status] AS ST 
WHERE ST.Name = N'Available';

SELECT @ColorCode = C.ColorCode 
FROM dbo.Colors AS C 
WHERE C.Name = N'Red';

SELECT TOP (1) 
    W.*
FROM dbo.Widgets AS W
JOIN dbo.Sizes AS S
    ON W.SizeId = S.Id
WHERE
    W.StatusId = @StatusId
    AND W.ColorCode = @ColorCode
ORDER BY
    S.DiameterInches;

This makes for a simpler set of operations that also happen to make life easier for the optimizer:

Plan with lookups separate

Note also that Unicode string literals should be prefixed with N for correct data typing.

Oh, and you should also patch your instance. It is currently SQL Server 2008 Service Pack 1 - Service Pack 4 has been available for some time now.