Sql-server – How to Optimise Query

performancequery-performancesql server

I have a database structure similar to this,

CREATE TABLE [dbo].[Dispatch](
    [DispatchId] [int] NOT NULL,
    [ContractId] [int] NOT NULL,
    [DispatchDescription] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Dispatch] PRIMARY KEY CLUSTERED 
(
    [DispatchId] ASC,
    [ContractId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[DispatchLink](
    [ContractLink1] [int] NOT NULL,
    [DispatchLink1] [int] NOT NULL,
    [ContractLink2] [int] NOT NULL,
    [DispatchLink2] [int] NOT NULL
) ON [PRIMARY]

GO
INSERT [dbo].[Dispatch] ([DispatchId], [ContractId], [DispatchDescription]) VALUES (1, 1, N'Test')
GO
INSERT [dbo].[Dispatch] ([DispatchId], [ContractId], [DispatchDescription]) VALUES (2, 1, N'Test')
GO
INSERT [dbo].[Dispatch] ([DispatchId], [ContractId], [DispatchDescription]) VALUES (3, 1, N'Test')
GO
INSERT [dbo].[Dispatch] ([DispatchId], [ContractId], [DispatchDescription]) VALUES (4, 1, N'Test')
GO
INSERT [dbo].[DispatchLink] ([ContractLink1], [DispatchLink1], [ContractLink2], [DispatchLink2]) VALUES (1, 1, 1, 2)
GO
INSERT [dbo].[DispatchLink] ([ContractLink1], [DispatchLink1], [ContractLink2], [DispatchLink2]) VALUES (1, 1, 1, 3)
GO
INSERT [dbo].[DispatchLink] ([ContractLink1], [DispatchLink1], [ContractLink2], [DispatchLink2]) VALUES (1, 3, 1, 2)
GO

The point of the DispatchLink table is to link two Dispatch records together. By the way I am using a composite primary key on my dispatch table because of legacy, so I cannot change that without a lot of pain. Also the link table may not be the correct way to do it? But again legacy.

So my question, if I run this query

select * from Dispatch d
inner join DispatchLink dl on d.DispatchId = dl.DispatchLink1 and d.ContractId = dl.ContractLink1
or d.DispatchId = dl.DispatchLink2 and d.ContractId = dl.ContractLink2

I can never get it to do an index seek on the DispatchLink table. It always does a full index scan. That is fine with a few records, but when you have 50000 in that table it scans 50000 records in the index according to the query plan. It is because there are 'ands' and 'ors' in the join clause, but I can't get my head around why SQL can't do a couple of index seeks instead, one for the left side of the 'or', and one for the right side of the 'or'.

I would like an explanation for this, not a suggestion to make the query faster unless that can be done without adjusting the query. The reason is that I am using the above query as a merge replication join filter, so I cannot just add in another type of query unfortunately.

UPDATE: For instance these are the types of indexes I have been adding,

CREATE NONCLUSTERED INDEX IDX1 ON DispatchLink (ContractLink1, DispatchLink1)
CREATE NONCLUSTERED INDEX IDX2 ON DispatchLink (ContractLink2, DispatchLink2)
CREATE NONCLUSTERED INDEX IDX3 ON DispatchLink (ContractLink1, DispatchLink1, ContractLink2, DispatchLink2)

So it uses the indexes, but does an index scan across the whole index, so 50000 records it scans 50000 records in the index.

Best Answer

The optimizer can consider many plan alternatives (including ones with multiple seeks) but for disjunctions (OR predicates) it does not consider plans involving index intersections by default. Given the indexes:

CREATE CLUSTERED INDEX cx 
ON dbo.DispatchLink (DispatchLink1, ContractLink1);

CREATE NONCLUSTERED INDEX nc1 
ON dbo.DispatchLink (DispatchLink2, ContractLink2);

We can force index seeks (assuming SQL Server 2008 or later):

SELECT * 
FROM dbo.Dispatch AS d
INNER JOIN dbo.DispatchLink AS dl WITH (FORCESEEK) ON 
    (d.DispatchId = dl.DispatchLink1 AND d.ContractId = dl.ContractLink1)
    OR (d.DispatchId = dl.DispatchLink2 AND d.ContractId = dl.ContractLink2);

FORCESEEK plan

Using your sample data, the seek plan costs at 0.0332551 units compared with 0.0068057 for the scan plan:

Scan plan

There are all sorts of possible query rewrites and hints we can try. One example of a rewrite to promote an option the optimizer does not consider for the original plan is:

SELECT * 
FROM dbo.Dispatch AS d
CROSS APPLY
(
    SELECT TOP (1) * FROM
    (
        SELECT * FROM dbo.DispatchLink AS dl
        WHERE dl.DispatchLink1 = d.DispatchId
        AND dl.ContractLink1 = d.ContractId
        UNION ALL
        SELECT * FROM dbo.DispatchLink AS dl
        WHERE dl.DispatchLink2 = d.DispatchId
        AND dl.ContractLink2 = d.ContractId
    ) SQ1
) AS F1;

This execution plan does not seek the second index if it finds a match on the first:

APPLY TOP Plan

This may perform very slightly better than the default FORCESEEK plan.

Without adding any new indexes, we can also force a seek into the Dispatch table:

SELECT * 
FROM dbo.DispatchLink AS dl
JOIN dbo.Dispatch AS d WITH (FORCESEEK) ON
    (d.DispatchId = dl.DispatchLink1 AND d.ContractId = dl.ContractLink1)
    OR (d.DispatchId = dl.DispatchLink2 AND d.ContractId = dl.ContractLink2);

Seek 2

This may be better or worse than the first example depending on things like how many rows are in each of the tables. The APPLY + TOP improvement is still possible:

SELECT * 
FROM dbo.DispatchLink AS dl
CROSS APPLY
(
    SELECT TOP (1) * FROM
    (
        SELECT * FROM dbo.Dispatch AS d
        WHERE dl.DispatchLink1 = d.DispatchId
        AND dl.ContractLink1 = d.ContractId
        UNION ALL
        SELECT * FROM dbo.Dispatch AS d
        WHERE dl.DispatchLink2 = d.DispatchId
        AND dl.ContractLink2 = d.ContractId
    ) SQ1
) AS F1;