SQL Server – Does Query Optimizer Prefer Constants Over Columns?

indexperformancequery-performancesql server

I think I have found the answer for this already but I am hoping to get some additional perspective.

Assume we are JOINing two tables together on a shared column and each table then has a different column we are going to do a Constant search on. When we build an index to support the query, for each table do we want to put the JOINing column first, or the Constant column first? I am thinking now it is the Constant column first. When I look at a query plan for a different query that prompted this question, it appears it tries to create a subset of each table and then JOIN them together. Instead of JOINing the two tables togeather and filtering down from there.

EX: Joining Shipments to Customers where Shipment is Shipped and Customer is Active

SELECT [Columns]
FROM Shipment S
   INNER JOIN Customer C
      ON S.CustomerID = C.CustomerID
WHERE S.IsShipped = 1
AND C.IsActive = 1

I am thinking the two best indexes to use are below. Because the Query Optimizer would prefer to scan the Constant first then JOIN on the 2nd column instead of JOINing the two tables together and filtering on the constant after that.

CREATE NONCLUSTERED INDEX [IX_IsActive-CustomerID] ON [dbo].[Customer]
(
    [IsActive] ASC,
    [CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_IsShipped-CustomerID] ON [dbo].[Shipment]
(
    [IsShipped] ASC,
    [CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Instead of:

CREATE NONCLUSTERED INDEX [IX_CustomerID-IsActive] ON [dbo].[Customer]
(
    [CustomerID] ASC,
    [IsActive] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_CustomerID-IsShipped] ON [dbo].[Shipment]
(
    [CustomerID] ASC,
    [IsShipped] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Is that right?

Best Answer

Assuming there are no other queries of concern, the general answer is that you want more selective criteria in the leading columns of the index (note that selectivity here is in relation to the predicate rather than a measure of how unique the particular values in the column are). In general, you want the optimizer to eliminate as many rows as possible as quickly as possible.

Assuming that CustomerID in Shipment is NOT NULL and has a foreign key to Customer, meaning that the inner join is guaranteed not to eliminate any rows from Supplier, then s.IsShipped = 1 is the only selective predicate and it would make sense for it to be the leading column. If, on the other hand, the inner join was more selective (imagine that someone moved rows from Customer to Customer_Archive periodically and Shipment could join to either of those tables), it would make sense to have CustomerID as the leading column.