I think I have found the answer for this already but I am hoping to get some additional perspective.
Assume we are JOIN
ing two tables together on a shared column and each table then has a different column
we are going to do a Constant
search on. When we build an index
to support the query, for each table do we want to put the JOIN
ing column
first, or the Constant
column
first? I am thinking now it is the Constant
column
first. When I look at a query plan for a different query that prompted this question, it appears it tries to create a subset of each table and then JOIN
them together. Instead of JOIN
ing the two tables togeather and filtering down from there.
EX: Joining Shipments to Customers where Shipment is Shipped and Customer is Active
SELECT [Columns]
FROM Shipment S
INNER JOIN Customer C
ON S.CustomerID = C.CustomerID
WHERE S.IsShipped = 1
AND C.IsActive = 1
I am thinking the two best indexes to use are below. Because the Query Optimizer
would prefer to scan the Constant
first then JOIN
on the 2nd column
instead of JOIN
ing the two tables
together and filtering on the constant
after that.
CREATE NONCLUSTERED INDEX [IX_IsActive-CustomerID] ON [dbo].[Customer]
(
[IsActive] ASC,
[CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_IsShipped-CustomerID] ON [dbo].[Shipment]
(
[IsShipped] ASC,
[CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Instead of:
CREATE NONCLUSTERED INDEX [IX_CustomerID-IsActive] ON [dbo].[Customer]
(
[CustomerID] ASC,
[IsActive] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_CustomerID-IsShipped] ON [dbo].[Shipment]
(
[CustomerID] ASC,
[IsShipped] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Is that right?
Best Answer
Assuming there are no other queries of concern, the general answer is that you want more selective criteria in the leading columns of the index (note that selectivity here is in relation to the predicate rather than a measure of how unique the particular values in the column are). In general, you want the optimizer to eliminate as many rows as possible as quickly as possible.
Assuming that
CustomerID
inShipment
isNOT NULL
and has a foreign key toCustomer
, meaning that theinner join
is guaranteed not to eliminate any rows fromSupplier
, thens.IsShipped = 1
is the only selective predicate and it would make sense for it to be the leading column. If, on the other hand, theinner join
was more selective (imagine that someone moved rows fromCustomer
toCustomer_Archive
periodically andShipment
could join to either of those tables), it would make sense to haveCustomerID
as the leading column.