I have the following tables in my database.
LastNames Table
CREATE TABLE [dbo].[LastNames](
[LastNameID] [int] IDENTITY(1,1) NOT NULL,
[LastName] [varchar](50) NOT NULL
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [CIX_LastNames_LastName] ON [dbo].[LastNames]
(
[LastName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
FirstNames Table
CREATE TABLE [dbo].[FirstNames](
[FirstNameID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [varchar](50) NOT NULL
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [CIX_FirstNames_FirstName] ON [dbo].[FirstNames]
(
[FirstName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
PersonsAnon Table
CREATE TABLE [dbo].[PersonsAnon](
[PersonID] [int] IDENTITY(1,1) NOT NULL,
[LastNameID] [int] NOT NULL,
[FirstNameID] [int] NOT NULL,
[Info1] [bit] NULL,
[Info2] [char](1) NULL,
[Info3] [nchar](50) NULL,
[AdressID] [int] NULL
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [CIX_PersonsAnon_PersonID_LastNameID_FirstNameID] ON [dbo].[PersonsAnon]
(
[PersonID] ASC,
[LastNameID] ASC,
[FirstNameID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [NIX_PersonsAnon_LastNameID_FirstNameID] ON [dbo].[PersonsAnon]
(
[LastNameID] ASC,
[FirstNameID] ASC
)
INCLUDE([PersonID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Data Generation
For those of you who have an AdventureWorks2014 Database lying around, I used the following statements to populate the tables:
INSERT INTO dbo.LastNames (LastName) SELECT DISTINCT app.LastName FROM AdventureWorks2016.Person.Person AS app
INSERT INTO dbo.FirstNames (FirstName) SELECT DISTINCT app.FirstName FROM AdventureWorks2016.Person.Person AS app
INSERT INTO dbo.PersonsAnon (LastNameID, FirstNameID)
SELECT ln.LastNameID, fn.FirstNameID FROM LastNames ln CROSS APPLY FirstNames fn
The Statement
The following statement was used to see what kind of execution plan would be created:
SELECT pa.PersonID,
fn.FirstName,
ln.LastName,
pa.Info2
FROM PersonsAnon AS pa
JOIN LastNames AS ln
ON pa.LastNameID = ln.LastNameID
JOIN FirstNames AS fn
ON pa.FirstNameID = fn.FirstNameID
WHERE ln.LastName LIKE 'Pet%'
AND fn.FirstName LIKE 'John%'
ORDER BY
ln.LastName,
fn.FirstName;
The Query Execution Plan
Questions
Why is the JOIN performed between LastNames and FirstNames even though there is no JOIN predicate?
What is causing the QO to JOIN the two tables?
Graphical Situation
Here is what it looks like:
And the details show:
Best Answer
The optimizer decided your query was a selective star query on fact and dimension tables.
The giveaway is the presence of
StarJoinInfo
attributes on the plan's join operators (except the cartesian product, unfortunately). From my article, StarJoinInfo in Execution Plans:One of the strategies available is "Cartesian Product plus Multi-Column Index Lookup". The idea is to apply a separate predicate to each of the dimension tables, take the cartesian product of the results, and use that to seek into both keys of the multi-column index on the fact table. The query plan then performs a lookup (RID or Key) into the fact table using row identifiers from the prior operations.
In other words:
If you follow the logic through, you'll see this strategy returns the results your query specification asks for.