Extra NESTED LOOP / INNER JOIN causing NO JOIN PREDICATE warning

execution-planquerysql serversql-server-2016

I have the following tables in my database.

LastNames Table

CREATE TABLE [dbo].[LastNames](
    [LastNameID] [int] IDENTITY(1,1) NOT NULL,
    [LastName] [varchar](50) NOT NULL
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [CIX_LastNames_LastName] ON [dbo].[LastNames]
(
    [LastName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

FirstNames Table

CREATE TABLE [dbo].[FirstNames](
    [FirstNameID] [int] IDENTITY(1,1) NOT NULL,
    [FirstName] [varchar](50) NOT NULL
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [CIX_FirstNames_FirstName] ON [dbo].[FirstNames]
(
    [FirstName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

PersonsAnon Table

CREATE TABLE [dbo].[PersonsAnon](
    [PersonID] [int] IDENTITY(1,1) NOT NULL,
    [LastNameID] [int] NOT NULL,
    [FirstNameID] [int] NOT NULL,
    [Info1] [bit] NULL,
    [Info2] [char](1) NULL,
    [Info3] [nchar](50) NULL,
    [AdressID] [int] NULL
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [CIX_PersonsAnon_PersonID_LastNameID_FirstNameID] ON [dbo].[PersonsAnon]
(
    [PersonID] ASC,
    [LastNameID] ASC,
    [FirstNameID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [NIX_PersonsAnon_LastNameID_FirstNameID] ON [dbo].[PersonsAnon]
(
    [LastNameID] ASC,
    [FirstNameID] ASC
)
INCLUDE([PersonID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Data Generation

For those of you who have an AdventureWorks2014 Database lying around, I used the following statements to populate the tables:

INSERT INTO dbo.LastNames (LastName) SELECT DISTINCT app.LastName FROM AdventureWorks2016.Person.Person AS app

INSERT INTO dbo.FirstNames (FirstName) SELECT DISTINCT app.FirstName FROM AdventureWorks2016.Person.Person AS app

INSERT INTO dbo.PersonsAnon (LastNameID, FirstNameID)
SELECT ln.LastNameID, fn.FirstNameID FROM LastNames ln CROSS APPLY FirstNames fn

The Statement

The following statement was used to see what kind of execution plan would be created:

SELECT pa.PersonID,
       fn.FirstName,
       ln.LastName,
       pa.Info2
FROM   PersonsAnon AS pa
       JOIN LastNames AS ln
            ON  pa.LastNameID = ln.LastNameID
       JOIN FirstNames AS fn
            ON  pa.FirstNameID = fn.FirstNameID
WHERE  ln.LastName LIKE 'Pet%'
       AND fn.FirstName LIKE 'John%'
ORDER BY
       ln.LastName,
       fn.FirstName;

The Query Execution Plan

PasteThePlan

Questions

Why is the JOIN performed between LastNames and FirstNames even though there is no JOIN predicate?
What is causing the QO to JOIN the two tables?

Graphical Situation

Here is what it looks like:

And the details show:

Articles already consulted

Best Answer

The optimizer decided your query was a selective star query on fact and dimension tables.

The giveaway is the presence of StarJoinInfo attributes on the plan's join operators (except the cartesian product, unfortunately). From my article, StarJoinInfo in Execution Plans:

The presence of StarJoinInfo indicates that SQL Server applied one of a set of optimizations targeted at selective star-schema queries. These optimizations are available from SQL Server 2005, in all editions (not just Enterprise). Note that 'selective' here refers to the number of rows fetched from the fact table. The combination of dimensional predicates in a query may still be selective even where its individual predicates qualify a large number of rows.

One of the strategies available is "Cartesian Product plus Multi-Column Index Lookup". The idea is to apply a separate predicate to each of the dimension tables, take the cartesian product of the results, and use that to seek into both keys of the multi-column index on the fact table. The query plan then performs a lookup (RID or Key) into the fact table using row identifiers from the prior operations.

In other words:

Find first names beginning with "John" and the associated ID
Find last names beginning with "Pet" and the associated ID
Produce the cartesian product of results from steps 1 & 2
Seek into PersonsAnon using the multi-column index on IDs for the results from step 3
Perform a lookup to retrieve 'Info2'

If you follow the logic through, you'll see this strategy returns the results your query specification asks for.

Although intended to speed up selective star (and snowflake) schema queries in data warehousing, the optimizer may apply these techniques wherever it detects a suitable set of tables and joins. The heuristics used to detect star queries are quite broad, so you may encounter plan shapes with StarJoinInfo structures in just about any type of database.