Effect of Primary Key on Execution Plans for Table Variables in SQL Server

optimization · sql server · table variable

Having read a great deal about the differences between temporary tables and table variables in SQL Server, I am experimenting with switching from mostly using temporary tables to mostly using table variables. (They seem to be a better fit for the types of queries I usually work with.)

In these queries, the tables hold unique identifiers that drive the lookup process. It's been my habit, when working with temporary tables, to include a PRIMARY KEY constraint so that the query optimizer is aware that it won't see any duplicates. However, given that the optimizer (in most circumstances, and for my queries) assumes that a table variable only holds a single row*, which is unique by definition, is the query optimizer going to make choices any differently if there's a PRIMARY KEY constraint?

* Technically, it assumes there are no rows, but replaces the zero with a one, because a zero estimate interacts very poorly with the rest of the estimation process. It also depends on whether the table variable is already populated when the query is compiled. There is some background information here: What's the difference between a temp table and table variable in SQL Server?
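A quick way to observe this (a minimal sketch; the variable name and the 1,000-row fill are arbitrary) is to compare the estimated row count on the table variable's scan operator with and without OPTION (RECOMPILE), which defers compilation of the statement until the variable has been populated:

--Estimate of one row vs. actual cardinality
DECLARE @ids TABLE (id int NOT NULL PRIMARY KEY)

INSERT INTO @ids (id)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects

--Compiled while @ids is still empty: estimated rows = 1
SELECT COUNT(*) FROM @ids

--Recompiled after the insert has run: the estimate reflects all 1000 rows
SELECT COUNT(*) FROM @ids OPTION (RECOMPILE)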

I'm currently using SQL Server 2014, but I would be curious if the behavior changes in newer versions.


As has been pointed out, a PRIMARY KEY constraint comes with a clustered index that gives the query optimizer more choices about how to get data out of the table variable. I was aware of this and was thinking about the rest of the query plan. But after attempting to clarify my question, I've decided that the question I was trying to ask was too broad and probably particular to my extreme situation. (Nothing but navigational-type queries into half-a-trillion-row tables with an expectation of sub-second performance.) So I am going to leave my question as-is.

Best Answer

Since declaring a PRIMARY KEY on a table variable implicitly creates an index on the key columns (and, along with UNIQUE constraints, was the only way to index a table variable before SQL Server 2014), its presence will definitely have an effect on the resulting query plans. The optimizer will make use of that primary key index where appropriate. You can see this in action by running the short script below with the actual execution plan enabled: the Table Scan against @t1 becomes a Clustered Index Seek against @t2.

--No primary key/index
DECLARE @t1 TABLE (
    id int NOT NULL,
    data varchar(50) NOT NULL
)

INSERT INTO @t1 VALUES
(1, 'aaaaa'),
(2, 'bbbbb'),
(3, 'ccccc'),
(4, 'ddddd'),
(5, 'eeeee')

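--No index available: expect a Table Scan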
SELECT * FROM @t1 WHERE id = 4

--With primary key/index
DECLARE @t2 TABLE (
    id int NOT NULL PRIMARY KEY CLUSTERED,
    data varchar(50) NOT NULL
)

INSERT INTO @t2 VALUES
(1, 'aaaaa'),
(2, 'bbbbb'),
(3, 'ccccc'),
(4, 'ddddd'),
(5, 'eeeee')

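--The implicit primary key index enables a Clustered Index Seek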
SELECT * FROM @t2 WHERE id = 4

Now, as for whether declaring a PRIMARY KEY instead of a plain CLUSTERED INDEX (which SQL Server 2014 lets you do with the inline INDEX syntax) will result in different query plans, I can't say authoritatively. One difference worth keeping in mind is that a PRIMARY KEY also guarantees uniqueness, and the optimizer can sometimes exploit a uniqueness guarantee that a plain non-unique index doesn't provide. For this simple singleton lookup, though, the contrived test below still produced a clustered index seek:

DECLARE @t3 TABLE (
    id int NOT NULL,
    data varchar(50) NOT NULL,
    INDEX IX1 CLUSTERED (id)
)

INSERT INTO @t3 VALUES
(1, 'aaaaa'),
(2, 'bbbbb'),
(3, 'ccccc'),
(4, 'ddddd'),
(5, 'eeeee')

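--The inline clustered index still yields a Clustered Index Seek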
SELECT * FROM @t3 WHERE id = 4

I suspect things get a little more iffy when using nonclustered indexes on table variables, where the optimizer needs to estimate the cost of potential RID lookups and weigh them against a table scan.
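
For anyone who wants to poke at that case, here is a sketch along the same lines (@t4 and IX2 are just illustrative names). With the 2014 inline syntax, a nonclustered index leaves the table variable as a heap, so a seek on id has to be paired with a RID lookup to fetch the data column:

--With nonclustered index only (the table variable remains a heap)
DECLARE @t4 TABLE (
    id int NOT NULL,
    data varchar(50) NOT NULL,
    INDEX IX2 NONCLUSTERED (id)
)

INSERT INTO @t4 VALUES
(1, 'aaaaa'),
(2, 'bbbbb'),
(3, 'ccccc'),
(4, 'ddddd'),
(5, 'eeeee')

--SELECT * needs the data column, so an index seek must be followed by a RID lookup
SELECT * FROM @t4 WHERE id = 4

--A covering query can be answered from the index alone, with no lookup
SELECT id FROM @t4 WHERE id = 4

At the default one-row estimate, the seek plus lookup will usually look cheap enough to win; whether larger row counts (surfaced via OPTION (RECOMPILE)) tip the optimizer back toward a scan is the kind of thing worth testing rather than assuming.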