Effect of Primary Key on Execution Plans for Table Variables in SQL Server

optimization · sql server · table variable

Having read a great deal about the differences between temporary tables and table variables in SQL Server, I am experimenting with switching from mostly using temporary tables to mostly using table variables. (They seem to be a better fit for the types of queries I usually work with.)

In these queries, the tables hold unique identifiers that drive the lookup process. It's been my habit, when working with temporary tables, to include a PRIMARY KEY constraint so that the query optimizer is aware that it won't see any duplicates. However, given that the optimizer (in most circumstances, and for my queries) assumes that a table variable only holds a single row*, which is unique by definition, is the query optimizer going to make choices any differently if there's a PRIMARY KEY constraint?

* Technically, it assumes there are no rows, but replaces the zero with a one, because a zero estimate interacts very poorly with the rest of the estimation process. It also depends on whether the table variable is already populated when the query is compiled. There is some background information here: What's the difference between a temp table and table variable in SQL Server?
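A quick way to observe this (a minimal sketch; the variable name and the 1,000-row fill are arbitrary) is to compare the estimated row count on the table variable's scan operator with and without OPTION (RECOMPILE), which defers compilation of the statement until the variable has been populated:

--Estimate of one row vs. actual cardinality
DECLARE @ids TABLE (id int NOT NULL PRIMARY KEY)

INSERT INTO @ids (id)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects

--Compiled while @ids is still empty: estimated rows = 1
SELECT COUNT(*) FROM @ids

--Recompiled after the insert has run: the estimate reflects all 1000 rows
SELECT COUNT(*) FROM @ids OPTION (RECOMPILE)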

I'm currently using SQL Server 2014, but I would be curious if the behavior changes in newer versions.


As has been pointed out, a PRIMARY KEY constraint comes with a clustered index that gives the query optimizer more choices about how to get data out of the table variable. I was aware of this and was thinking about the rest of the query plan. But after attempting to clarify my question, I've decided that the question I was trying to ask was too broad and probably particular to my extreme situation. (Nothing but navigational-type queries into half-a-trillion-row tables with an expectation of sub-second performance.) So I am going to leave my question as-is.

Best Answer

Since declaring a PRIMARY KEY on a table variable implicitly creates an index on the key columns (and, along with UNIQUE constraints, was the only way to index a table variable before SQL Server 2014), its presence will definitely have an effect on the resulting query plans. The optimizer will make use of that primary key index where appropriate. You can see this in action by running the short script below with the actual execution plan enabled: the Table Scan against @t1 becomes a Clustered Index Seek against @t2.

--No primary key/index
DECLARE @t1 TABLE (
    id int NOT NULL,
    data varchar(50) NOT NULL
)

INSERT INTO @t1 VALUES
(1, 'aaaaa'),
(2, 'bbbbb'),
(3, 'ccccc'),
(4, 'ddddd'),
(5, 'eeeee')

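--No index available: expect a Table Scan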
SELECT * FROM @t1 WHERE id = 4

--With primary key/index
DECLARE @t2 TABLE (
    id int NOT NULL PRIMARY KEY CLUSTERED,
    data varchar(50) NOT NULL
)

INSERT INTO @t2 VALUES
(1, 'aaaaa'),
(2, 'bbbbb'),
(3, 'ccccc'),
(4, 'ddddd'),
(5, 'eeeee')

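--The implicit primary key index enables a Clustered Index Seek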
SELECT * FROM @t2 WHERE id = 4

Now, as for whether declaring a PRIMARY KEY instead of a plain CLUSTERED INDEX (which SQL Server 2014 lets you do with the inline INDEX syntax) will result in different query plans, I can't say authoritatively. One difference worth keeping in mind is that a PRIMARY KEY also guarantees uniqueness, and the optimizer can sometimes exploit a uniqueness guarantee that a plain non-unique index doesn't provide. For this simple singleton lookup, though, the contrived test below still produced a clustered index seek:

DECLARE @t3 TABLE (
    id int NOT NULL,
    data varchar(50) NOT NULL,
    INDEX IX1 CLUSTERED (id)
)

INSERT INTO @t3 VALUES
(1, 'aaaaa'),
(2, 'bbbbb'),
(3, 'ccccc'),
(4, 'ddddd'),
(5, 'eeeee')

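--The inline clustered index still yields a Clustered Index Seek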
SELECT * FROM @t3 WHERE id = 4

I suspect things get a little more iffy when using nonclustered indexes on table variables, where the optimizer needs to estimate the cost of potential RID lookups and weigh them against a table scan.
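
For anyone who wants to poke at that case, here is a sketch along the same lines (@t4 and IX2 are just illustrative names). With the 2014 inline syntax, a nonclustered index leaves the table variable as a heap, so a seek on id has to be paired with a RID lookup to fetch the data column:

--With nonclustered index only (the table variable remains a heap)
DECLARE @t4 TABLE (
    id int NOT NULL,
    data varchar(50) NOT NULL,
    INDEX IX2 NONCLUSTERED (id)
)

INSERT INTO @t4 VALUES
(1, 'aaaaa'),
(2, 'bbbbb'),
(3, 'ccccc'),
(4, 'ddddd'),
(5, 'eeeee')

--SELECT * needs the data column, so an index seek must be followed by a RID lookup
SELECT * FROM @t4 WHERE id = 4

--A covering query can be answered from the index alone, with no lookup
SELECT id FROM @t4 WHERE id = 4

At the default one-row estimate, the seek plus lookup will usually look cheap enough to win; whether larger row counts (surfaced via OPTION (RECOMPILE)) tip the optimizer back toward a scan is the kind of thing worth testing rather than assuming.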