SQL Server DELETE Statement – Recompile Not Working

Tags: execution-plan, optimization, sql-server, statistics

I want to optimize a deletion from my tables.

I have a table-variable input to my stored procedure (usp) of the following type:

CREATE TYPE [AddressInfoParts] AS TABLE
(
    [TcpInfoId]     [BIGINT] NULL INDEX IXTcpInfoId CLUSTERED,
    [EmailInfoId]   [BIGINT] NULL INDEX IXEmailInfoId NONCLUSTERED,
    [PhoneInfoId]   [BIGINT] NULL INDEX IXPhoneInfoId NONCLUSTERED
)

Here I have a deletion:

DELETE FROM [doc].[TcpInfo]
WHERE ID IN (SELECT aip.TcpInfoId FROM @AddressInfoParts aip)
OPTION(RECOMPILE)

With the following plan:

[execution plan screenshot]

My question is: why does it estimate 1 row (in the Clustered Index Seek) when I specify RECOMPILE? I expected it to recompile the procedure for the current table variable size (2,000 rows in this case).

I always have a table of the same type, but I don't know how to tell this to the SQL engine, because it selects a non-optimal plan: it uses nested loops because it expects only 1 row when we have N rows there, and there is a 1-to-1 relationship between the variable and those 3 tables. Each column has X unique values and Y NULLs. But in practice one of them is 0 (we have either all NULLs or all unique non-NULL values).

When I split the table into 3 separate tables, the statistics work better, but it still estimates only 1 row when it receives N:

[execution plan screenshot]

Please advise.

Best Answer

why does it estimate 1 row (in the Clustered Index Seek) when I specify RECOMPILE

That is fine.

Somewhat confusingly, the estimated rows shown on the inside of a nested loops join are per execution of the operator.

A seek into a primary key will indeed return 1 row (or 0 if the value doesn't exist at all).

In your case you have 2,000 seeks, each returning 1 row, and the actual rows reported is 2,000, so the estimated rows per execution was correct. You need to multiply estimated rows by the estimated number of executions (here 1 × 2,000 = 2,000) and compare that to the actual rows to see if there is any discrepancy.

Your first plan shows that the estimated number of executions of the inner side is in fact underestimated, though (at 45, not 2,000), as it estimates that the Stream Aggregate on TcpInfoId will return 45 distinct values. This is because there are no column statistics on table variables to tell it that the values are in fact unique. OPTION (RECOMPILE) just allows the optimizer to take into account the number of rows in the table variable; it doesn't provide any information on column density.

If you were to change the column definition of TcpInfoId to [TcpInfoId] [BIGINT] NOT NULL PRIMARY KEY or [TcpInfoId] [BIGINT] NULL UNIQUE CLUSTERED (the difference being that the second one allows a single NULL), this would provide uniqueness information that can be used, as sketched below.
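A minimal sketch of the redefined type using the second option (note the type would have to be dropped and recreated, and any modules referencing it altered first, since ALTER TYPE is not supported):

CREATE TYPE [AddressInfoParts] AS TABLE
(
    [TcpInfoId]     [BIGINT] NULL UNIQUE CLUSTERED,   -- declares uniqueness; a single NULL is still allowed
    [EmailInfoId]   [BIGINT] NULL INDEX IXEmailInfoId NONCLUSTERED,
    [PhoneInfoId]   [BIGINT] NULL INDEX IXPhoneInfoId NONCLUSTERED
)

With uniqueness declared, the distinct-value estimate feeding the number of executions of the inner side should match the row count rather than being guessed.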

Each column has X unique values and Y NULLs. But in practice one of them is 0 (we have either all NULLs or all unique non-NULL values).

If you are on SQL Server 2016 or later you can use a unique filtered index:

CREATE TYPE [AddressInfoParts] AS TABLE
(
    [TcpInfoId]     [BIGINT] NULL INDEX IXTcpInfoId UNIQUE WHERE [TcpInfoId] IS NOT NULL,
    [EmailInfoId]   [BIGINT] NULL INDEX IXEmailInfoId NONCLUSTERED,
    [PhoneInfoId]   [BIGINT] NULL INDEX IXPhoneInfoId NONCLUSTERED
)

If you are on a version that doesn't support filtered indexes in table types, perhaps you can use a #temporary table instead (as column statistics can be created on those).
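A minimal sketch of that alternative, assuming the rows are copied from the table variable into the temp table inside the procedure (names carried over from the question):

CREATE TABLE #AddressInfoParts
(
    [TcpInfoId]     [BIGINT] NULL,
    [EmailInfoId]   [BIGINT] NULL,
    [PhoneInfoId]   [BIGINT] NULL
)

-- Column statistics can be auto-created on the #temp table, unlike on the table variable
INSERT INTO #AddressInfoParts ([TcpInfoId], [EmailInfoId], [PhoneInfoId])
SELECT aip.TcpInfoId, aip.EmailInfoId, aip.PhoneInfoId
FROM @AddressInfoParts aip

DELETE FROM [doc].[TcpInfo]
WHERE ID IN (SELECT t.TcpInfoId FROM #AddressInfoParts t)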

Or you can stop trying to use a single table type for these two very different cases.
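One possible shape for the all-unique case, as a sketch (the type name TcpInfoIdList is hypothetical; the point is that a dedicated type can declare the column NOT NULL PRIMARY KEY and give the optimizer full uniqueness information):

CREATE TYPE [TcpInfoIdList] AS TABLE
(
    [TcpInfoId] [BIGINT] NOT NULL PRIMARY KEY   -- unique and non-NULL by definition
)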

If I understand your question correctly, this is what the second plan represents. The thing you point out as a problem there is not in fact a problem, as explained at the beginning of this answer.