For a parameterised query it can't just do two seeks on
WHERE A=@P1 AND B=@P2 AND C=@P3 AND D=@P5
and
WHERE A=@P1 AND B=@P2 AND C=@P4 AND D=@P5
because if @P3 = @P4
that would incorrectly return duplicate rows. So it would first need an operator that removes any duplicates from the parameter values.
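A quick way to see the problem, using the test table T defined below (the parameter values here are hypothetical, chosen so that @P3 = @P4):

DECLARE @P1 INT = 1, @P2 INT = 2,
        @P3 INT = 3, @P4 INT = 3, @P5 INT = 5;

-- Naively concatenating the two seeks returns every matching row twice
-- whenever @P3 = @P4:
SELECT * FROM T WHERE A=@P1 AND B=@P2 AND C=@P3 AND D=@P5
UNION ALL
SELECT * FROM T WHERE A=@P1 AND B=@P2 AND C=@P4 AND D=@P5;

Replacing UNION ALL with UNION would remove the duplicates, which is essentially what the extra operators in the second plan below have to do to the parameter values themselves.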
From a quick test at this end, whether or not you get that appears to depend on the size of the table. In the test below, 245/246 rows is the cut-off point between the two plans (this was also the cut-off point between the index fitting entirely on one page and it growing to two leaf pages plus a root page).
CREATE TABLE T(A INT,B INT,C INT,D INT)
INSERT INTO T
SELECT TOP (245) 1,2,3,5
FROM master..spt_values v1
CREATE CLUSTERED INDEX IX ON T(A, B, C, D)
SELECT index_level,page_count, record_count
FROM sys.dm_db_index_physical_stats(db_id(),object_id('T'),1,NULL, 'DETAILED')
DECLARE @C1 INT = 3,
@C2 INT = 4
SELECT * FROM T WHERE A=1 AND B=2 AND (C=@C1 OR C=@C2) AND D=5
DROP TABLE T
1 page / 245 rows
This plan has a seek on A=1 AND B=2, with a residual predicate on (C=@C1 OR C=@C2) AND D=5.
2 leaf pages / 246 rows
In the second plan the extra operators are responsible for removing any duplicates from @C1, @C2 before performing the seek(s).
The seek in the second plan is actually a range seek between A=1 AND B=2 AND C > Expr1010 and A=1 AND B=2 AND C < Expr1011, with a residual predicate on D=5. It still isn't an equality seek on all 4 columns. More information about the additional plan operators can be found here.
Adding OPTION (RECOMPILE) does allow it to inspect the parameter values for duplicates at compile time, and produces a plan with two equality seeks.
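Concretely, this is the same test query as above with the hint appended:

DECLARE @C1 INT = 3,
        @C2 INT = 4;

-- With RECOMPILE the optimiser can look at the actual values of @C1 and
-- @C2 at compile time, see that they are distinct, and build a plan with
-- two equality seeks on all four columns (no duplicate-removal operators).
SELECT * FROM T
WHERE A=1 AND B=2 AND (C=@C1 OR C=@C2) AND D=5
OPTION (RECOMPILE);

The trade-off, of course, is a fresh compilation on every execution.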
You could also achieve that with
;WITH CTE
AS (SELECT DISTINCT C
    FROM (VALUES (@C1),
                 (@C2)) V(C))
SELECT CA.*
FROM CTE
CROSS APPLY (SELECT *
FROM T
WHERE A=1 AND B=2 AND D=5 AND C = CTE.C) CA
But in this test case it would actually likely be counter-productive, as having two seeks into the single-page index rather than one increases the logical I/O.
GETDATE() is not guaranteed to be unique, no. Especially if it is a datetime, which is only accurate to about 3 milliseconds (values are rounded to increments of .000, .003, or .007 seconds); and of course when not all the data is coming from the same user, you are almost guaranteed to have collisions.
Of course, a clustered index does not need to be unique, since SQL Server will make it so if it isn't (but only when it needs to). If you need to identify a specific row yourself (as opposed to just the uniquifier for SQL Server's internal usage), and there are no other candidate key columns (this is possible with things like event logging tables), you could add a non-clustered primary key that is, say, an IDENTITY column. Or if you really want web scale - and care more about insert performance than storage or any subsequent use of the data - you could use a uniqueidentifier column populated with NEWID().
Example
Let's take a look at an example, and see the differences.
USE tempdb;
GO
-- rely on uniquifier
CREATE TABLE dbo.Test1
(
g DATETIME
);
CREATE CLUSTERED INDEX x ON dbo.Test1(g);
-- use an IDENTITY column
CREATE TABLE dbo.Test2
(
i INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
g DATETIME
);
CREATE CLUSTERED INDEX x ON dbo.Test2(g);
-- use a GUID
CREATE TABLE dbo.Test3
(
n UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID()
PRIMARY KEY NONCLUSTERED,
g DATETIME
);
CREATE CLUSTERED INDEX x ON dbo.Test3(g);
GO
Insert speed
I populated all three tables with about 500,000 rows, using the following script:
SET NOCOUNT ON;
GO
SELECT SYSDATETIME();
GO
INSERT dbo.Test1(g)
SELECT DATEADD(SECOND, ABS([object_id])/1000, GETDATE())
FROM sys.all_columns;
GO 100
SELECT SYSDATETIME();
GO
INSERT dbo.Test2(g)
SELECT DATEADD(SECOND, ABS([object_id])/1000,
GETDATE()) FROM sys.all_columns;
GO 100
SELECT SYSDATETIME();
GO
INSERT dbo.Test3(g)
SELECT DATEADD(SECOND, ABS([object_id])/1000,
GETDATE()) FROM sys.all_columns;
GO 100
SELECT SYSDATETIME();
Results:
Uniquifier: 2.26 seconds
IDENTITY: 3.89 seconds
GUID: 5.06 seconds
Scan Speed
SELECT SYSDATETIME();
DBCC DROPCLEANBUFFERS;
SELECT * FROM dbo.Test1;
SELECT SYSDATETIME();
DBCC DROPCLEANBUFFERS;
SELECT * FROM dbo.Test2;
SELECT SYSDATETIME();
DBCC DROPCLEANBUFFERS;
SELECT * FROM dbo.Test3;
SELECT SYSDATETIME();
Results:
Uniquifier: 3.89 seconds
IDENTITY: 4.21 seconds
GUID: 4.85 seconds
Space used
Looking at simple results from sp_spaceused:
EXEC sp_spaceused N'dbo.Test1';
EXEC sp_spaceused N'dbo.Test2';
EXEC sp_spaceused N'dbo.Test3';
Results:
                     Rows     Reserved   Data       Index size
Test1 - Uniquifier   513300   14024 KB   13880 KB       96 KB
Test2 - IDENTITY     513300   29072 KB   15792 KB    13200 KB
Test3 - GUID         513300   49936 KB   21960 KB    27928 KB
The uniquifier takes up less space than the IDENTITY column (and obviously both take less than the GUID), since it is only needed for collisions (and probably other optimizations I'm not aware of, such as compression).
We can also look at the index pages for the clustered index on the datetime column (index id = 1) and the non-clustered primary key (index id = 2):
DBCC TRACEON(3604,-1);
-- Uniquifier
DBCC IND('tempdb', 'dbo.Test1', 1); -- 1,747 index pages
-- no second index for this table
-- IDENTITY
DBCC IND('tempdb', 'dbo.Test2', 1); -- 1,987 index pages
DBCC IND('tempdb', 'dbo.Test2', 2); -- 1,637 index pages
-- GUID
DBCC IND('tempdb', 'dbo.Test3', 1); -- 2,764 index pages
DBCC IND('tempdb', 'dbo.Test3', 2); -- 3,472 index pages
Page/Row Contents
And finally, we can take a look at a specific page to see what is stored in a row. I simply took the first row from each of the clustered index DBCC IND results from above (your page id values will almost certainly be different):
DBCC PAGE('tempdb',1, 153, 1);
DBCC PAGE('tempdb',1, 199, 1);
DBCC PAGE('tempdb',1, 217, 1);
Uniquifier - pay particular attention to length/record size:
Slot 0, Offset 0x60, Length 15, DumpStyle BYTE
----------------------------^^
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP
Record Size = 15
--------------^^
Memory Dump @0x000000000F7EA060
0000000000000000: 10000c00 0333ba00 fba20000 020000††††.....3º.û¢.....
Slot 1, Offset 0x6f, Length 23, DumpStyle BYTE
----------------------------^^
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 23
--------------^^
Memory Dump @0x000000000F7EA06F
0000000000000000: 30000c00 0333ba00 fba20000 02000001 †0....3º.û¢......
0000000000000010: 00170001 000000††††††††††††††††††††††.......
IDENTITY seems to have 4 extra bytes in the clustered index:
Slot 0, Offset 0x60, Length 19, DumpStyle BYTE
----------------------------^^
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP
Record Size = 19
--------------^^
Memory Dump @0x0000000011DAA060
0000000000000000: 10001000 a735ba00 fba20000 03020000 †....§5º.û¢......
0000000000000010: 030000†††††††††††††††††††††††††††††††...
Slot 1, Offset 0x73, Length 27, DumpStyle BYTE
----------------------------^^
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 27
--------------^^
Memory Dump @0x0000000011DAA073
0000000000000000: 30001000 a735ba00 fba20000 04020000 †0...§5º.û¢......
0000000000000010: 03000001 001b0001 000000†††††††††††††...........
GUID has an extra 16 bytes in the clustered index:
Slot 0, Offset 0x60, Length 31, DumpStyle BYTE
----------------------------^^
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP
Record Size = 31
--------------^^
Memory Dump @0x0000000011DAA060
0000000000000000: 10001c00 393aba00 fba20000 f3233e73 †....9:º.û¢..ó#>s
0000000000000010: a36e114b b1229a80 a5cb090a 030000††††£n.K±".¥Ë ....
Slot 1, Offset 0x7f, Length 39, DumpStyle BYTE
----------------------------^^
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 39
--------------^^
Memory Dump @0x0000000011DAA07F
0000000000000000: 30001c00 393aba00 fba20000 c7bb2544 †0...9:º.û¢..Ç»%D
0000000000000010: 4ad3574f a2c2029f e4abc9d7 03000001 †JÓWO¢Â.ä«É×....
0000000000000020: 00270001 000000††††††††††††††††††††††.'.....
Conclusion
Seems to me, by all accounts, you are better off letting the uniquifier do its own thing (assuming you don't expect to be able to differentiate between two rows with the exact same date/time value). The only time you can get in trouble with this is if any single value is duplicated 2,147,483,648 times, at which point you will overflow the integer range of the uniquifier.
Best Answer
Think of it as different queries. How much does it cost to navigate a B-tree to find zero rows? That is what happens for those "extra partitions": the ones that don't contain any data for your range, and which are skipped by your second query alternative. If you don't have too many partitions, this extra cost will be very low. But as the number of partitions increases, so will the overhead (especially the relative overhead, if the selectivity of the range is high).
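The original queries aren't reproduced here, but the trade-off can be sketched with a hypothetical partitioned table (all names below are made up for illustration):

-- Hypothetical: dbo.Orders partitioned on OrderDate by function PF_OrderDate.

-- Alternative 1: plain range predicate. Partition elimination still leaves
-- the engine visiting every partition that could intersect the range,
-- including empty ones -- each costing one B-tree navigation to find zero rows.
SELECT * FROM dbo.Orders
WHERE OrderDate >= '20240101' AND OrderDate < '20240201';

-- Alternative 2: additionally pinning the partition number with $PARTITION
-- restricts the seek to partitions that can actually contain rows.
SELECT * FROM dbo.Orders
WHERE $PARTITION.PF_OrderDate(OrderDate) = 2
  AND OrderDate >= '20240101' AND OrderDate < '20240201';

With a handful of partitions the difference is negligible; with hundreds of mostly-empty partitions the per-partition navigation cost in the first form starts to add up.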