Sparsing
When doing some tests on sparse columns, as you do, there was a performance setback that I would like to know the direct cause of.
DDL
I created two identical tables, one with 4 sparse columns and one with no sparse columns.
--Non Sparse columns table & NC index
CREATE TABLE dbo.nonsparse( ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
charval char(20) NULL,
varcharval varchar(20) NULL,
intval int NULL,
bigintval bigint NULL
);
CREATE INDEX IX_Nonsparse_intval_varcharval
ON dbo.nonsparse(intval,varcharval)
INCLUDE(bigintval,charval);
-- sparse columns table & NC index
CREATE TABLE dbo.sparse( ID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
charval char(20) SPARSE NULL ,
varcharval varchar(20) SPARSE NULL,
intval int SPARSE NULL,
bigintval bigint SPARSE NULL
);
CREATE INDEX IX_sparse_intval_varcharval
ON dbo.sparse(intval,varcharval)
INCLUDE(bigintval,charval);
DML
I then inserted about 2540 NON-NULL values into both.
INSERT INTO dbo.nonsparse WITH(TABLOCK) (charval, varcharval,intval,bigintval)
SELECT 'Val1','Val2',20,19
FROM MASTER..spt_values;
INSERT INTO dbo.sparse WITH(TABLOCK) (charval, varcharval,intval,bigintval)
SELECT 'Val1','Val2',20,19
FROM MASTER..spt_values;
Afterwards, I inserted 1M NULL values into both tables
INSERT INTO dbo.nonsparse WITH(TABLOCK) (charval, varcharval,intval,bigintval)
SELECT TOP(1000000) NULL,NULL,NULL,NULL
FROM MASTER..spt_values spt1
CROSS APPLY MASTER..spt_values spt2;
INSERT INTO dbo.sparse WITH(TABLOCK) (charval, varcharval,intval,bigintval)
SELECT TOP(1000000) NULL,NULL,NULL,NULL
FROM MASTER..spt_values spt1
CROSS APPLY MASTER..spt_values spt2;
Queries
Nonsparse table execution
When running this query twice on the newly created nonsparse table:
SET STATISTICS IO, TIME ON;
SELECT * FROM dbo.nonsparse
WHERE 1= (SELECT 1) -- force non trivial plan
OPTION(RECOMPILE,MAXDOP 1);
The logical reads show 5257 pages
(1002540 rows affected)
Table 'nonsparse'. Scan count 1, logical reads 5257, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
And the cpu time is at 343 ms
SQL Server Execution Times:
CPU time = 343 ms, elapsed time = 3850 ms.
sparse table execution
Running the same query twice on the sparse table:
SELECT * FROM dbo.sparse
WHERE 1= (SELECT 1) -- force non trivial plan
OPTION(RECOMPILE,MAXDOP 1);
The reads are lower, 1763
(1002540 rows affected)
Table 'sparse'. Scan count 1, logical reads 1763, physical reads 3, read-ahead reads 1759, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
But the cpu time is higher, 547 ms.
SQL Server Execution Times:
CPU time = 547 ms, elapsed time = 2406 ms.
non sparse table execution plan
Questions
Original question
Since the NULL values are not stored directly in the sparse columns, could the increase in cpu time be due to returning the NULL values as a resultset? Or is it simply the behaviour as noted in the documentation?
Sparse columns reduce the space requirements for null values at the
cost of more overhead to retrieve nonnull values
Or is the overhead only related to reads & storage used?
Even when running ssms with the discard results after execution option, the cpu time of the sparse select was higher (407 ms) in comparison to the non sparse (219 ms).
EDIT
It might have been the overhead of the non null values, even if there are only 2540 present, but I am still not convinced.
This seems to be about the same performance, but the sparse factor was lost.
CREATE INDEX IX_Filtered
ON dbo.sparse(charval,varcharval,intval,bigintval)
WHERE charval IS NULL
AND varcharval IS NULL
AND intval IS NULL
AND bigintval IS NULL;
CREATE INDEX IX_Filtered
ON dbo.nonsparse(charval,varcharval,intval,bigintval)
WHERE charval IS NULL
AND varcharval IS NULL
AND intval IS NULL
AND bigintval IS NULL;
SET STATISTICS IO, TIME ON;
SELECT charval,varcharval,intval,bigintval FROM dbo.sparse WITH(INDEX(IX_Filtered))
WHERE charval IS NULL AND varcharval IS NULL
AND intval IS NULL
AND bigintval IS NULL
OPTION(RECOMPILE,MAXDOP 1);
SELECT charval,varcharval,intval,bigintval
FROM dbo.nonsparse WITH(INDEX(IX_Filtered))
WHERE charval IS NULL AND
varcharval IS NULL
AND intval IS NULL
AND bigintval IS NULL
OPTION(RECOMPILE,MAXDOP 1);
Seems to have about the same execution time:
SQL Server Execution Times:
CPU time = 297 ms, elapsed time = 292 ms.
SQL Server Execution Times:
CPU time = 281 ms, elapsed time = 319 ms.
But why are the logical reads the same amount now? Shouldn't the filtered index for the sparse column not store anything except the included ID field and some other non-data pages?
Table 'sparse'. Scan count 1, logical reads 5785,
Table 'nonsparse'. Scan count 1, logical reads 5785
And the size of both indices:
RowCounts Used_MB Unused_MB Total_MB
1000000 45.20 0.06 45.26
Why are these the same size? Was the sparse-ness lost?
Both query plans when using the filtered index
Extra Info
select @@version
Microsoft SQL Server 2017 (RTM-CU16) (KB4508218) –
14.0.3223.3 (X64) Jul 12 2019 17:43:08 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Windows Server
2012 R2 Datacenter 6.3 (Build 9600: ) (Hypervisor)
While running the queries and only selecting the ID field, the cpu time is comparable, with lower logical reads for the sparse table.
Size of the tables
SchemaName TableName RowCounts Used_MB Unused_MB Total_MB
dbo nonsparse 1002540 89.54 0.10 89.64
dbo sparse 1002540 27.95 0.20 28.14
When forcing either the clustered or nonclustered index, the cpu time difference remains.
Best Answer
Seems so. The "overhead" mentioned in the documentation appears to be CPU overhead.
Profiling the two queries, the sparse query sampled 367 ms of CPU, while the non-sparse had 284 ms of CPU. That's a difference of 83 ms.
Where is most of that?
Both profiles look very similar until they get to
sqlmin!IndexDataSetSession::GetNextRowValuesInternal
. At that point, the sparse code goes down a path that runssqlmin!IndexDataSetSession::GetDataLong
, which calls some functions that look like they relate to the sparse column feature (HasSparseVector
,StoreColumnValue
), and add up to (42 + 11 =) 53 ms.Yeah, it appears that the sparse storage optimization does not carry over to nonclustered indexes when the sparse column is used as an index key. So nonclustered index key columns take up their full size regardless of sparseness, but included columns take up zero space if they are sparse and NULL.
Looking at
DBCC PAGE
output from a clustered index page with NULL-valued sparse columns, I can see that the record length is 11 (4 for the ID + 7 for the standard per-record overhead):For the filtered index, the record is always 40, which is the sum of the size of all the key columns (4 byte ID + 20 byte charval + 4 byte varcharval + 4 byte intval + 8 byte big intval = 40 bytes).
For some reason,
DBCC PAGE
does not include the 7 byte overhead in "Record Size" for index records:The non-filtered index size is smaller (4 byte ID + 4 byte intval + 4 byte varcharval = 12 bytes) because two of the sparse columns are included columns, which again gets the sparseness optimization:
I guess this difference in behavior lines up with one of the limitations listed in the docs page:
They are allowed to be keys in nonclustered indexes, but they are not stored, uh, sparsely.