I have a simple table with a clustered columnstore index:
ID INT NOT NULL,
Hash BINARY(20) NOT NULL
This table has some billion rows and according to sp_spaceused
, sys.allocation_units
and SSMS reports, its size is about 25GB.
My problem is I can't account for all of this space. Querying sys.column_store_row_groups
and sys.column_store_segments
only gives me about 7,8GB. The index uses no dictionary: primary_dictionary_id
and secondary_dictionary_id
are -1 for all segments. Querying sys.column_store_dictionaries
returns no rows at all.
The tuple mover has done its job and all row groups are in the compressed state. I already tried an ALTER INDEX REORGANIZE
just in case.
My only idea for the size difference is some dictionary-like stuff I'm not accounting for. Any ideas on what I might be missing?
I'm running SQL Server 2017 (RTM-CU4).
EDIT 1:
This is the output from sp_spaceused for the table in question:
+--------+------------+-------------+-------------+------------+----------+
| name | rows | reserved | data | index_size | unused |
+--------+------------+-------------+-------------+------------+----------+
| IdsBin | 1073741824 | 25028112 KB | 25007432 KB | 16 KB | 20664 KB |
+--------+------------+-------------+-------------+------------+----------+
EDIT 2:
This is a repro script with 1 million rows. It runs in about 1 minute on my machine. warning: it drops and recreates a new database
USE master;
GO
DROP DATABASE IF EXISTS MyDbWeirdTest;
GO
CREATE DATABASE MyDbWeirdTest;
GO
USE MyDbWeirdTest;
GO
CREATE TABLE IdsBin (
ID INT NOT NULL,
Hash BINARY(20) NOT NULL
);
CREATE CLUSTERED COLUMNSTORE INDEX ix1 ON IdsBin
GO
CREATE TYPE tBin AS TABLE (
ID INT,
Hash BINARY(20)
);
GO
CREATE OR ALTER PROCEDURE pBin (
@ids AS dbo.tBin READONLY
)
AS
BEGIN
SET NOCOUNT ON;
INSERT dbo.IdsBin
SELECT ID, Hash FROM @ids;
END;
GO
SET NOCOUNT ON;
DECLARE @i INT = 1, @t INT = 1;
DECLARE @tvp dbo.tBin;
WHILE @t <= 1000000
BEGIN
DELETE @tvp;
BEGIN TRAN;
WHILE @i <= 1000
BEGIN
INSERT @tvp VALUES (@t, HASHBYTES('SHA1', CAST(@t AS BINARY(4))));
SET @i = @i + 1;
SET @t = @t + 1;
END;
EXEC pBin @tvp;
COMMIT;
SET @i = 1;
END;
GO
ALTER INDEX ix1 on IdsBin REBUILD;
GO
For this repro, sp_spaceused shows:
+--------+----------------------+----------+----------+------------+--------+
| name | rows | reserved | data | index_size | unused |
+--------+----------------------+----------+----------+------------+--------+
| IdsBin | 1000000 | 22728 KB | 22640 KB | 0 KB | 88 KB |
+--------+----------------------+----------+----------+------------+--------+
sys.column_store_row_groups:
+-----------+----------+------------------+--------------+---------------------+-------+-------------------+------------+--------------+---------------+
| object_id | index_id | partition_number | row_group_id | delta_store_hobt_id | state | state_description | total_rows | deleted_rows | size_in_bytes |
+-----------+----------+------------------+--------------+---------------------+-------+-------------------+------------+--------------+---------------+
| 901578250 | 1 | 1 | 0 | NULL | 3 | COMPRESSED | 1000000 | 0 | 5896938 |
+-----------+----------+------------------+--------------+---------------------+-------+-------------------+------------+--------------+---------------+
So sp_spaceused gives me about 22MB and sys.allocation_units (not shown) agrees. But no columnstore DMVs seems to agree on that number and they say the index is less than 6MB in size.
Best Answer
sys.column_store_segments
andsys.column_store_row_groups
store some of the metadata information about the ColumnStore data, but I believe that ends up representing the compressed size. There are LOB structures that are allocated as well, and you can see the uncompressed size in the allocation unit / partition DMVs (and if you get to the pages somehow withDBCC PAGE
, I bet you would see that they're relatively empty). In other words,sys.column_store_row_groups
shows you how much data is stored on those pages, but doesn't add in the free space on those pages (which still take up space in the data file and in memory, like a fragmented index or an index with a really low fill factor).I ran your repro and here is what I saw:
Results:
Other than unused, which I didn't bother calculating, the results match up with
sp_spaceused
:And you can confirm these numbers in
sys.dm_db_partition_stats
as well:Results:
You mentioned this but I just wanted to call out explicitly that
sp_spaceused
andallocation_units
reflect the page count, regardless of how full or empty any page might be. The row groups DMV is reflecting just the actual data. The documentation states (emphasis mine):Whereas
sys.dm_db_partition_stats
, for example, explicitly states pages throughout, not data, although I would argue that they should specify that each LOB is an 8K page here:Which number you want to trust, well, that's up to you.
As an aside, Niko Neugebauer talked about the fact that only the compressed size of the dictionaries is exposed in the columnstore DMVs here, and raised a feedback item about it. Seems to me there is other information that could be exposed in the columnstore DMVs too.