Sql-server – Column Store Index over an clustered index (created when primary key is declared) – SQL Server

clustered-indexcolumnstoresql serversql server 2014sql-server-2012

When a clustered index is created, the table itself becomes the index structure sorted by the index key.

Imagine now that we have 5 columns in a table XPTO (a,b,c,d,e) and "a" is the primary key and we create a column store index over table XPTO with columns (b,c).

What is the structure of this index? Is a different structure comparing to the clustered table? Or does the column store have pointers to the clustered table (like an other non clustered index).

Finally, the same scenario but creating the column index with all attributes, what is the structure?

Best Answer

We can learn a lot by creating a test script! You should be able to run the SQL below on any SQL 2014 instance (probably SQL 2012 as well, but I didn't test it there).

From this script, we can see that the non-clustered columnstore index on (b, c) does store the data for the clustered index column (a) and it does so in the same manner (via segments) that it stores b and c. This allows the columstore to efficiently process queries that access a, b, and c. Technically, it even allows the columnstore to be used, in combination with a key lookup, in order to process a query that needs all of the columns in the table (although this is not likely to be favored by the optimizer given the expense of the key lookup).

In terms of your questions about the "structure" of the columnstore index, I think that's too deep a question to answer here. But if you are interested in learning more, I think that Niko Neugebauer's excellent 55-part (and growing) series on columnstore has a ton of valuable information on the structure and internals: http://www.nikoport.com/columnstore/

-- Create a table to your specs
CREATE TABLE dbo.XPTO (a BIGINT NOT NULL, b INT NOT NULL, c INT NOT NULL, d INT NOT NULL, e INT NOT NULL, CONSTRAINT PK_XPTO PRIMARY KEY (a))
GO

-- Insert some trivial dummy data
INSERT INTO dbo.XPTO (a, b, c, d, e)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)), v2.x, v3.x, v4.x, v5.x
FROM ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v(x)
CROSS JOIN ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v2(x)
CROSS JOIN ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v3(x)
CROSS JOIN ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v4(x)
CROSS JOIN ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v5(x)
CROSS JOIN ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v6(x)
GO

-- Create the nonclustered columnstore
CREATE NONCLUSTERED COLUMNSTORE INDEX cs_XPTO ON dbo.XPTO (b, c)
GO

-- Turn on "Include Actual Execution Plan"

-- Access only the columns specified in the columnstore index
-- Here we get a Columnstore Index Scan, as expected
SELECT SUM(b), SUM(c)
FROM XPTO
GO

-- Add in the clustered index column
-- Now we also get a Columnstore Index Scan!
-- This shows that the non-clustered columnstore does maintain data for "a"
-- because it is able to process this query without touching the clustered index
SELECT SUM(a), SUM(b), SUM(c)
FROM XPTO
GO

-- Access all columns
-- Now we get a Clustered Index Scan, and the columnstore index is not used
SELECT SUM(a), SUM(b), SUM(c), SUM(d), SUM(e)
FROM XPTO
GO

-- Try to force usage of columnstore
-- Now we see a tplan that uses the columnstore index but performs a key looked up to the clustered index
-- This is a plan shape I have never seen the query optmizer generate on its own, but it makes sense
-- that it is possible given that the columnstore index stores data for the clustered index column,
-- which is essentially a pointer to the corresponding row in the table
SELECT SUM(a), SUM(b), SUM(c), SUM(d), SUM(e)
FROM XPTO WITH(INDEX(cs_XPTO))
GO

-- View the column store segments
-- Here we see that there is a column store segment for columns a, b, and c (even though we asked for just b and c!)
-- This seems to indicate that clustered index columns are implicitly added to any nonclustered columnstore index
-- and are stored within that columnstore index in the same way as the requested columns
SELECT cs.*
FROM sys.partitions p
JOIN sys.column_store_segments cs
    ON cs.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID('XPTO')
GO

-- Cleanup
DROP TABLE dbo.XPTO
GO