Sql-server – Column Store Index over an clustered index (created when primary key is declared) – SQL Server

clustered-indexcolumnstoresql serversql server 2014sql-server-2012

When a clustered index is created, the table itself becomes the index structure sorted by the index key.

Imagine now that we have 5 columns in a table XPTO (a,b,c,d,e) and "a" is the primary key and we create a column store index over table XPTO with columns (b,c).

What is the structure of this index? Is a different structure comparing to the clustered table? Or does the column store have pointers to the clustered table (like an other non clustered index).

Finally, the same scenario but creating the column index with all attributes, what is the structure?

Best Answer

We can learn a lot by creating a test script! You should be able to run the SQL below on any SQL 2014 instance (probably SQL 2012 as well, but I didn't test it there).

From this script, we can see that the non-clustered columnstore index on (b, c) does store the data for the clustered index column (a) and it does so in the same manner (via segments) that it stores b and c. This allows the columstore to efficiently process queries that access a, b, and c. Technically, it even allows the columnstore to be used, in combination with a key lookup, in order to process a query that needs all of the columns in the table (although this is not likely to be favored by the optimizer given the expense of the key lookup).

In terms of your questions about the "structure" of the columnstore index, I think that's too deep a question to answer here. But if you are interested in learning more, I think that Niko Neugebauer's excellent 55-part (and growing) series on columnstore has a ton of valuable information on the structure and internals: http://www.nikoport.com/columnstore/

-- Create a table to your specs
CREATE TABLE dbo.XPTO (a BIGINT NOT NULL, b INT NOT NULL, c INT NOT NULL, d INT NOT NULL, e INT NOT NULL, CONSTRAINT PK_XPTO PRIMARY KEY (a))
GO

-- Insert some trivial dummy data
INSERT INTO dbo.XPTO (a, b, c, d, e)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)), v2.x, v3.x, v4.x, v5.x
FROM ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v(x)
CROSS JOIN ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v2(x)
CROSS JOIN ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v3(x)
CROSS JOIN ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v4(x)
CROSS JOIN ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v5(x)
CROSS JOIN ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) v6(x)
GO

-- Create the nonclustered columnstore
CREATE NONCLUSTERED COLUMNSTORE INDEX cs_XPTO ON dbo.XPTO (b, c)
GO

-- Turn on "Include Actual Execution Plan"

-- Access only the columns specified in the columnstore index
-- Here we get a Columnstore Index Scan, as expected
SELECT SUM(b), SUM(c)
FROM XPTO
GO

-- Add in the clustered index column
-- Now we also get a Columnstore Index Scan!
-- This shows that the non-clustered columnstore does maintain data for "a"
-- because it is able to process this query without touching the clustered index
SELECT SUM(a), SUM(b), SUM(c)
FROM XPTO
GO

-- Access all columns
-- Now we get a Clustered Index Scan, and the columnstore index is not used
SELECT SUM(a), SUM(b), SUM(c), SUM(d), SUM(e)
FROM XPTO
GO

-- Try to force usage of columnstore
-- Now we see a tplan that uses the columnstore index but performs a key looked up to the clustered index
-- This is a plan shape I have never seen the query optmizer generate on its own, but it makes sense
-- that it is possible given that the columnstore index stores data for the clustered index column,
-- which is essentially a pointer to the corresponding row in the table
SELECT SUM(a), SUM(b), SUM(c), SUM(d), SUM(e)
FROM XPTO WITH(INDEX(cs_XPTO))
GO

-- View the column store segments
-- Here we see that there is a column store segment for columns a, b, and c (even though we asked for just b and c!)
-- This seems to indicate that clustered index columns are implicitly added to any nonclustered columnstore index
-- and are stored within that columnstore index in the same way as the requested columns
SELECT cs.*
FROM sys.partitions p
JOIN sys.column_store_segments cs
    ON cs.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID('XPTO')
GO

-- Cleanup
DROP TABLE dbo.XPTO
GO

Related Solutions

Sql-server – SQL Server Primary key / clustered index design decision

You are correct to separate "clustered index" from "primary key":

A clustered index is the organisation of data on disk is better if
- narrow
- numeric
- increasing (strictly monotonic)
The primary key identifies a row

Note: GUIDs make poor clustering keys

In this case, with the surrogate column, the table has 2 candidate keys:

ProductHistoryID
ProductNo + CreatedDateTime

Assumed convention states that the ProductHistoryID becomes the PK, but you can leave the PK on (ProductNo, CreatedDateTime): it will just be non-clustered. Which leads to indexes:

clustered index should be on ProductHistoryID
unique non-clustered index on (ProductNo, CreatedDateTime)

Example

CREATE TABLE Product (
    ProductHistoryID int NOT NULL IDENTITY (1,1) NOT NULL,
    ProductNo ...
    CreatedDateTime ...

then you a choice of

    CONSTRAINT PK_Product PRIMARY KEY CLUSTERED (ProductHistoryID)
    CONSTRAINT UQ_Product UNIQUE NONCLUSTERED (ProductHistoryID)

    CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED (ProductNo, CreatedDateTime)
    CONSTRAINT PK_Product UNIQUE CLUSTERED (ProductHistoryID)

Also, the pattern you have is a "type 2 Slowly Changing Dimension"

Foreign Keys to tables where primary key is not clustered index

Taken from the MSDN article on FOREIGN KEY Constraints

A FOREIGN KEY constraint does not have to be linked only to a PRIMARY KEY constraint in another table; it can also be defined to reference the columns of a UNIQUE constraint in another table.

I think it is perfectly fine to define a UNIQUE CLUSTERED INDEX on an IDENTITY column and define the PRIMARY KEY on different columns, often with a NONCLUSTERED INDEX but you don't have to specify an index.

Best Answer

Related Solutions

Sql-server – SQL Server Primary key / clustered index design decision

Foreign Keys to tables where primary key is not clustered index

Related Question