Sql-server – Is sys.stats_columns incorrect

Let's say I have a table Foo with columns ID1, ID2 and a composite primary key defined over ID2, ID1. (I'm currently working with a System Center product that has several tables defined this way with the primary key columns listed in the opposite order they appear in the table definition.)

CREATE TABLE dbo.Foo(
  ID1 int NOT NULL,
  ID2 int NOT NULL,
CONSTRAINT [PK_Foo] PRIMARY KEY CLUSTERED (ID2, ID1)
);
GO

-- Add a row and update stats so that histogram isn't empty
INSERT INTO Foo (ID1, ID2) VALUES (1,2);
UPDATE STATISTICS dbo.Foo;

The key_ordinal column in sys.index_columns shows the index columns in the same order they were declared in the composite primary key:

SELECT t.name, i.name, c.column_id, c.name, ic.index_column_id, ic.key_ordinal
FROM sys.tables AS t
JOIN sys.indexes AS i
ON t.[object_id] = i.[object_id]
JOIN sys.index_columns AS ic
ON ic.[object_id] = i.[object_id]
AND ic.index_id = i.index_id
JOIN sys.columns AS c
ON ic.column_id = c.column_id
AND ic.[object_id] = c.[object_id]
WHERE t.name = 'Foo';

index

The histogram also shows the statistics in the same order:

DBCC SHOW_STATISTICS ('Foo',PK_Foo);

stats

However, sys.stats_columns shows the columns listed in the inverse order (ID1, ID2).

SELECT s.name, sc.stats_column_id, c.name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc 
ON s.stats_id = sc.stats_id 
AND s.[object_id] = sc.[object_id] 
JOIN sys.columns AS c 
ON c.[object_id] = s.[object_id]
AND c.column_id = sc.column_id
JOIN sys.objects AS o 
ON o.[object_id] = c.[object_id] 
WHERE o.name = 'Foo'
AND s.name = 'PK_Foo';

stats_columns

Books Online says stats_column_id is a "1-based ordinal within set of stats columns," so I was expecting the value 1 to point to the first column in the statistics object.

Is this a bug in sys.stats_columns or a misunderstanding on my part?

I've verified this behaviour occurs on current builds of SQL Server 2005, 2008, 2008 R2, 2012, and 2014.

sys.stats_columns seems to reflect the order within the statistics object in other situations, for example:

CREATE TABLE dbo.Foo2(
  ID1 int NOT NULL,
  ID2 int NOT NULL,
  ID3 int NULL,
  String VARCHAR(10) NULL,
CONSTRAINT [PK_Foo2] PRIMARY KEY CLUSTERED (ID2, ID1)
);

GO

INSERT INTO Foo2 (ID1, ID2, ID3, String) VALUES (1,2,3,'String');

CREATE STATISTICS ST_Test ON Foo2 (ID3, String);
CREATE STATISTICS ST_Test2 ON Foo2 (String, ID3);

DBCC SHOW_STATISTICS ('Foo2',ST_Test);
DBCC SHOW_STATISTICS ('Foo2',ST_Test2);


SELECT s.name, sc.stats_column_id, c.name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc 
ON s.stats_id = sc.stats_id 
AND s.[object_id] = sc.[object_id] 
JOIN sys.columns AS c 
ON c.[object_id] = s.[object_id]
AND c.column_id = sc.column_id
JOIN sys.objects AS o 
ON o.[object_id] = c.[object_id] 
WHERE o.name = 'Foo2'
AND s.name LIKE 'ST_Test%';

morestats

Here is another example where sys.stats_columns appears to return the correct data, this time for statistics on an index:

--drop table dbo.Foo3
CREATE TABLE dbo.Foo3(
  ID1 int NOT NULL,
  ID2 int NOT NULL,
  ID3 int NULL,
  String VARCHAR(10) NULL,
CONSTRAINT [PK_Foo3] PRIMARY KEY CLUSTERED (ID2, ID1)
);

GO

INSERT INTO Foo3 (ID1, ID2, ID3, String) VALUES (1,2,3,'String');
UPDATE STATISTICS Foo3;

CREATE INDEX IX_Test ON Foo3 (ID3, String);
CREATE INDEX IX_Test2 ON Foo3 (String, ID3);

DBCC SHOW_STATISTICS ('Foo3',IX_Test);
DBCC SHOW_STATISTICS ('Foo3',IX_Test2);

SELECT s.name, sc.stats_column_id, c.name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc 
ON s.stats_id = sc.stats_id 
AND s.[object_id] = sc.[object_id] 
JOIN sys.columns AS c 
ON c.[object_id] = s.[object_id]
AND c.column_id = sc.column_id
JOIN sys.objects AS o 
ON o.[object_id] = c.[object_id] 
WHERE o.name = 'Foo3'
AND s.name LIKE 'IX_Test%';

moremorestats

Best Answer

This seems to be a long standing error:

swasheck - March 5, 2015 posted:

https://connect.microsoft.com/SQLServer/feedback/details/1163126

MSDN notes that sys.stats_columns.stats_column_id is "1-based ordinal within set of stats columns." However, it seems to actually reflect table definition order. Altering index order is not reflected in sys.stats_columns.

Max Vernon and James Lupolt seem to agree based on their comments/encouragement.

Best Answer

Related Solutions

Sql-server – statistics are up to date, but estimate is incorrect

Sql-server – Should I add a clustered Index on a foreign key that is not unique

Related Question