Since we are talking about the clustered index, just because you defined the CI key column as ID
, you still have the DeletedDate
data in the leaf data pages of the index. That's the nature of the clustered index: It is the table data.
Because you are typically having queries that look like:
select *
from YourTable
where DeletedDate is null;
You will likely benefit from a filtered index.
create nonclustered index IX_YourFilteredNci
on YourTable(<Key Columns Here>)
where DeletedDate is not null;
go
I didn't explicitly put the key columns here (and nonkey columns through the use of the INCLUDE
clause) because you didn't publish the DDL of your table.
As in my comment above to your question, the choice of key columns (not just columns, but also the order of the columns) will largely depend on your workload and the typical queries that would be using this index.
If you are looking to cover your query(ies), then you would need to ensure that the index satisfies all of the data required of the query(ies). Not to mention, if you have other WHERE
clauses (besides your NULL
check on DeletedDate
) or joins to consider, then the order of your key columns can be the deciding factor between a scan or a seek. And even though it is filtered, and depending on how much data you have in the index, the penalty could be considerable.
Creating an index on the column involved in the clustering key may seem a little strange. One asks why create another index, when one already exists?
The clustered index is the table. That is to say your clustered index for this table is similar to an index like:
CREATE INDEX IX_Panel
ON dbo.Panel(SubId ASC)
INCLUDE (
LineageId
, Buck
, Lot
, GlassType
, ETA
);
Pretty clearly, this is not the same as an non-clustered index on the clustering key, which in your case only "includes" the LineageId
column:
CREATE UNIQUE NONCLUSTERED INDEX [IX-Panel-SubID-I-LineageID] ON [dbo].[Panel]
(
[SubId] ASC
)
INCLUDE ([LineageId]);
As a test, I created a mock-up of your table, then insert over 700,000 rows into it:
INSERT INTO dbo.Panel(LineageId, Buck, Lot, GlassType, ETA)
SELECT (ROW_NUMBER() OVER (ORDER BY o1.object_id, o2.object_id) % 35)
, SUBSTRING(o2.name, 1, 15)
, 'lot'
, 'GlassType'
, o3.name + o2.name
FROM sys.objects o1
, sys.objects o2
, sys.objects o3;
I then ran the following query to see stats about both indexes:
SELECT o.name
, ps.index_id
, ps.index_type_desc
, ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('dbo.Panel'),-1,0, NULL) ps
INNER JOIN sys.objects o ON ps.object_id = o.object_id;
The results of the query above are:
Clearly, the non-clustered index is smaller than the clustered index. In this case about 1/5th the size. Being a smaller index means the query optimizer will choose to use it when the index fulfills the requirements of the query, in one way or another.
For instance,
SELECT SubId
FROM dbo.Panel;
will only need to read 1,304 pages from disk by scanning the non-clustered index, instead of having to read 7,105 pages to scan the clustered-index.
On the other hand, for a query that references columns that are not included
in the non-clustered index, SQL Server will likely use the clustered index since it fulfills the requirements of the query.
Take for instance, the following query, where you might intuitively think the query optimizer might pick the non-clustered index since it can use that to fulfill the WHERE
clause, however the simple act of having to look up the Buck
column means it is faster to simply seek the clustered index for the values in the WHERE
, then return the 3 columns in the SELECT
clause:
SELECT SubId
, LineageId
, Buck
FROM dbo.Panel
WHERE LineageId = 6
AND SubId >= 27
AND SubId <= 42;
This is a bit of a simplification, I would highly recommend looking at Brent Ozar's Index pages
Best Answer
The key idea here is that your index contains
(firstname,id)
, but notsurname
. So the options for this queryare
1) Scan the clustered index
2) Seek the non-clustered index, and then for every matching row in the index, Seek on the Clustered Index to find the
surname
. It's this "bookmark lookup" that is the most expensive part of the query, and if a reasonable percentage of your users are named 'John', it may well be cheaper just to scan the clustered index.This is why we have indexes with included columns. You can add
surname
to the index to enable this query to seek on the non-clustered index, and avoid the bookmark lookup. The index would then be a "covering index" for the query. eg