The first query does a table scan based on the threshold I explained earlier in: Is it possible to increase query performance on a narrow table with millions of rows?
(Most likely your query without the TOP 1000
clause will return more than 46k rows, or somewhere between 35k and 46k. The grey area ;-) )
The second query must be ordered. Since your NC index is ordered in the order you want, it's cheaper for the optimiser to use that index and then do the bookmark lookups to the clustered index to get the missing columns, compared to doing a clustered index scan and then having to order that.
Reverse the order of the columns in the ORDER BY
clause and you are back to a clustered index scan, since the NC index is then useless.
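A minimal sketch of the two cases, assuming a hypothetical NC index on (Shift, EnteredDate) — the table and column names here are illustrative, not from the question:

```sql
-- Hypothetical narrow NC index; the clustered index is on some other key.
CREATE NONCLUSTERED INDEX IX_people_shift_date
    ON dbo.people (Shift, EnteredDate);

-- Matches the index key order: the optimiser can read the NC index
-- in order and do bookmark lookups for the remaining columns.
SELECT TOP 1000 * FROM dbo.people ORDER BY Shift, EnteredDate;

-- Reversed column order: the NC index no longer delivers the sort,
-- so you are back to a clustered index scan plus an explicit Sort operator.
SELECT TOP 1000 * FROM dbo.people ORDER BY EnteredDate, Shift;
```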
Edit: I forgot the answer to your second question, why you DON'T want this.
Using a non-clustered, non-covering index means that a row ID is looked up in the NC index, and then the missing columns have to be looked up in the clustered index (the clustered index contains all columns of a table). The IOs to look up the missing columns in the clustered index are random IOs.
The key here is RANDOM: for every row found in the NC index, the access methods have to go look up a new page in the clustered index. This is random, and therefore very expensive.
Now, on the other hand, the optimiser could also go for a clustered index scan. It can use the allocation maps to look up scan ranges and just start reading the clustered index in large chunks. This is sequential and much cheaper (as long as your table isn't fragmented :-) ). The downside is that the WHOLE clustered index needs to be read. This is bad for your buffer pool and potentially a huge amount of IOs. But still, sequential IOs.
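You can see both access paths side by side by forcing them with index hints and comparing the IO statistics; the index name and filter below are assumptions for illustration:

```sql
-- Sketch: force each access path and compare reads in the Messages tab.
SET STATISTICS IO ON;

-- NC index seek/scan plus random bookmark lookups into the clustered index.
SELECT *
FROM dbo.people WITH (INDEX (IX_people_shift_date))  -- hypothetical NC index
WHERE Shift = 2;

-- Full clustered index scan: sequential reads, but the whole table.
SELECT *
FROM dbo.people WITH (INDEX (1))  -- index_id 1 = the clustered index
WHERE Shift = 2;
```

With a selective filter the first plan does far fewer logical reads; as the row count grows, the lookup plan's reads eventually exceed the scan's, which is exactly the tipping point the optimiser is estimating.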
In your case, the optimiser decides that somewhere between 35k and 46k rows it's less expensive to do a full clustered index scan. Yeah, it's wrong. And in a lot of cases with narrow non-clustered indexes, with not too selective WHERE
clauses, or with large tables for that matter, this goes wrong. (Your table is worse off, because it's also a very narrow table.)
Now, adding the ORDER BY
makes it more expensive to scan the full clustered index and then order the results. Instead, the optimiser assumes it's cheaper to use the already ordered NC index and then pay the random IO penalty for the bookmark lookups.
So your ORDER BY is a perfect "query hint" kind of solution. BUT, at a certain point, once your query results are so big, the penalty for the bookmark lookup random IOs will be so big it becomes slower. I assume the optimiser will change plans back to the clustered index scan before that point, but you never know for sure.
In your case, as long as your inserts are ordered by enteredDate, as discussed in chat and in the previous question (see link), you are better off creating the clustered index on the enteredDate column.
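A sketch of that change, assuming the existing clustered index can be dropped and recreated (the constraint and index names are placeholders):

```sql
-- Drop the current clustered primary key, keeping the PK as nonclustered,
-- then cluster on the insert-ordered date column instead.
ALTER TABLE dbo.people DROP CONSTRAINT PK_people;            -- placeholder name
ALTER TABLE dbo.people ADD CONSTRAINT PK_people
    PRIMARY KEY NONCLUSTERED (id);

CREATE CLUSTERED INDEX CIX_people_enteredDate
    ON dbo.people (enteredDate);
```

Since inserts arrive in enteredDate order, this keeps the clustered index append-only (no page splits) and makes date-range queries sequential scans with no bookmark lookups.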
Nonclustered indexes always include a row locator.
For a heap this will be an 8 byte RID (File:Page:Slot). On a table with a clustered index it will be the clustered index key column(s). And it will always be the copied values, not a pointer to the values. This duplication of CI key values into all non clustered indexes is why it is often recommended that the CI key be narrow and not frequently updated.
In the table shown in the question the Clustered index key is a 4 byte integer and potentially may also include a 4 byte uniqueifier for any duplicate key values.
In your case as the NCIs are not declared as unique the CI key will be appended to the NCI key.
For unique non clustered indexes the CI key would be added as included column(s) in the leaf pages unless explicitly made part of the key.
See Kalen Delaney: More About Nonclustered Index Keys for some additional information about how you can see this for yourself.
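Kalen Delaney's approach uses the undocumented DBCC IND and DBCC PAGE commands to dump index pages; a rough sketch (the database name and page numbers here are placeholders you'd substitute from the DBCC IND output):

```sql
-- List all pages belonging to index_id = 2 on the people table.
DBCC IND ('MyDb', 'people', 2);

-- Route DBCC output to the client instead of the error log.
DBCC TRACEON (3604);

-- Dump one leaf page (database, file id, page id from DBCC IND, detail level 3):
-- the row dump shows the clustered key columns stored in each NCI row.
DBCC PAGE ('MyDb', 1, 1234, 3);  -- 1234 is a placeholder page id
```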
With these 4 rows of data all three indexes only consume a single 8KB data page.
SELECT index_id,
index_level,
page_count,
record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('people'), NULL, NULL, 'DETAILED')
Returns
+----------+-------------+------------+--------------+
| index_id | index_level | page_count | record_count |
+----------+-------------+------------+--------------+
| 1 | 0 | 1 | 4 |
| 2 | 0 | 1 | 4 |
| 3 | 0 | 1 | 4 |
+----------+-------------+------------+--------------+
The additional page shown in use by sys.allocation_units.total_pages
is an IAM page. This is not used for storing data but just for tracking the pages and extents comprising the index.
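You can see the one-page difference directly in sys.allocation_units; this is a sketch joining through sys.partitions for the people table:

```sql
-- total_pages counts the IAM page; used_pages / data_pages do not include
-- pages that hold no index or data rows.
SELECT p.index_id,
       au.type_desc,
       au.total_pages,
       au.used_pages,
       au.data_pages
FROM sys.allocation_units AS au
JOIN sys.partitions AS p
  ON au.container_id = p.partition_id
WHERE p.object_id = OBJECT_ID('people');
```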
Best Answer
No, not really.
As you said yourself,
CellRow
is just not very selective: 5 possible values, 100'000 rows = roughly 20'000 rows for each possible value. SQL Server's query optimizer recognizes this and probably figures it's easier and more efficient to do an index scan rather than a seek for 20'000 rows.
The only way to avoid this would be to use a more selective index, i.e. some other column that selects 2%, 3%, or at most 5% of the rows for each query.
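It's worth measuring the selectivity rather than guessing; a quick sketch (the table name is an assumption):

```sql
-- How many rows does each CellRow value select, and what fraction of the table?
SELECT CellRow,
       COUNT(*) AS row_count,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct_of_table
FROM dbo.MyTable
GROUP BY CellRow
ORDER BY row_count DESC;
```

If every value sits near 20% of the table, no index on CellRow alone will tempt the optimizer into a seek.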
PS: Check your execution plan: does it get the values straight from the index, or does it need a "Key Lookup" step to go get the data?
You didn't mention what data types your columns are. If the
CellValue
isn't too big, you could add it to the index (or at least include it in the index) to avoid costly key lookups. You'd still have the index scan, though.
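The "include it in the index" option looks like this; index and column names are assumptions based on the question:

```sql
-- A covering index: CellValue is stored in the leaf pages only (not the key),
-- so the query can be answered entirely from this index, with no key lookup.
CREATE NONCLUSTERED INDEX IX_MyTable_CellRow_covering
    ON dbo.MyTable (CellRow)
    INCLUDE (CellValue);
```

Putting CellValue in INCLUDE rather than the key keeps the key narrow while still eliminating the lookup; the scan of the (smaller) NC index remains, but each page now carries everything the query needs.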