Sql-server – Returning value with Clustered index


I have an emp table (empno, ename, job, mgr, hire_date, sal, deptno)
with no indexes at all.
If I run

select * from emp

I get the rows ordered by empno.
Now I create a clustered index like this:

Create clustered index name on emp(ename)

If I run select * again, the rows are now ordered by ename.
Now I drop the index:

drop index name on emp

I would expect to see the rows ordered by empno again, as they were before I created the index.
Instead I still see the rows ordered by ename, as if the index were still there.
Is there a way to see the rows ordered by empno again, as before, without using "order by empno"?
I tried logging off and reconnecting to force a refresh, but that did not work.
The only thing that worked was dropping and recreating the table.
Is there perhaps a parameter in the table definition that changes when the index is created and is not reset when the index is dropped?

It seems to me that there is a default method for displaying the data before the index exists, which is not preserved or restored when the index is dropped.

Best Answer

When you created the clustered index on ename, the rows in the table were physically rearranged into that order. When you dropped the index, the rows were not rearranged again: the table simply became a heap that still holds the rows where the index left them.

There is no guarantee that the rows will be returned by a query in the order that they are stored, but often this is what happens.

If the order is important, add an ORDER BY; it is the only way to guarantee the order of a result set. You would also be well advised to cluster the table on that column, to avoid sorting every time.
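As a minimal sketch of that advice, using the emp table from the question: the first statement is the only form that guarantees empno order, and the second is the optional clustering step. The index name cix_emp_empno is made up for illustration.

```sql
-- Guaranteed order: ask for it explicitly.
SELECT empno, ename, job, mgr, hire_date, sal, deptno
FROM emp
ORDER BY empno;

-- Optional: cluster on empno so the ORDER BY above can usually be
-- satisfied by an ordered scan rather than a sort.
-- (cix_emp_empno is a hypothetical name.)
CREATE CLUSTERED INDEX cix_emp_empno ON emp (empno);
```

Even with this index in place, a plain select * from emp without ORDER BY still has no guaranteed order; the index only makes the sorted query cheaper.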

But don't choose a clustering key just because of that. For a more thorough discussion of clustering and heaps see http://kejser.org/clustered-indexes-vs-heaps/

Related Solutions

Sql-server – Finding exact row in Clustered index leaf page

The specifics of how SQL Server does it are not public.

But I recommend you read the vast research material on the subject that is public. Modern B-Tree Techniques is a good start. Many more papers by Goetz Graefe, who is well known in the industry, cover this area.

Interpolation search is a well known technique in B-Trees and there is plenty of material detailing possible approaches (bear in mind that none of the papers describe explicitly how SQL Server does it and there are many ways to skin a cat).

SQL Server – Excluding Clustered Key Columns from Non-Clustered Indexes

At a SQL Saturday event I attended, one of the lecturers said it's always good practice to exclude the clustered key columns from the non-clustered index key definition, and from the INCLUDE clause too.

I disagree. Let me explain with the table in your question:

The clustered index is: (RecordID, QuestionID) and there are many more columns.

Any non-clustered index will also contain the clustered key columns, appended at the end. So an index like:

(Pts)   is equivalent to:   (Pts, RecordID, QuestionID)

and similarly:

(Pts, PtsOf)           <->:  (Pts, PtsOf, RecordID, QuestionID)

(Pts, RecordID)        <->:  (Pts, RecordID, QuestionID)

(Pts, QuestionID)      <->:  (Pts, QuestionID, RecordID)

(Pts) INCLUDE (PtsOf)  <->:  (Pts, RecordID, QuestionID) INCLUDE (PtsOf) 

(QuestionID, RecordID) <->:  (QuestionID, RecordID)
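The equivalences above can be sketched in T-SQL. This is an illustration under assumptions: the table name dbo.Answers, the Notes column, and all index and constraint names are hypothetical; only RecordID, QuestionID, Pts and PtsOf come from the question. It also assumes non-unique non-clustered indexes, which is the case where SQL Server carries the clustering key as part of the index key.

```sql
-- Hypothetical table; Notes stands in for the "many more columns".
IF OBJECT_ID(N'dbo.Answers', N'U') IS NULL
    CREATE TABLE dbo.Answers (
        RecordID   int NOT NULL,
        QuestionID int NOT NULL,
        Pts        int NULL,
        PtsOf      int NULL,
        Notes      nvarchar(400) NULL,
        CONSTRAINT PK_Answers PRIMARY KEY CLUSTERED (RecordID, QuestionID)
    );

-- Declared key is (Pts). The clustered key columns are carried along,
-- so this behaves like (Pts, RecordID, QuestionID).
CREATE NONCLUSTERED INDEX IX_Answers_Pts
    ON dbo.Answers (Pts);

-- Naming a clustered key column explicitly fixes its position:
-- (Pts, QuestionID) behaves like (Pts, QuestionID, RecordID).
CREATE NONCLUSTERED INDEX IX_Answers_Pts_QuestionID
    ON dbo.Answers (Pts, QuestionID);
```

The second index is the reason the columns cannot always be dropped from the definition: leaving QuestionID out would put RecordID ahead of it in the effective key.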

For junction tables, or tables like this one that have a composite primary/unique key (whether it is clustered or not), it is very common to have some queries that need the (a,b) index, others that are better served by the (b,a) index, and sometimes queries that need both. So one often needs both of these.

If the composite clustered key has more than two columns, say (a,b,c), you may often need an index on (b,c,a), or one on (d,b) and another on (e,c,a), which will of course be equivalent to (b,c,a), (d,b,a,c) and (e,c,a,b) respectively. You can't just remove these columns from the definitions of the non-clustered keys, because the column order will change.

The suggestion has one good point though. The clustered key columns can be removed from the INCLUDE part. They are redundant, just noise there.

About the indexes in the question: a non-CI on (Pts, PtsOf) is equivalent to (Pts, PtsOf, RecordID, QuestionID), so it is very different from the original non-CI on (RecordID, QuestionID) INCLUDE (Pts, PtsOf). It will use a bit more space than the original, and of course these two indexes will be useful for different types of queries.

The (Pts, PtsOf) index will be useful, for example, for queries with WHERE Pts BETWEEN 0 AND 100, WHERE Pts = 200 AND PtsOf = 300, etc.
The (RecordID, QuestionID) INCLUDE (Pts, PtsOf) index is basically a copy of the table with only the 2 clustered key columns and only 2 of the many extra columns. This is (rarely) useful, and it's a form of vertical partitioning. If you often have queries that need all the rows of the table but only these 2 columns, then it's probably one of these (rather rare) cases where the extra space and effort to maintain this index are justified.
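To make the contrast concrete, here is a hedged sketch of both indexes and the kinds of queries each one suits. The table name dbo.Answers, the Notes column, and the index and constraint names are hypothetical; whether the optimizer actually picks a given index depends on statistics and data volume.

```sql
-- Hypothetical table; Notes stands in for the "many more columns".
IF OBJECT_ID(N'dbo.Answers', N'U') IS NULL
    CREATE TABLE dbo.Answers (
        RecordID   int NOT NULL,
        QuestionID int NOT NULL,
        Pts        int NULL,
        PtsOf      int NULL,
        Notes      nvarchar(400) NULL,
        CONSTRAINT PK_Answers PRIMARY KEY CLUSTERED (RecordID, QuestionID)
    );

CREATE NONCLUSTERED INDEX IX_Answers_Pts_PtsOf
    ON dbo.Answers (Pts, PtsOf);

CREATE NONCLUSTERED INDEX IX_Answers_Key_Incl
    ON dbo.Answers (RecordID, QuestionID) INCLUDE (Pts, PtsOf);

-- Predicates on the leading column(s) can seek on (Pts, PtsOf).
SELECT RecordID, QuestionID, Pts, PtsOf
FROM dbo.Answers
WHERE Pts BETWEEN 0 AND 100;

SELECT RecordID, QuestionID
FROM dbo.Answers
WHERE Pts = 200 AND PtsOf = 300;

-- Needs every row but only four columns: the narrow covering index
-- can be scanned instead of the wider clustered index.
SELECT RecordID, QuestionID, Pts, PtsOf
FROM dbo.Answers;
```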
