SQL Server – Efficient INSERT INTO a Table With Clustered Index

clustered-index, insert, performance, sql-server

I have a SQL statement that inserts rows into a table with a clustered index on the column TRACKING_NUMBER.

E.g.:

INSERT INTO TABLE_NAME (TRACKING_NUMBER, COL_B, COL_C) 
SELECT TRACKING_NUMBER, COL_B, COL_C 
FROM STAGING_TABLE

My question is: does it help to add an ORDER BY clause on the clustered index column to the SELECT statement, or would any gain be negated by the extra sort the ORDER BY requires?

Best Answer

As the other answers already indicate, SQL Server may or may not explicitly sort the rows into clustered index order prior to the insert.

This depends on whether the Clustered Index Insert operator in the plan has the DMLRequestSort property set (which in turn depends on the estimated number of rows being inserted).

If you find that SQL Server is underestimating this number for whatever reason, you might benefit from adding an explicit ORDER BY to the SELECT query to minimize page splits and the ensuing fragmentation from the INSERT operation.
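Applied to the statement in the question (a sketch; table and column names are as given there):

```sql
INSERT INTO TABLE_NAME (TRACKING_NUMBER, COL_B, COL_C)
SELECT TRACKING_NUMBER, COL_B, COL_C
FROM STAGING_TABLE
ORDER BY TRACKING_NUMBER; -- the clustered index key
```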

Example:

use tempdb;

GO

CREATE TABLE T(N INT PRIMARY KEY,Filler char(2000))

CREATE TABLE T2(N INT PRIMARY KEY,Filler char(2000))

GO

DECLARE @T TABLE (U UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(),N int)

INSERT INTO @T(N)
SELECT number 
FROM master..spt_values
WHERE type = 'P' AND number BETWEEN 0 AND 499

/*Estimated row count wrong as inserting from table variable*/
INSERT INTO T(N)
SELECT T1.N*1000 + T2.N
FROM @T T1, @T T2

/*Same operation using explicit sort*/    
INSERT INTO T2(N)
SELECT T1.N*1000 + T2.N
FROM @T T1, @T T2
ORDER BY T1.N*1000 + T2.N


SELECT avg_fragmentation_in_percent,
       fragment_count,
       page_count,
       avg_page_space_used_in_percent,
       record_count
FROM   sys.dm_db_index_physical_stats(2, OBJECT_ID('T'), NULL, NULL, 'DETAILED')
;  


SELECT avg_fragmentation_in_percent,
       fragment_count,
       page_count,
       avg_page_space_used_in_percent,
       record_count
FROM   sys.dm_db_index_physical_stats(2, OBJECT_ID('T2'), NULL, NULL, 'DETAILED')
;  

The output shows that T is massively fragmented:

avg_fragmentation_in_percent fragment_count       page_count           avg_page_space_used_in_percent record_count
---------------------------- -------------------- -------------------- ------------------------------ --------------------
99.3116118225536             92535                92535                67.1668272794663               250000
99.5                         200                  200                  74.2868173956017               92535
0                            1                    1                    32.0978502594514               200

But for T2 fragmentation is minimal:

avg_fragmentation_in_percent fragment_count       page_count           avg_page_space_used_in_percent record_count
---------------------------- -------------------- -------------------- ------------------------------ --------------------
0.376                        262                  62500                99.456387447492                250000
2.1551724137931              232                  232                  43.2438349394613               62500
0                            1                    1                    37.2374598468001               232

Conversely, sometimes you might want to force SQL Server to underestimate the row count, when you know the data is already pre-sorted and wish to avoid an unnecessary sort. One notable example is inserting a large number of rows into a table with a NEWSEQUENTIALID() clustered index key. In versions prior to SQL Server 2012 (code-named "Denali"), SQL Server adds an unnecessary and potentially expensive sort operation. This can be avoided with:

DECLARE @var INT = 2147483647

INSERT INTO Foo
SELECT TOP (@var) *
FROM Bar

SQL Server will then estimate that 100 rows will be inserted, irrespective of the size of Bar, which is below the threshold at which a sort is added to the plan. However, as pointed out in the comments below, this does mean that the insert will unfortunately not be able to take advantage of minimal logging.
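One caveat worth noting (my addition, not part of the original answer): the 100-row guess relies on the value of @var being opaque to the optimizer at compile time. Adding OPTION (RECOMPILE) to that statement can embed the variable's actual value into the plan, restoring the full estimate and potentially the sort, so leave it off here on purpose. Using the same hypothetical Foo and Bar:

```sql
DECLARE @var INT = 2147483647;

-- Variable value not known at compile time: optimizer guesses 100 rows
-- for TOP (@var), keeping the sort out of the plan.
INSERT INTO Foo
SELECT TOP (@var) *
FROM Bar;

-- By contrast, OPTION (RECOMPILE) can embed @var's actual value,
-- restoring the full row estimate (and potentially the sort):
INSERT INTO Foo
SELECT TOP (@var) *
FROM Bar
OPTION (RECOMPILE);
```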