Sql-server – Insert much slower when target table has clustered index

clustered-indexsql serversql-server-2016

I've just had to debug an issue which has me thoroughly confused.

An ETL Transform process on our Development Data warehouse has just failed after working successfully every day for months. The same SSIS job calling the same stored proc, with the same table schema, indexes and data works fine in production.

This step would normally take less than 2 min. Today, after 4 hours the job hasn't completed, but hasn't failed either. There are no reported errors. Nothing in the SQL log, and sp_who2 doesn't show anything blocking.

  • Here is a link to the estimated plan when the query performs well.
  • Here is a link to the estimated plan when the query does not finish.

The job truncates a staging table then inserts about 600,000 rows of data. The ETL process has exclusive access to the table. When I checked, all I could see was waits on CXPACKET.

I have traced the fault to a unique clustered index.

The table has a non clustered primary key on an identity column (see below)

CREATE TABLE [dbo].[Transform_JobCosting_Transaction](
    [ETL_TransformKey] [int] IDENTITY(1,1) NOT NULL,
    [TransactionId] [varchar](255) NOT NULL,
    [KeyType] [varchar](255) NOT NULL,
    [FinancialYear] [varchar](255) NOT NULL,
    [Job] [varchar](255) NOT NULL,
    [Subjob] [varchar](255) NOT NULL,
    [AnalysisCode] [varchar](255) NULL,
    [etc] [varchar](255) NOT NULL,
    [etc] [varchar](255) NOT NULL,
    [etc] [varchar](255) NOT NULL
     CONSTRAINT [PK_Transform_JobCosting_Transaction] PRIMARY KEY NONCLUSTERED 
(
    [ETL_TransformKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

The problem clustered index is:

CREATE UNIQUE CLUSTERED INDEX [IDX_Unique] ON [dbo].[Transform_JobCosting_Transaction]
(   [FinancialYear] ASC,
    [KeyType] ASC,
    [TransactionId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

There is a 3rd non-clustered index which doesn't affect the insert at all.

Two of us have been working this for 4 hours. I have dropped and re-added the Index 20-30 times trying options in different combinations.

Summary: Clustered index blocks inserts. Non clustered works fine.

We have tried:

  • Server restart made no difference.
  • Tables truncated and sp run manually from SSMS no diff. (With SA privs)
  • Rebuilding the index didn't help.
  • If I drop the clustered index the insert works.
  • After insert above, I can add the clustered index without error.
  • If drop and re-add the index as non-clustered it works.
  • I have checked the data and it is unique grouped by those 3 fields.
  • Changing index so that its not unique didn't make any difference.
  • Adding/removing with tablock hint didn't help.
  • I tried sorting the data before inserting and it made no difference.

Running: Microsoft SQL Server 2016 (SP1-CU2) (KB4013106) – 13.0.4422.0 (X64)
Developer Edition (64-bit) on Windows Server 2012 R2 Standard 6.3

Any ideas or suggestions would be greatly appreciated.

Best Answer

Let's take a step back and forget all of the troubleshooting around the clustered index. You have an INSERT query which used to finish in a reasonable amount of time but now won't finish after hours. Why might that query now be slow? Let's take a look at the estimated plan:

estimated plan bad

Reading from right to left, the plan is to first scan the single row from Extract_DW_Control_Finance, do a loop join with a scan of Extract_JCS_Trans on the inner side, sort the data according to the clustered key of the target table, and do another loop join with a scan of Extract_GL_Jnl_Trans on the inner side. The first join probably isn't the issue. The plan can't actually benefit from parallelism, but with a single row in the outer result set the scan on Extract_JCS_Trans should only happen once. However, the optimizer estimates that a single row will come out of that join. If that row estimate is wrong then you could end up doing hundreds of thousands of clustered index scans on Extract_GL_Jnl_Trans.

The query plan for the query which performs well uses a different strategy. The row estimates are significantly different and it performs a hash join:

query plan good query

I suspect that the optimizer will choose a different plan for the poorly performing query if you fix the row estimates. If the Extract_DW_Control_Finance table will always have one row you could consider moving that into a local variable and possibly using a RECOMPILE hint. That could result in a much better estimate.

In terms of why removing the clustered index causes the issue, I suspect that the optimizer does a hash join to Extract_GL_Jnl_Trans without the clustered index. A hash join does not preserve the order of the outer input but a loop join does preserve order. The optimizer may have costed doing the sort on a single row and performing a loop join lower than doing a hash join and performing the sort later on 356566 rows. However, if the sort is not needed then doing the hash join may have had a lower cost than the loop join. It probably all comes down to fixing your cardinality estimates.

If you need to do more troubleshooting while the slow query is running, you could consider trace flag 7412 if you're on SQL Server 2016 SP1. That should given you clues as to where SQL Server is "stuck" in the query plan. If you're able to ask for an actual plan or run the query directly in SSMS you could use sys.dm_exec_query_profiles or the live query statistics feature.