Sql-server – Index utilization issues in SQL Server queries

index-tuningoptimizationperformancequery-performancesql server

These days I have to optimize queries written by someone long back and I don't have business info behind them.So trying to optimize them a safe way,without hampering business logic.
While tuning queries, I have come across some doubts which I want to be clear.

The table source_price_curve is as follows:

CREATE TABLE [dbo].[source_price_curve](
    [source_curve_def_id] [int] NOT NULL,
    [as_of_date] [datetime] NOT NULL,
    [Assessment_curve_type_value_id] [int] NOT NULL,
    [curve_source_value_id] [int] NOT NULL,
    [maturity_date] [datetime] NOT NULL,
    [curve_value] [float] NOT NULL,
    [create_user] [varchar](50) NULL,
    [create_ts] [datetime] NULL,
    [update_user] [varchar](50) NULL,
    [update_ts] [datetime] NULL,
    [bid_value] [float] NULL,
    [ask_value] [float] NULL,
    [is_dst] [int] NOT NULL,
 CONSTRAINT [IX_unique_source_curve_def_id_index] UNIQUE NONCLUSTERED 
(
    [as_of_date] ASC,
    [source_curve_def_id] ASC,
    [maturity_date] ASC,
    [is_dst] ASC,
    [curve_source_value_id] ASC,
    [Assessment_curve_type_value_id] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[source_price_curve]  WITH NOCHECK ADD  CONSTRAINT [FK_source_curve_def_id] FOREIGN KEY([source_curve_def_id])
REFERENCES [dbo].[source_price_curve_def] ([source_curve_def_id])
ON DELETE CASCADE
GO

ALTER TABLE [dbo].[source_price_curve] CHECK CONSTRAINT [FK_source_curve_def_id]
GO

It has around 130 million records.The following indexes are defined within this table.

CREATE CLUSTERED INDEX [source_curve_def_id_index] ON [dbo].[source_price_curve]
(
    [as_of_date] ASC,
    [source_curve_def_id] ASC,
    [maturity_date] ASC,
    [is_dst] ASC,
    [curve_source_value_id] ASC
)
CREATE NONCLUSTERED INDEX [source_price_curve_123] ON [dbo].[source_price_curve]
(
    [source_curve_def_id] ASC,
    [as_of_date] ASC
)
INCLUDE (   [maturity_date]) 
CREATE NONCLUSTERED INDEX [IX_PT_source_price_curve_as_of_date_curve_source_value_id] 
ON [dbo].[source_price_curve]
(
    [as_of_date] ASC,
    [curve_source_value_id] ASC
)
INCLUDE (   [curve_value],
    [maturity_date],
    [source_curve_def_id])
CREATE NONCLUSTERED INDEX [IX_PT_source_price_curve_as_of_date_curve_source_value_id_Assessment_curve_type_value_id] 
ON [dbo].[source_price_curve]
(
    [as_of_date] ASC,
    [curve_source_value_id] ASC,
    [Assessment_curve_type_value_id] ASC
)
INCLUDE (   [curve_value],
    [maturity_date],
    [source_curve_def_id]) 
CREATE NONCLUSTERED INDEX [IX_PT_source_price_curve_maturity_date] ON [dbo].[source_price_curve]
(
    [maturity_date] ASC
)
CREATE NONCLUSTERED INDEX [IX_PT_source_price_curve_source_curve_def_id_curve_source_value_id]
 ON [dbo].[source_price_curve]
(
    [source_curve_def_id] ASC,
    [curve_source_value_id] ASC
)
INCLUDE (   [as_of_date],
    [Assessment_curve_type_value_id],
    [curve_value],
    [is_dst],
    [maturity_date]) 
CREATE NONCLUSTERED INDEX [IX_PT_source_price_curve_source_curve_def_id111] 
ON [dbo].[source_price_curve]
(
    [source_curve_def_id] ASC,
    [Assessment_curve_type_value_id] ASC,
    [curve_source_value_id] ASC,
    [maturity_date] ASC,
    [as_of_date] ASC
)

I run a query as follows:

SELECT @tenor_from =CONVERT(DATETIME,MIN(spc.maturity_date),103)  
FROM source_price_curve spc  
WHERE spc.as_of_date >= @as_of_date_from

The execution plan of this query is provided in the below link:
https://www.brentozar.com/pastetheplan/?id=BJ2_3boSZ

I wonder why it is utilizing index IX_PT_source_price_curve_maturity_date(with index on maturity date column only) when filter condition is on as_of_date(while selecting data from source_price_curve).
Why isn't it utilizing the clustered index which has as_of_date as the first column in index order ?

Thanks in advance.

Best Answer

Your query is looking for the minimum maturity_date from rows where the as_of_date is greater than a given value.

The optimizer has to choose between two main execution strategies:

Scan an index ordered by maturity_date ASC until it finds the first row where as_of_date >= @as_of_date_from.

As soon as the query processor finds this first row, the scan of the index stops. This is a point often missed when reading execution plans: an index scan does not necessarily always scan the whole index.

If the first qualifying row can be found quickly, a partial scan of the ordered index is a great strategy for your query. On the other hand, this is a poor strategy if the first row matching the as_of_date predicate occurs toward the end of the index (in maturity_date order).

The optimizer therefore makes an assessment of how many rows from the index it will have to scan before it finds one that matches the as_of_date predicate. This assessment is based on an assumption of uniform distribution (often incorrect in practice).

The plan you provided shows that the Index Scan had to read 1,101,479 rows (in maturity date order) before it found the first one matching the as_of_date predicate.
Seek an index (with as_of_date as the leading key) to find all rows where as_of_date >= @as_of_date_from, then compute the minimum maturity_date found in those rows.

This is a good strategy if the as_of_date predicate matches relatively few rows. It is not so good if many rows are expected to qualify. The expected cost of this strategy depends on the value of @as_of_date_from, and statistical information about the as_of_date column.

Your query uses a local variable, so the optimizer cannot 'see' the value contained in it, and makes a guess at how selective the unknown value will be.

For an inequality predicate, the default guess is that 30% of the rows will qualify. For a table of 130 million rows, the optimizer will therefore guess that 30% * 130,000,000 = 39,000,000 rows will match. This is a lot of rows to compute the minimum maturity_date on, the the optimizer prefers the other strategy.

You can allow the optimizer to 'see' the value inside the variable at execution time by adding an OPTION (RECOMPILE) hint to the query. This enables the Parameter Embedding Optimization, at the cost of having to recompile the query plan every time the query is executed.

If the @as_of_date_from variable value varies a lot between executions, using the OPTION (RECOMPILE) hint may be worthwhile for you. The optimizer may still choose the partial ordered scan strategy, depending on its assessment of the relative costs of the two strategies.

In any case, you should change the data type of the @as_of_date_from to datetime to match the as_of_date column to avoid an implicit conversion.

The data type of @tenor_from should probably be datetime not varchar(20). Or change the query to CONVERT(varchar(20),MIN(spc.maturity_date),103). The current CONVERT makes no sense.

Best Answer

Related Solutions

Mysql – Adding index to large thesql tables

Mysql – Simple query is slow on 4M-rows table

Related Question