These days I have to optimize queries written by someone long back and I don't have business info behind them.So trying to optimize them a safe way,without hampering business logic.
While tuning queries, I have come across some doubts which I want to be clear.
The table source_price_curve is as follows:
CREATE TABLE [dbo].[source_price_curve](
[source_curve_def_id] [int] NOT NULL,
[as_of_date] [datetime] NOT NULL,
[Assessment_curve_type_value_id] [int] NOT NULL,
[curve_source_value_id] [int] NOT NULL,
[maturity_date] [datetime] NOT NULL,
[curve_value] [float] NOT NULL,
[create_user] [varchar](50) NULL,
[create_ts] [datetime] NULL,
[update_user] [varchar](50) NULL,
[update_ts] [datetime] NULL,
[bid_value] [float] NULL,
[ask_value] [float] NULL,
[is_dst] [int] NOT NULL,
CONSTRAINT [IX_unique_source_curve_def_id_index] UNIQUE NONCLUSTERED
(
[as_of_date] ASC,
[source_curve_def_id] ASC,
[maturity_date] ASC,
[is_dst] ASC,
[curve_source_value_id] ASC,
[Assessment_curve_type_value_id] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[source_price_curve] WITH NOCHECK ADD CONSTRAINT [FK_source_curve_def_id] FOREIGN KEY([source_curve_def_id])
REFERENCES [dbo].[source_price_curve_def] ([source_curve_def_id])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[source_price_curve] CHECK CONSTRAINT [FK_source_curve_def_id]
GO
It has around 130 million records.The following indexes are defined within this table.
CREATE CLUSTERED INDEX [source_curve_def_id_index] ON [dbo].[source_price_curve]
(
[as_of_date] ASC,
[source_curve_def_id] ASC,
[maturity_date] ASC,
[is_dst] ASC,
[curve_source_value_id] ASC
)
CREATE NONCLUSTERED INDEX [source_price_curve_123] ON [dbo].[source_price_curve]
(
[source_curve_def_id] ASC,
[as_of_date] ASC
)
INCLUDE ( [maturity_date])
CREATE NONCLUSTERED INDEX [IX_PT_source_price_curve_as_of_date_curve_source_value_id]
ON [dbo].[source_price_curve]
(
[as_of_date] ASC,
[curve_source_value_id] ASC
)
INCLUDE ( [curve_value],
[maturity_date],
[source_curve_def_id])
CREATE NONCLUSTERED INDEX [IX_PT_source_price_curve_as_of_date_curve_source_value_id_Assessment_curve_type_value_id]
ON [dbo].[source_price_curve]
(
[as_of_date] ASC,
[curve_source_value_id] ASC,
[Assessment_curve_type_value_id] ASC
)
INCLUDE ( [curve_value],
[maturity_date],
[source_curve_def_id])
CREATE NONCLUSTERED INDEX [IX_PT_source_price_curve_maturity_date] ON [dbo].[source_price_curve]
(
[maturity_date] ASC
)
CREATE NONCLUSTERED INDEX [IX_PT_source_price_curve_source_curve_def_id_curve_source_value_id]
ON [dbo].[source_price_curve]
(
[source_curve_def_id] ASC,
[curve_source_value_id] ASC
)
INCLUDE ( [as_of_date],
[Assessment_curve_type_value_id],
[curve_value],
[is_dst],
[maturity_date])
CREATE NONCLUSTERED INDEX [IX_PT_source_price_curve_source_curve_def_id111]
ON [dbo].[source_price_curve]
(
[source_curve_def_id] ASC,
[Assessment_curve_type_value_id] ASC,
[curve_source_value_id] ASC,
[maturity_date] ASC,
[as_of_date] ASC
)
I run a query as follows:
SELECT @tenor_from =CONVERT(DATETIME,MIN(spc.maturity_date),103)
FROM source_price_curve spc
WHERE spc.as_of_date >= @as_of_date_from
The execution plan of this query is provided in the below link:
https://www.brentozar.com/pastetheplan/?id=BJ2_3boSZ
I wonder why it is utilizing index IX_PT_source_price_curve_maturity_date(with index on maturity date column only) when filter condition is on as_of_date(while selecting data from source_price_curve).
Why isn't it utilizing the clustered index which has as_of_date as the first column in index order ?
Thanks in advance.
Best Answer
Your query is looking for the minimum
maturity_date
from rows where theas_of_date
is greater than a given value.The optimizer has to choose between two main execution strategies:
Scan an index ordered by
maturity_date ASC
until it finds the first row whereas_of_date >= @as_of_date_from
.As soon as the query processor finds this first row, the scan of the index stops. This is a point often missed when reading execution plans: an index scan does not necessarily always scan the whole index.
If the first qualifying row can be found quickly, a partial scan of the ordered index is a great strategy for your query. On the other hand, this is a poor strategy if the first row matching the
as_of_date
predicate occurs toward the end of the index (inmaturity_date
order).The optimizer therefore makes an assessment of how many rows from the index it will have to scan before it finds one that matches the
as_of_date
predicate. This assessment is based on an assumption of uniform distribution (often incorrect in practice).The plan you provided shows that the Index Scan had to read 1,101,479 rows (in maturity date order) before it found the first one matching the
as_of_date
predicate.Seek an index (with
as_of_date
as the leading key) to find all rows whereas_of_date >= @as_of_date_from
, then compute the minimummaturity_date
found in those rows.This is a good strategy if the
as_of_date
predicate matches relatively few rows. It is not so good if many rows are expected to qualify. The expected cost of this strategy depends on the value of@as_of_date_from
, and statistical information about theas_of_date
column.Your query uses a local variable, so the optimizer cannot 'see' the value contained in it, and makes a guess at how selective the unknown value will be.
For an inequality predicate, the default guess is that 30% of the rows will qualify. For a table of 130 million rows, the optimizer will therefore guess that 30% * 130,000,000 = 39,000,000 rows will match. This is a lot of rows to compute the minimum
maturity_date
on, the the optimizer prefers the other strategy.You can allow the optimizer to 'see' the value inside the variable at execution time by adding an
OPTION (RECOMPILE)
hint to the query. This enables the Parameter Embedding Optimization, at the cost of having to recompile the query plan every time the query is executed.If the
@as_of_date_from
variable value varies a lot between executions, using theOPTION (RECOMPILE)
hint may be worthwhile for you. The optimizer may still choose the partial ordered scan strategy, depending on its assessment of the relative costs of the two strategies.In any case, you should change the data type of the
@as_of_date_from
todatetime
to match theas_of_date
column to avoid an implicit conversion.The data type of
@tenor_from
should probably bedatetime
notvarchar(20)
. Or change the query toCONVERT(varchar(20),MIN(spc.maturity_date),103)
. The currentCONVERT
makes no sense.