Sql-server – Index or query tuning for complex predicates

cardinality-estimatesindex-tuningoptimizationquery-performancesql server

I'm struggling to identify effective indexes (or rewrite the query) to improve a query with the following confounding predicates:

  • JOIN on a date from one table being in range – between two date fields on second table (one is nullable, one is not nullable in PK).
  • The date used is actually the value in date field (nullable) +1.
  • WHERE clauses includes OR logic on multiple flag fields.

The simplified version of the query is:

select
d.dim_date_id
,f.dim_provider_id
,f.dim_event_id
,d.date

from DWH.dbo.tbl_fact_outcome f
join   DWH.dbo.tbl_dim_date d on DATEADD(DAY,1,d.date) between f.known_from and f.known_to 

where
f.known_from > getdate()-12
and (d.flag_latest_day = 'Y' or  d.flag_end_of_month = 'Y'  or (d.flag_end_of_week = 'Y' AND d.flag_latest_week = 'Y'))
and d.flag_future_day = 'N'
and f.deleted = 0

tbl_fact_outcome has these indexes:

  • PK clustered index on input_form_id, known_from
  • Non-unique Nonclustered index on deleted, known_from, known_to (INCLUDES the required _dim_id fields)

tbl_dim_date has these indexes:

  • PK clustered index on dim_date_id
  • Non-unique nonclustered index on flag_future_day, date (INCLUDES relevant flag fields)

At present, it estimates 853 rows but returns 16,784.

Here is the query plan:
https://www.brentozar.com/pastetheplan/?id=rydKb_3AI

Statistics are up to date.
I have tried re-ordering the covering indexes but no improvement.

I'm totally stumped as to what else to try with indexes or the code itself to improve performance, so any pointers appreciated.


EDIT 05/07/2020

Ruled out the following suggestions from StackOverflow:

  • Filtered index (on deleted) on tbl_fact_outcome – less than 1% of records would be filtered out, so not worthwhile
  • Filtered index (using entire WHERE clause from query) on tbl_dim_date – not possible to use OR in index
  • Index on tbl_dim_date with INCLUDEd fields as key fields – tried this, made no difference, not used by optimizer.

EDIT 06/07/2020 – table DDLs

tbl_dim_date

CREATE TABLE [dbo].[tbl_dim_date](
   [dim_date_id] [int] IDENTITY(1,1) NOT NULL,
   [date] [date] NULL,
   [day] [varchar](9) NULL,
   [month] [varchar](9) NULL,
   [month_of_calendar_year] [int] NULL,
   [day_of_calendar_year] [int] NULL,
   [year] [int] NULL,
   [flag_end_of_month] [char](1) NULL,
   [academic_year] [varchar](25) NULL,
   [flag_latest_month_snapshot] [char](1) NULL,
   [flag_latest_day] [char](1) NULL,
   [flag_future_day] [char](1) NULL,
   [InsertAuditKey] [int] NULL,
   [UpdateAuditKey] [int] NULL,
   [flag_end_of_week] [char](1) NULL,
   [flag_latest_week] [char](1) NULL,
PRIMARY KEY CLUSTERED 
(
   [dim_date_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],


CONSTRAINT [dbotbl_dim_date_UC] UNIQUE NONCLUSTERED 
(
   [date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO



CREATE NONCLUSTERED INDEX [JM_TEST_03072020] ON [dbo].[tbl_dim_date]
(
   [flag_future_day] ASC,
   [date] DESC
)
INCLUDE([dim_date_id],[flag_end_of_month],[flag_latest_day],[flag_end_of_week],[flag_latest_week])

tbl_fact_outcome

CREATE TABLE [dbo].[tbl_fact_outcome](
    [known_from] [date] NOT NULL,
    [known_to] [date] NULL,
    [source_id] [int] NULL,
    [input_form_id] [varchar](255) NOT NULL,
    [dim_provider_id] [int] NULL,
    [dim_event_id] [int] NULL,
    [dim_question_id] [int] NULL,
    [dim_answer_id] [int] NULL,
    [dim_form_id] [int] NULL,
    [count_of_outcomes] [int] NULL,
    [InsertAuditKey] [int] NULL,
    [UpdateAuditKey] [int] NULL,
    [LNK_FORM_OUTCOME_ID] [int] NULL,
    [answer] [varchar](4000) NULL,
    [deleted] [bit] NULL,
    [dim_latest_provider_id] [int] NULL,
 CONSTRAINT [PK_dbotbl_Fact_Outcome] PRIMARY KEY CLUSTERED 
(
    [input_form_id] ASC,
    [known_from] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


CREATE NONCLUSTERED INDEX [ix_tbl_fact_outcome_deleted_known_from_known_to_includes] ON [dbo].[tbl_fact_outcome]
(
    [deleted] ASC,
    [known_from] ASC,
    [known_to] ASC
)
INCLUDE([dim_provider_id],[dim_event_id],[dim_question_id],[dim_answer_id],[dim_form_id])


Query plans after following GMass DBA's suggestions
NB data is different since initial run but issue not affected

  • Date join made SARGable by using > and < rather than between, and changing +1 to date from tbl_dim_date, to -1 on known_from on tbl_fact_outcome
  • New index on tbl_dim_date (date, flag_future_day INCLUDE other columns)
  • New index in tbl_fact_outcome (known_from, deleted, known_to INCLUDE other columns)

Re-run of initial query prior to changes (apart from getdate()-5 instead of getdate()-12 last week) – plan essentially the same:
https://www.brentozar.com/pastetheplan/?id=rJKN3og1w

Query with amended date logic plus new indexes on each table (only one used).
Improved, but estimates still way off:
https://www.brentozar.com/pastetheplan/?id=Sy_gsoxkw

Query plan after amending date j

Best Answer

  1. Did you try indexed view?
CREATE VIEW vw WITH SCHEMABINDING
AS 
SELECT d.dim_date_id
    ,f.dim_provider_id
    ,f.dim_event_id
    ,d.date
    ,f.input_form_id
    ,f.known_from
from dbo.tbl_fact_outcome f
join  dbo.tbl_dim_date d on DATEADD(DAY,1,d.date) between f.known_from and f.known_to 

where
(d.flag_latest_day = 'Y' or  d.flag_end_of_month = 'Y'  or (d.flag_end_of_week = 'Y' AND d.flag_latest_week = 'Y'))
and d.flag_future_day = 'N'
and f.deleted = 0


CREATE UNIQUE CLUSTERED INDEX ix_vw ON vw (known_from, date, input_form_id);

SELECT dim_date_id
    ,dim_provider_id
    ,dim_event_id
    ,date
FROM vw 
WHERE known_from > getdate()-12
  1. solution with indexed computed column:
ALTER TABLE tbl_dim_date ADD date1 AS 
CASE WHEN (flag_latest_day = 'Y' or  flag_end_of_month = 'Y'  or (flag_end_of_week = 'Y' AND flag_latest_week = 'Y'))
and flag_future_day = 'N' THEN  DATEADD(DAY,1,date) ELSE NULL END

CREATE INDEX ix1 ON tbl_dim_date(date1);

select
d.dim_date_id
,f.dim_provider_id
,f.dim_event_id
,DATEADD(DAY,-1,d.date1) AS date
from DWH.dbo.tbl_fact_outcome f
join   DWH.dbo.tbl_dim_date d on date1 between f.known_from and f.known_to 

where
f.known_from > getdate()-12
and f.deleted = 0