I'm struggling to identify effective indexes (or rewrite the query) to improve a query with the following confounding predicates:
- JOIN on a date from one table being in range – between two date fields on second table (one is nullable, one is not nullable in PK).
- The date used is actually the value in date field (nullable) +1.
- WHERE clauses includes OR logic on multiple flag fields.
The simplified version of the query is:
select
d.dim_date_id
,f.dim_provider_id
,f.dim_event_id
,d.date
from DWH.dbo.tbl_fact_outcome f
join DWH.dbo.tbl_dim_date d on DATEADD(DAY,1,d.date) between f.known_from and f.known_to
where
f.known_from > getdate()-12
and (d.flag_latest_day = 'Y' or d.flag_end_of_month = 'Y' or (d.flag_end_of_week = 'Y' AND d.flag_latest_week = 'Y'))
and d.flag_future_day = 'N'
and f.deleted = 0
tbl_fact_outcome has these indexes:
- PK clustered index on input_form_id, known_from
- Non-unique Nonclustered index on deleted, known_from, known_to (INCLUDES the required _dim_id fields)
tbl_dim_date has these indexes:
- PK clustered index on dim_date_id
- Non-unique nonclustered index on flag_future_day, date (INCLUDES relevant flag fields)
At present, it estimates 853 rows but returns 16,784.
Here is the query plan:
https://www.brentozar.com/pastetheplan/?id=rydKb_3AI
Statistics are up to date.
I have tried re-ordering the covering indexes but no improvement.
I'm totally stumped as to what else to try with indexes or the code itself to improve performance, so any pointers appreciated.
EDIT 05/07/2020
Ruled out the following suggestions from StackOverflow:
- Filtered index (on deleted) on tbl_fact_outcome – less than 1% of records would be filtered out, so not worthwhile
- Filtered index (using entire WHERE clause from query) on tbl_dim_date – not possible to use OR in index
- Index on tbl_dim_date with INCLUDEd fields as key fields – tried this, made no difference, not used by optimizer.
EDIT 06/07/2020 – table DDLs
tbl_dim_date
CREATE TABLE [dbo].[tbl_dim_date](
[dim_date_id] [int] IDENTITY(1,1) NOT NULL,
[date] [date] NULL,
[day] [varchar](9) NULL,
[month] [varchar](9) NULL,
[month_of_calendar_year] [int] NULL,
[day_of_calendar_year] [int] NULL,
[year] [int] NULL,
[flag_end_of_month] [char](1) NULL,
[academic_year] [varchar](25) NULL,
[flag_latest_month_snapshot] [char](1) NULL,
[flag_latest_day] [char](1) NULL,
[flag_future_day] [char](1) NULL,
[InsertAuditKey] [int] NULL,
[UpdateAuditKey] [int] NULL,
[flag_end_of_week] [char](1) NULL,
[flag_latest_week] [char](1) NULL,
PRIMARY KEY CLUSTERED
(
[dim_date_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [dbotbl_dim_date_UC] UNIQUE NONCLUSTERED
(
[date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [JM_TEST_03072020] ON [dbo].[tbl_dim_date]
(
[flag_future_day] ASC,
[date] DESC
)
INCLUDE([dim_date_id],[flag_end_of_month],[flag_latest_day],[flag_end_of_week],[flag_latest_week])
tbl_fact_outcome
CREATE TABLE [dbo].[tbl_fact_outcome](
[known_from] [date] NOT NULL,
[known_to] [date] NULL,
[source_id] [int] NULL,
[input_form_id] [varchar](255) NOT NULL,
[dim_provider_id] [int] NULL,
[dim_event_id] [int] NULL,
[dim_question_id] [int] NULL,
[dim_answer_id] [int] NULL,
[dim_form_id] [int] NULL,
[count_of_outcomes] [int] NULL,
[InsertAuditKey] [int] NULL,
[UpdateAuditKey] [int] NULL,
[LNK_FORM_OUTCOME_ID] [int] NULL,
[answer] [varchar](4000) NULL,
[deleted] [bit] NULL,
[dim_latest_provider_id] [int] NULL,
CONSTRAINT [PK_dbotbl_Fact_Outcome] PRIMARY KEY CLUSTERED
(
[input_form_id] ASC,
[known_from] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ix_tbl_fact_outcome_deleted_known_from_known_to_includes] ON [dbo].[tbl_fact_outcome]
(
[deleted] ASC,
[known_from] ASC,
[known_to] ASC
)
INCLUDE([dim_provider_id],[dim_event_id],[dim_question_id],[dim_answer_id],[dim_form_id])
Query plans after following GMass DBA's suggestions
NB data is different since initial run but issue not affected
- Date join made SARGable by using > and < rather than between, and changing +1 to date from tbl_dim_date, to -1 on known_from on tbl_fact_outcome
- New index on tbl_dim_date (date, flag_future_day INCLUDE other columns)
- New index in tbl_fact_outcome (known_from, deleted, known_to INCLUDE other columns)
Re-run of initial query prior to changes (apart from getdate()-5 instead of getdate()-12 last week) – plan essentially the same:
https://www.brentozar.com/pastetheplan/?id=rJKN3og1w
Query with amended date logic plus new indexes on each table (only one used).
Improved, but estimates still way off:
https://www.brentozar.com/pastetheplan/?id=Sy_gsoxkw
Query plan after amending date j
Best Answer