I have a query that takes around 32 seconds, I tried many ways to fix the performance problem.
The first thing I tried is adding a new column that contain date and time as an integer. After I added this column to a non clustered index, the query now is executed in 2 seconds. but when I look at the execution plan, I see that there are some warnings.
Edit :
DECLARE @StartDate int=2010010110,--2010-01-01 10:00
@EndDate int=2019101011 --2019-10-10 11:00
SELECT newDate,
IdFluide
IdUsage,
SUM(c.[Conso-KW]) AS ConsoKW
FROM fait.Consommation AS c WITH(FORCESCAN, INDEX(0))
where newDate >= @StartDate
AND newDate <= @EndDate
AND IdSource = 1
AND IdTypeFluide = 1
AND IdUsage > 0
AND exists(select 1 from geo.DimGeographie dg
INNER JOIN geo.ActivePerimeterDetail apd
ON dg.ContractId = apd.ContractId
AND dg.IdNiveau3=apd.PerimeterId
AND apd.PerimeterLevelId=3
AND dg.ContractId = 2
AND dg.ModeGeo='PA'
AND apd.ContractId = 2
AND UserId = 8
where c.IdGeographie = dg.IdGeographie
)
GROUP BY newDate,
IdFluide,
IdUsage
The table fait.Consommation
contain around 4.5M rows.
The table geo.DimGeographie
contain around 20K rows.
The table geo.ActivePerimeterDetail
contain around 43K rows.
I have clustered and non-clustered index in all tables.
I tried to update statistics and I saw many articles talking about such problems, but trying all what they suggest, doesn't solve my problem.
update statistics [fait].[Consommation] with fullscan
update statistics [geo].[DimGeographie] with fullscan
update statistics [geo].[ActivePerimeterDetail] with fullscan
update statistics [fait].[Consommation] [IX_GeographicalConsumption] with fullscan
update statistics geo.DimGeographie [IX_Test] with fullscan
update statistics geo.ActivePerimeterDetail [UQ_ActivePerimeterDetail] with fullscan
When I add this code to tables in the query, I have a new execution plan and speed up the query, now the query execute in one second but still have same warning WITH(FORCESCAN, INDEX(0))
Indexes : Edit
- Table
[fait].[Consommation]
NONCLUSTERED INDEX :
CREATE NONCLUSTERED INDEX [IX_GeographicalConsumption] ON [fait].[Consommation]
(
[IdSource] ASC,
[IdTypeFluide] ASC,
[IdUsage] ASC,
[IdFluide] ASC,
[NewDate] ASC
)
INCLUDE (
[Conso-KW],
[Conso-M3],
[Euros],
[CO2],
[Conso-EP],
[IdGeographie]
)
WHERE ([IdUsage]>(0)) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CLUSTERED COLUMNSTORE INDEX:
CREATE CLUSTERED COLUMNSTORE INDEX [IX_Consommation] ON [fait].[Consommation] WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
- Table
[geo].[DimGeographie]
NONCLUSTERED INDEX :
CREATE NONCLUSTERED INDEX [IX_Test] ON [geo].[DimGeographie]
(
[ContractId] ASC,
[ModeGeo] ASC
)
INCLUDE (
[IdGeographie],
[IdNiveau3]
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CLUSTERED COLUMNSTORE INDEX:
CREATE CLUSTERED COLUMNSTORE INDEX [IX_DimGeographie] ON [geo].[DimGeographie] WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
- Table
[geo].[ActivePerimeterDetail]
NONCLUSTERED INDEX :
CREATE UNIQUE NONCLUSTERED INDEX [UQ_ActivePerimeterDetail] ON [geo].[ActivePerimeterDetail]
(
[ContractId] ASC,
[UserId] ASC,
[PerimeterLevelId] ASC,
[PerimeterId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
UNIQUE NONCLUSTERED INDEX:
CREATE UNIQUE NONCLUSTERED INDEX [UQ_ActivePerimeterDetail] ON [geo].[ActivePerimeterDetail]
(
[ContractId] ASC,
[UserId] ASC,
[PerimeterLevelId] ASC,
[PerimeterId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Best Answer
Once you try to update Statistics with FullScan
Example,
Then avoid using
inequality (<>)
inpredicate
.Use any alternative.BTW, what can be all possible value of
IdUsage
?If columns of
geo.DimGeographie dg
andgeo.ActivePerimeterDetail apd
are not require in resultset then useEXISTS
.This guarantee to return only require number of row from join.
In your index most selective column should be on left most.
Similarly in your query, write most selective predicate first. In some cases it help.
Thing is there is huge difference between
Estimated rows
andActual rows
in below objectThis happen because either Statistics is not updated or skewed data distribution of certain index.
New Edit : Good part about table
[fait].[Consommation]
is that all columns areINT
,DateTime
type.Bad part is there are so many composite keys which define the relation between tables.It is not clear which column is actually PK and how CI is populated.
There should be one column which is
identity
int (or ever increasing) Clustered Index.It is important to know,
data distribution
,selectivity
of each index columns of each table.Then only one can decide about What index is to be created and what should be the order in composite index.Alter your
Composite Nonclustered Index
toFiltered Index
onIdUsage > 0
.I am not sure about Order of column in index list because I am not aware about their
Selectivity
.Notice : Column which participate in
Predicate
are in index list, they are not inInclude
list.Like index
[IX_Test] ON [geo].[DimGeographie]
is wrong.Again Ordering you decide or explain.
Conclusion in same order :
1 ) Determine if number of rows return by each Table is correct or not.If not then correct Query/ where condition .It is not about Index.
2) Right Index Plan
3) Updated Statistics
4) Memory Grant for query is based on Estimated number of Rows.Your estimated rows is less so less memory is granted.This is not enough memory to work on Actual number of Rows which is quite huge. So it spill into TempDB.
In your case there is single level spill.This spill level can increase as
Actual rows
increase andgrant memory
decrease.So Point 1) and 2) are more important.
Lastly,
It is clear from latest Execution Plan that cost has only increase after
WITH(FORCESCAN, INDEX(0))
.So Index hint is not require.