Sql-server – Operator used tempdb to spill data . The estimated row are less than real rows

sql serversql-server-2017t-sql

I have a query that takes around 32 seconds, I tried many ways to fix the performance problem.

The first thing I tried is adding a new column that contain date and time as an integer. After I added this column to a non clustered index, the query now is executed in 2 seconds. but when I look at the execution plan, I see that there are some warnings.

Edit :

DECLARE @StartDate int=2010010110,--2010-01-01 10:00
        @EndDate int=2019101011 --2019-10-10 11:00

SELECT newDate, 
    IdFluide
    IdUsage,       
   SUM(c.[Conso-KW]) AS ConsoKW
FROM fait.Consommation  AS c WITH(FORCESCAN, INDEX(0)) 
where  newDate >= @StartDate 
  AND newDate <= @EndDate
  AND  IdSource = 1
  AND IdTypeFluide = 1

 AND IdUsage > 0

AND exists(select 1 from  geo.DimGeographie dg  
       INNER JOIN geo.ActivePerimeterDetail apd 
              ON dg.ContractId = apd.ContractId 
              AND dg.IdNiveau3=apd.PerimeterId 
              AND apd.PerimeterLevelId=3
              AND dg.ContractId = 2  
              AND dg.ModeGeo='PA' 
              AND apd.ContractId = 2 
              AND UserId = 8
       where c.IdGeographie = dg.IdGeographie  
     )    
GROUP BY newDate,
     IdFluide,
     IdUsage

The table fait.Consommation contain around 4.5M rows.

The table geo.DimGeographie contain around 20K rows.

The table geo.ActivePerimeterDetail contain around 43K rows.

I have clustered and non-clustered index in all tables.
![![enter image description here
![enter image description here

I tried to update statistics and I saw many articles talking about such problems, but trying all what they suggest, doesn't solve my problem.

 update statistics [fait].[Consommation] with fullscan

 update statistics [geo].[DimGeographie] with fullscan

 update statistics [geo].[ActivePerimeterDetail] with fullscan

 update statistics [fait].[Consommation] [IX_GeographicalConsumption] with fullscan

 update statistics geo.DimGeographie [IX_Test] with fullscan

 update statistics geo.ActivePerimeterDetail [UQ_ActivePerimeterDetail] with fullscan

When I add this code to tables in the query, I have a new execution plan and speed up the query, now the query execute in one second but still have same warning WITH(FORCESCAN, INDEX(0))

XML Execution plan

Indexes : Edit

  1. Table [fait].[Consommation]

NONCLUSTERED INDEX :

    CREATE NONCLUSTERED INDEX [IX_GeographicalConsumption] ON [fait].[Consommation]
    (
      [IdSource] ASC,
      [IdTypeFluide] ASC,
      [IdUsage] ASC,
      [IdFluide] ASC,
      [NewDate] ASC
    )
    INCLUDE (   
              [Conso-KW],
              [Conso-M3],
              [Euros],
              [CO2],
              [Conso-EP],
              [IdGeographie]
            ) 
    WHERE ([IdUsage]>(0)) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CLUSTERED COLUMNSTORE INDEX:

 CREATE CLUSTERED COLUMNSTORE INDEX [IX_Consommation] ON [fait].[Consommation] WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
  1. Table [geo].[DimGeographie]

NONCLUSTERED INDEX :

     CREATE NONCLUSTERED INDEX [IX_Test] ON [geo].[DimGeographie]
     (
         [ContractId] ASC,
         [ModeGeo] ASC
     )
     INCLUDE (  
                 [IdGeographie],
                 [IdNiveau3]
             ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CLUSTERED COLUMNSTORE INDEX:

  CREATE CLUSTERED COLUMNSTORE INDEX [IX_DimGeographie] ON [geo].[DimGeographie] WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
  1. Table [geo].[ActivePerimeterDetail]

NONCLUSTERED INDEX :

    CREATE UNIQUE NONCLUSTERED INDEX [UQ_ActivePerimeterDetail] ON [geo].[ActivePerimeterDetail]
    (
         [ContractId] ASC,
         [UserId] ASC,
         [PerimeterLevelId] ASC,
         [PerimeterId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

UNIQUE NONCLUSTERED INDEX:

  CREATE UNIQUE NONCLUSTERED INDEX [UQ_ActivePerimeterDetail] ON [geo].[ActivePerimeterDetail]
  (
     [ContractId] ASC,
     [UserId] ASC,
     [PerimeterLevelId] ASC,
     [PerimeterId] ASC
  )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

Best Answer

Once you try to update Statistics with FullScan

Example,

Update STATISTICS [fait].[Consommation] IX_GeographicalConsumption WITH FULLSCAN

Then avoid using inequality (<>) in predicate.Use any alternative.

BTW, what can be all possible value of IdUsage ?

If columns of geo.DimGeographie dg and geo.ActivePerimeterDetail apd are not require in resultset then use EXISTS.

This guarantee to return only require number of row from join.

 SELECT IdFluide,
       newDate,
       IdUsage,
       SUM(c.[Conso-KW]) AS ConsoKW
FROM fait.Consommation  AS c 
where 
   IdSource = 1
  AND IdTypeFluide = 1
  AND UserId = 8
 AND IdUsage > 0-- remove <>
  AND newDate >= @StartDate 
  AND newDate <= @EndDate)
and exists(select 1 from  geo.DimGeographie dg  
INNER JOIN geo.ActivePerimeterDetail apd 
      ON dg.ContractId = apd.ContractId 
      AND dg.IdNiveau3=apd.PerimeterId 
      AND apd.PerimeterLevelId=3
      AND dg.ContractId = 2  
and dg.ModeGeo='PA' 
AND apd.ContractId = 2
where c.IdGeographie = dg.IdGeographie)        
GROUP BY IdFluide,
  newDate,IdUsage

In your index most selective column should be on left most.

Similarly in your query, write most selective predicate first. In some cases it help.

Thing is there is huge difference between Estimated rows and Actual rows in below object

  1. IX_GeographicalConsumption
  2. IX_Test

This happen because either Statistics is not updated or skewed data distribution of certain index.

New Edit : Good part about table [fait].[Consommation] is that all columns are INT ,DateTime type.

Bad part is there are so many composite keys which define the relation between tables.It is not clear which column is actually PK and how CI is populated.

There should be one column which is identity int (or ever increasing) Clustered Index.

It is important to know,data distribution,selectivity of each index columns of each table.Then only one can decide about What index is to be created and what should be the order in composite index.

Alter your Composite Nonclustered Index to Filtered Index on IdUsage > 0.

CREATE NONCLUSTERED INDEX IX_GeographicalConsumption
ON [fait].[Consommation] (IdSource,IdTypeFluide,IdUsage,IdGeographie,IdFluide,newDate) include(Conso-KW)
where IdUsage>0
GO

I am not sure about Order of column in index list because I am not aware about their Selectivity.

Notice : Column which participate in Predicate are in index list, they are not in Include list.

Like index [IX_Test] ON [geo].[DimGeographie] is wrong.

CREATE NONCLUSTERED INDEX [IX_Test] ON [geo].[DimGeographie]
     (
         [ContractId] ASC,
         [ModeGeo] ASC,
         [IdGeographie],
         [IdNiveau3]
     )

Again Ordering you decide or explain.

Conclusion in same order :

1 ) Determine if number of rows return by each Table is correct or not.If not then correct Query/ where condition .It is not about Index.

2) Right Index Plan

3) Updated Statistics

4) Memory Grant for query is based on Estimated number of Rows.Your estimated rows is less so less memory is granted.This is not enough memory to work on Actual number of Rows which is quite huge. So it spill into TempDB.

In your case there is single level spill.This spill level can increase as Actual rows increase and grant memory decrease.

So Point 1) and 2) are more important.

Lastly,

It is clear from latest Execution Plan that cost has only increase after WITH(FORCESCAN, INDEX(0)) .So Index hint is not require.