Sql-server – Operator used tempdb to spill data . The estimated row are less than real rows

sql serversql-server-2017t-sql

I have a query that takes around 32 seconds, I tried many ways to fix the performance problem.

The first thing I tried is adding a new column that contain date and time as an integer. After I added this column to a non clustered index, the query now is executed in 2 seconds. but when I look at the execution plan, I see that there are some warnings.

Edit :

DECLARE @StartDate int=2010010110,--2010-01-01 10:00
        @EndDate int=2019101011 --2019-10-10 11:00

SELECT newDate, 
    IdFluide
    IdUsage,       
   SUM(c.[Conso-KW]) AS ConsoKW
FROM fait.Consommation  AS c WITH(FORCESCAN, INDEX(0)) 
where  newDate >= @StartDate 
  AND newDate <= @EndDate
  AND  IdSource = 1
  AND IdTypeFluide = 1

 AND IdUsage > 0

AND exists(select 1 from  geo.DimGeographie dg  
       INNER JOIN geo.ActivePerimeterDetail apd 
              ON dg.ContractId = apd.ContractId 
              AND dg.IdNiveau3=apd.PerimeterId 
              AND apd.PerimeterLevelId=3
              AND dg.ContractId = 2  
              AND dg.ModeGeo='PA' 
              AND apd.ContractId = 2 
              AND UserId = 8
       where c.IdGeographie = dg.IdGeographie  
     )    
GROUP BY newDate,
     IdFluide,
     IdUsage

The table fait.Consommation contain around 4.5M rows.

The table geo.DimGeographie contain around 20K rows.

The table geo.ActivePerimeterDetail contain around 43K rows.

I have clustered and non-clustered index in all tables.

I tried to update statistics and I saw many articles talking about such problems, but trying all what they suggest, doesn't solve my problem.

 update statistics [fait].[Consommation] with fullscan

 update statistics [geo].[DimGeographie] with fullscan

 update statistics [geo].[ActivePerimeterDetail] with fullscan

 update statistics [fait].[Consommation] [IX_GeographicalConsumption] with fullscan

 update statistics geo.DimGeographie [IX_Test] with fullscan

 update statistics geo.ActivePerimeterDetail [UQ_ActivePerimeterDetail] with fullscan

When I add this code to tables in the query, I have a new execution plan and speed up the query, now the query execute in one second but still have same warning WITH(FORCESCAN, INDEX(0))

XML Execution plan

Indexes : Edit

Table [fait].[Consommation]

NONCLUSTERED INDEX :

    CREATE NONCLUSTERED INDEX [IX_GeographicalConsumption] ON [fait].[Consommation]
    (
      [IdSource] ASC,
      [IdTypeFluide] ASC,
      [IdUsage] ASC,
      [IdFluide] ASC,
      [NewDate] ASC
    )
    INCLUDE (   
              [Conso-KW],
              [Conso-M3],
              [Euros],
              [CO2],
              [Conso-EP],
              [IdGeographie]
            ) 
    WHERE ([IdUsage]>(0)) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CLUSTERED COLUMNSTORE INDEX:

 CREATE CLUSTERED COLUMNSTORE INDEX [IX_Consommation] ON [fait].[Consommation] WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]

Table [geo].[DimGeographie]

NONCLUSTERED INDEX :

     CREATE NONCLUSTERED INDEX [IX_Test] ON [geo].[DimGeographie]
     (
         [ContractId] ASC,
         [ModeGeo] ASC
     )
     INCLUDE (  
                 [IdGeographie],
                 [IdNiveau3]
             ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CLUSTERED COLUMNSTORE INDEX:

  CREATE CLUSTERED COLUMNSTORE INDEX [IX_DimGeographie] ON [geo].[DimGeographie] WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]

Table [geo].[ActivePerimeterDetail]

NONCLUSTERED INDEX :

    CREATE UNIQUE NONCLUSTERED INDEX [UQ_ActivePerimeterDetail] ON [geo].[ActivePerimeterDetail]
    (
         [ContractId] ASC,
         [UserId] ASC,
         [PerimeterLevelId] ASC,
         [PerimeterId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

UNIQUE NONCLUSTERED INDEX:

  CREATE UNIQUE NONCLUSTERED INDEX [UQ_ActivePerimeterDetail] ON [geo].[ActivePerimeterDetail]
  (
     [ContractId] ASC,
     [UserId] ASC,
     [PerimeterLevelId] ASC,
     [PerimeterId] ASC
  )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

Best Answer

Once you try to update Statistics with FullScan

Example,

Update STATISTICS [fait].[Consommation] IX_GeographicalConsumption WITH FULLSCAN

Then avoid using inequality (<>) in predicate.Use any alternative.

BTW, what can be all possible value of IdUsage ?

If columns of geo.DimGeographie dg and geo.ActivePerimeterDetail apd are not require in resultset then use EXISTS.

This guarantee to return only require number of row from join.

 SELECT IdFluide,
       newDate,
       IdUsage,
       SUM(c.[Conso-KW]) AS ConsoKW
FROM fait.Consommation  AS c 
where 
   IdSource = 1
  AND IdTypeFluide = 1
  AND UserId = 8
 AND IdUsage > 0-- remove <>
  AND newDate >= @StartDate 
  AND newDate <= @EndDate)
and exists(select 1 from  geo.DimGeographie dg  
INNER JOIN geo.ActivePerimeterDetail apd 
      ON dg.ContractId = apd.ContractId 
      AND dg.IdNiveau3=apd.PerimeterId 
      AND apd.PerimeterLevelId=3
      AND dg.ContractId = 2  
and dg.ModeGeo='PA' 
AND apd.ContractId = 2
where c.IdGeographie = dg.IdGeographie)        
GROUP BY IdFluide,
  newDate,IdUsage

In your index most selective column should be on left most.

Similarly in your query, write most selective predicate first. In some cases it help.

Thing is there is huge difference between Estimated rows and Actual rows in below object

IX_GeographicalConsumption
IX_Test

This happen because either Statistics is not updated or skewed data distribution of certain index.

New Edit : Good part about table [fait].[Consommation] is that all columns are INT ,DateTime type.

Bad part is there are so many composite keys which define the relation between tables.It is not clear which column is actually PK and how CI is populated.

There should be one column which is identity int (or ever increasing) Clustered Index.

It is important to know,data distribution,selectivity of each index columns of each table.Then only one can decide about What index is to be created and what should be the order in composite index.

Alter your Composite Nonclustered Index to Filtered Index on IdUsage > 0.

CREATE NONCLUSTERED INDEX IX_GeographicalConsumption
ON [fait].[Consommation] (IdSource,IdTypeFluide,IdUsage,IdGeographie,IdFluide,newDate) include(Conso-KW)
where IdUsage>0
GO

I am not sure about Order of column in index list because I am not aware about their Selectivity.

Notice : Column which participate in Predicate are in index list, they are not in Include list.

Like index [IX_Test] ON [geo].[DimGeographie] is wrong.

CREATE NONCLUSTERED INDEX [IX_Test] ON [geo].[DimGeographie]
     (
         [ContractId] ASC,
         [ModeGeo] ASC,
         [IdGeographie],
         [IdNiveau3]
     )

Again Ordering you decide or explain.

Conclusion in same order :

1 ) Determine if number of rows return by each Table is correct or not.If not then correct Query/ where condition .It is not about Index.

2) Right Index Plan

3) Updated Statistics

4) Memory Grant for query is based on Estimated number of Rows.Your estimated rows is less so less memory is granted.This is not enough memory to work on Actual number of Rows which is quite huge. So it spill into TempDB.

In your case there is single level spill.This spill level can increase as Actual rows increase and grant memory decrease.

So Point 1) and 2) are more important.

Lastly,

It is clear from latest Execution Plan that cost has only increase after WITH(FORCESCAN, INDEX(0)) .So Index hint is not require.

Related Solutions

Sql-server – Differences Between Two Different Create Index Commands

It boils down to looking what the default values are. Lets break this down:

CREATE UNIQUE NONCLUSTERED INDEX [DEID_MAP_IDX1] ON [dbo].[DEID_MAP]

nonclustered is specified here. The default (i.e. nothing specified) is nonclustered. So unless clustered is specified it will default to nonclustered. So that's the same in both scripts.

[dbo] is specified here explicitly. As for the second un-specified CREATE INDEX then it all depends on what the current user's default schema is. Only you can answer that at the moment, so that may or may not default to dbo.

WITH (
    PAD_INDEX  = OFF, 
    STATISTICS_NORECOMPUTE  = OFF, 
    IGNORE_DUP_KEY  = OFF, 
    ALLOW_ROW_LOCKS = ON, 
    ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]

PAD_INDEX: the default is OFF, so unspecified will be the same in the second script as it is in the first.

STATISTICS_NORECOMPUTE: the default is OFF, so the second script unspecified has the same value.

IGNORE_DUP_KEY: the default is OFF, so the second CREATE INDEX is identical with this parameter.

ALLOW_ROW_LOCKS: the default is ON, so the second CREATE script has the same behavior.

ALLOW_PAGE_LOCKS: the default is ON...the second script has identical behavior.

... ON [PRIMARY]: just like the default schema one, this all depends on what your default filegroup is. If PRIMARY is the default filegroup, your second CREATE INDEX script will also create the index on PRIMARY. If PRIMARY is not the default filegroup, then it will be a different filegroup, as an unspecified filegroup will go to the default filegroup.

All of this information and default values can be found on this BOL reference here.

Sql-server – Parent-Child Tree Hierarchical ORDER

OK, enough brain cells are dead.

SQL Fiddle

WITH cte AS
(
  SELECT 
    [ICFilterID], 
    [ParentID],
    [FilterDesc],
    [Active],
    CAST(0 AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters]
  WHERE [ParentID] = 0
  UNION ALL
  SELECT 
    i.[ICFilterID], 
    i.[ParentID],
    i.[FilterDesc],
    i.[Active],  
    Level + CAST(i.[ICFilterID] AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters] i
  INNER JOIN cte c
    ON c.[ICFilterID] = i.[ParentID]
)

SELECT 
  [ICFilterID], 
  [ParentID],
  [FilterDesc],
  [Active]
FROM cte
ORDER BY [Level];

Best Answer

Related Solutions

Sql-server – Differences Between Two Different Create Index Commands

Sql-server – Parent-Child Tree Hierarchical ORDER

Related Question