Sql-server – How to aggregate timeseries several months long – when I only want a portion of each month

aggregatesql server

I have a need to see a trend of data for "month to date" So I want to pull the 1st x number of days of each month for the last 14 months, and the n aggregate this.

My data has approximately 10k data points each day

So far I have only been able to figure out how to do this by writing a double while loop – the outer loop counting down the months and the inner loop selecting each day and aggregating the data for the day – then storing it in a temp table.

Once I have stepped through each month and each day I then selet the data from the temp table and aggregate this to give me monthly data summed for each contract type.

For various reasons I need to run my select 4 times for each day.

This all ends up meaning my SQL takes around 120 secs to run. This is sub-optimal – as I am hoping to have this used by SSRS to pull a report on demand when a user wants to see it. Making them wait 2 whole minutes? Not Desirable – especially considering primary target audience is the exec team

Here is my SQL (NB I've changed a couple of table/ column names )

Declare 
    @ReportDate as DATE
    ,@month as DATE
    ,@dayCounter as INT
    ,@monthCounter as INT
    ,@reportDateCurrentMonth as DATE


SET @ReportDate = sysdatetime()
SET @month = @ReportDate
SET @dayCounter = DATEPART(DAY,@ReportDate)
SET @monthCounter = '0'
SET @reportDateCurrentMonth = DATEADD(month,@monthCounter,@ReportDate)

CREATE TABLE #DailyVolumes
    (
       contract_name varchar(50)
       ,Volume INT
       ,date_registered DATETIME
    )

WHILE @month > DATEADD(month,-15,@ReportDate)
    BEGIN

          WHILE @dayCounter > '0'

          BEGIN

                INSERT INTO #DailyVolumes
                SELECT
                CONCAT('PREFIX-',DWRccn.contract_name) AS contract_name
                    ,count(distinct(CONCAT(rtrim(p.xxxx_id), '-', rtrim(r.lis_req_id)))) AS Volume
                    ,a.date_registered
                FROM accession a
                    Left  join value1 P on a.value1_id = p.value1_id
                    left  join [DataWarehouseReporting].[dbo].[DIM_contract_code_name] DWRccn on a.contract_code = DWRccn.contract_code
                where 
                    [date_registered] = @reportDateCurrentMonth
                    AND a.lis_code = 'S'
                    AND visit_type IN ('I', 'E')
                GROUP BY 
                    CONCAT('PREFIX-',DWRccn.contract_name)
                    ,a.date_registered


                -- Set new values on Daily counter & Date to grab
                SET @dayCounter = @dayCounter - '1'
                SET @reportDateCurrentMonth = DATEADD(day,-1,@reportDateCurrentMonth)
          END;

       SET @monthCounter = @monthCounter - '1'
       SET @month = DATEADD(month,-1,@month)
       SET @reportDateCurrentMonth = DATEADD(month,@monthCounter,@ReportDate)
       SET @dayCounter = DATEPART(DAY,@ReportDate)




    END;



-- SUM Daily Data into Monthly Slices
Select 
    sum(Volume) AS Volume
    ,contract_name
    ,DATEADD(MONTH, DATEDIFF(MONTH, 0, date_registered), 0) AS MonthRegistered
FROM #DailyVolumes
group by 
    contract_name
    ,DATEADD(MONTH, DATEDIFF(MONTH, 0, date_registered), 0)
ORDER BY 
    MonthRegistered DESC 
    ,contract_name
-- Clear Temp Table
DROP TABLE #DailyVolumes

I am hoping someone can tell me how to accomlish what I am after… Which is to do away with the loops and have the data aggregate in a single operation

NB I have done my aggregation for full month periods – that was a piece of cake compared to this

Best Answer

NOTE: I'm assuming you're running SQL Server 2012 (or higher), otherwise the datefromparts() code will need to be modified.

I'd opt for getting rid of the loops in favor of a single query that includes a date generator, which is relatively easy to do with a common table expression (CTE), eg:

declare @days   tinyint,         -- first X days of a month to query
        @months smallint,        -- go back Y months for our query
        @today  date

select  @days   = 5              -- only interested in days 1-5 of each month
        @months = 4              -- only interested in this month plus the previous 4 months
        @today  = sysdatetime();

with
dategen as
(select -- piece our years/months/days together into dates
        datefromparts ( year(dateadd(m,-m.number,@today)),  -- get year  for @today minus m.number of months
                       month(dateadd(m,-m.number,@today)),  -- get month for @today minus m.number of months
                       d.number) as search_date             -- day of month

 from   master..spt_values m,
        master..spt_values d

 -- no join clause => cartesian product (or you could add a cross join, your call)
 where  m.type = 'P'
 and    m.number between 0 and @months
 and    d.type = 'P'
 and    d.number between 1 and @days)

select  search_date
from    dategen
where   search_date <= @today
order by 1

 | search_date         |  @today = 18 November 2017
 | ------------------- |
 | 01/07/2017 00:00:00 |  first 5 days of July 2017
 | 02/07/2017 00:00:00 |
 | 03/07/2017 00:00:00 |
 | 04/07/2017 00:00:00 |
 | 05/07/2017 00:00:00 |

 | 01/08/2017 00:00:00 |  first 5 days of August 2017
 | 02/08/2017 00:00:00 |
 | 03/08/2017 00:00:00 |
 | 04/08/2017 00:00:00 |
 | 05/08/2017 00:00:00 |

 | 01/09/2017 00:00:00 |  first 5 days of September 2017
 | 02/09/2017 00:00:00 |
 | 03/09/2017 00:00:00 |
 | 04/09/2017 00:00:00 |
 | 05/09/2017 00:00:00 |

 | 01/10/2017 00:00:00 |  first 5 days of October 2017
 | 02/10/2017 00:00:00 |
 | 03/10/2017 00:00:00 |
 | 04/10/2017 00:00:00 |
 | 05/10/2017 00:00:00 |

 | 01/11/2017 00:00:00 |  first 5 days of November 2017
 | 02/11/2017 00:00:00 |
 | 03/11/2017 00:00:00 |
 | 04/11/2017 00:00:00 |
 | 05/11/2017 00:00:00 |

Here's a dbfiddle for the (above) CTE example.

Plugging the CTE into your insert/select would generate something like:

-- declare/select @variables;

with
dategen as
(select datefromparts( year(dateadd(m,-m.number,@today)), 
                      month(dateadd(m,-m.number,@today)),
                      d.number) as search_date

 from   master..spt_values m,
        master..spt_values d

 where  m.type = 'P'
 and    m.number between 0 and @months
 and    d.type = 'P'
 and    d.number between 1 and @days)

INSERT INTO #DailyVolumes
SELECT CONCAT('PREFIX-',DWRccn.contract_name)                              AS contract_name
      ,count(distinct(CONCAT(rtrim(p.xxxx_id), '-', rtrim(r.lis_req_id)))) AS Volume
      ,a.date_registered

FROM   accession a

join   dategen dg
on     a.date_registered = dg.search_date

Left 
join   value1 P
on     a.value1_id = p.value1_id

left  
join   [DataWarehouseReporting].[dbo].[DIM_contract_code_name] DWRccn 
on     a.contract_code = DWRccn.contract_code

where  
AND    a.lis_code = 'S'
AND    visit_type IN ('I', 'E')
AND    dg.search_date <= @today

GROUP BY CONCAT('PREFIX-',DWRccn.contract_name)
        ,a.date_registered

NOTES:

  • rename local variables as you see fit
  • set the @months/@days values to your desired limits
  • I haven't tested the insert/select (don't have the DDL for your system) so there may be some minor syntax issues ... ??
  • if performance is still not quite up to what you're expecting then it'll be necessary to review the query plan (and potentially the available indexes)

Assuming the (above) CTE/insert/select query generates the desired results, the next step would be to eliminate #DailyVolumes and add the final select/sum(Volume) code to the above query ... easiest to understand would probably be something like:

with

dategen as
(select ...),

dailyvolumes as
(select ... from accession/dategen/value1/DWRcnn ...)

select sum(Volume) ...
from   dailyvolumes ...