I am not familiar with accounting, but I have solved similar problems in inventory-type environments. I store the running total in the same row as the transaction, and I use constraints so that the data is never wrong, even under high concurrency. I wrote the following solution back in 2009:
Calculating running totals is notoriously slow, whether you do it with a cursor or with a triangular join. It is very tempting to denormalize and store running totals in a column, especially if you select them frequently. However, as usual when you denormalize, you must guarantee the integrity of the denormalized data. Fortunately, you can guarantee the integrity of running totals with constraints: as long as all your constraints are trusted, all your running totals are correct. This approach also makes it easy to ensure that the current balance (the running total) is never negative, which can be very slow to enforce by other means. The following script demonstrates the technique.
-- the Data schema must already exist: CREATE SCHEMA Data;
CREATE TABLE Data.Inventory(
    InventoryID INT NOT NULL IDENTITY,
    ItemID INT NOT NULL,
    ChangeDate DATETIME NOT NULL,
    ChangeQty INT NOT NULL,
    TotalQty INT NOT NULL,
    PreviousChangeDate DATETIME NULL,
    PreviousTotalQty INT NULL,
    CONSTRAINT PK_Inventory PRIMARY KEY(ItemID, ChangeDate),
    -- superkey of the primary key; gives the self-referencing FK a target
    CONSTRAINT UNQ_Inventory UNIQUE(ItemID, ChangeDate, TotalQty),
    -- at most one successor per row, and at most one first row per item
    -- (a UNIQUE constraint treats NULLs as duplicates)
    CONSTRAINT UNQ_Inventory_Previous_Columns
        UNIQUE(ItemID, PreviousChangeDate, PreviousTotalQty),
    -- every row except the first must point at an existing predecessor
    CONSTRAINT FK_Inventory_Self FOREIGN KEY(ItemID, PreviousChangeDate, PreviousTotalQty)
        REFERENCES Data.Inventory(ItemID, ChangeDate, TotalQty),
    -- the running total is never negative and always adds up
    CONSTRAINT CHK_Inventory_Valid_TotalQty CHECK(
        TotalQty >= 0
        AND (TotalQty = COALESCE(PreviousTotalQty, 0) + ChangeQty)
    ),
    -- each item's rows form a chain in chronological order
    CONSTRAINT CHK_Inventory_Valid_Dates_Sequence CHECK(PreviousChangeDate < ChangeDate),
    -- the Previous* columns are either both NULL (first row) or both populated
    CONSTRAINT CHK_Inventory_Valid_Previous_Columns CHECK(
        (PreviousChangeDate IS NULL AND PreviousTotalQty IS NULL)
        OR (PreviousChangeDate IS NOT NULL AND PreviousTotalQty IS NOT NULL)
    )
);
-- beginning of inventory for item 1
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
VALUES(1, '20090101', 10, 10, NULL, NULL);
-- cannot begin the inventory a second time for the same item 1
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
VALUES(1, '20090102', 10, 10, NULL, NULL);
Msg 2627, Level 14, State 1, Line 10
Violation of UNIQUE KEY constraint 'UNQ_Inventory_Previous_Columns'.
Cannot insert duplicate key in object 'Data.Inventory'.
The statement has been terminated.
-- add more changes for item 1
DECLARE @ChangeQty INT;
SET @ChangeQty = 5;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20090103', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
SET @ChangeQty = 3;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20090104', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
SET @ChangeQty = -4;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20090105', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
-- try to violate chronological order
SET @ChangeQty = 5;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20081231', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
Msg 547, Level 16, State 0, Line 4
The INSERT statement conflicted with the CHECK constraint
"CHK_Inventory_Valid_Dates_Sequence".
The conflict occurred in database "Test", table "Data.Inventory".
The statement has been terminated.
SELECT ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;
ChangeDate ChangeQty TotalQty PreviousChangeDate PreviousTotalQty
----------------------- ----------- ----------- ----------------------- ----------------
2009-01-01 00:00:00.000 10 10 NULL NULL
2009-01-03 00:00:00.000 5 15 2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3 18 2009-01-03 00:00:00.000 15
2009-01-05 00:00:00.000 -4 14 2009-01-04 00:00:00.000 18
-- try to change a single row; both updates must fail,
-- because they break CHK_Inventory_Valid_TotalQty
UPDATE Data.Inventory SET ChangeQty = ChangeQty + 2 WHERE InventoryID = 3;
UPDATE Data.Inventory SET TotalQty = TotalQty + 2 WHERE InventoryID = 3;
-- try to delete a row other than the last one; both deletes must fail,
-- because the next row still references it through FK_Inventory_Self
DELETE FROM Data.Inventory WHERE InventoryID = 1;
DELETE FROM Data.Inventory WHERE InventoryID = 3;
-- the right way to update: change the row and propagate the difference
-- through the running totals of all later rows in one statement
DECLARE @IncreaseQty INT;
SET @IncreaseQty = 2;
UPDATE Data.Inventory
SET
ChangeQty = ChangeQty
+ CASE
WHEN ItemID = 1 AND ChangeDate = '20090103'
THEN @IncreaseQty
ELSE 0
END,
TotalQty = TotalQty + @IncreaseQty,
PreviousTotalQty = PreviousTotalQty +
CASE
WHEN ItemID = 1 AND ChangeDate = '20090103'
THEN 0
ELSE @IncreaseQty
END
WHERE ItemID = 1 AND ChangeDate >= '20090103';
SELECT ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;
ChangeDate ChangeQty TotalQty PreviousChangeDate PreviousTotalQty
----------------------- ----------- ----------- ----------------------- ----------------
2009-01-01 00:00:00.000 10 10 NULL NULL
2009-01-03 00:00:00.000 7 17 2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3 20 2009-01-03 00:00:00.000 17
2009-01-05 00:00:00.000 -4 16 2009-01-04 00:00:00.000 20
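As the failed deletes above suggest, only the most recent row in an item's chain can be removed, because every earlier row is referenced by its successor through FK_Inventory_Self. A minimal sketch of the right way to delete, assuming the demo data above:
-- only the tail of the chain can be deleted; no other row references it
DELETE FROM Data.Inventory
WHERE ItemID = 1
  AND ChangeDate = (SELECT MAX(ChangeDate)
                    FROM Data.Inventory
                    WHERE ItemID = 1);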
This is a hard problem to solve in general, but there are a couple of things we can do to help the optimizer choose a plan. To illustrate, this script creates a table of 10,000 rows with a known pseudo-random distribution:
CREATE TABLE dbo.SomeDateTable
(
Id INTEGER IDENTITY(1, 1) PRIMARY KEY NOT NULL,
StartDate DATETIME NOT NULL,
EndDate DATETIME NOT NULL
);
GO
SET STATISTICS XML OFF;
SET NOCOUNT ON;

DECLARE
    @i INTEGER = 1,
    @s FLOAT = RAND(20120104), -- seed the generator for a repeatable distribution
    @e FLOAT = RAND();

WHILE @i <= 10000
BEGIN
    INSERT dbo.SomeDateTable
    (
        StartDate,
        EndDate
    )
    VALUES
    (
        DATEADD(DAY, @s * 365, {d '2009-01-01'}),
        -- EndDate falls 0-14 days after StartDate
        DATEADD(DAY, @s * 365 + @e * 14, {d '2009-01-01'})
    );

    SELECT
        @s = RAND(),
        @e = RAND(),
        @i += 1;
END;
The first question is how to index this table. One option is to provide two indexes on the DATETIME columns, so the optimizer can at least choose whether to seek on StartDate or EndDate.
CREATE INDEX nc1 ON dbo.SomeDateTable (StartDate, EndDate)
CREATE INDEX nc2 ON dbo.SomeDateTable (EndDate, StartDate)
Naturally, the inequalities on both StartDate and EndDate mean that only one column in each index can support a seek in the example query, but this is about the best we can do. We might consider making the second column in each index an INCLUDE rather than a key, but we might have other queries that can perform an equality seek on the leading column and an inequality seek on the second column. Also, we may get better statistics this way. Anyway...
DECLARE
@StartDateBegin DATETIME = {d '2009-08-01'},
@StartDateEnd DATETIME = {d '2009-10-15'},
@EndDateBegin DATETIME = {d '2009-08-05'},
@EndDateEnd DATETIME = {d '2009-10-22'}
SELECT
COUNT_BIG(*)
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
AND sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd
This query uses variables, so in general the optimizer will guess at selectivity and distribution, arriving at a cardinality estimate of 81 rows. In fact, the query produces 2076 rows, a discrepancy that might be important in a more complex example.
On SQL Server 2008 SP1 CU5 or later (or R2 RTM CU1) we can take advantage of the Parameter Embedding Optimization to get better estimates, simply by adding OPTION (RECOMPILE) to the SELECT query above. This causes a compilation just before the batch executes, allowing SQL Server to 'see' the real parameter values and optimize for those. With this change, the estimate improves to 468 rows (though you do need to check the runtime plan to see this). This estimate is better than 81 rows, but still not all that close. The modelling extensions enabled by trace flag 2301 may help in some cases, but not with this query.
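For reference, the query with the hint applied:
SELECT
    COUNT_BIG(*)
FROM dbo.SomeDateTable AS sdt
WHERE
    sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
    AND sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd
OPTION (RECOMPILE);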
The problem is where the rows qualified by the two range searches overlap. One of the simplifying assumptions made in the optimizer's costing and cardinality estimation component is that predicates are independent (so if both have a selectivity of 50%, the result of applying both is assumed to qualify 50% of 50% = 25% of the rows). Where this sort of correlation is a problem, we can often work around it with multi-column and/or filtered statistics. With two ranges with unknown start and end points, this becomes impractical. This is where we sometimes have to resort to rewriting the query to a form that happens to produce a better estimate:
SELECT COUNT(*) FROM
(
SELECT
sdt.Id
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
INTERSECT
SELECT
sdt.Id
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd
) AS intersected (id)
OPTION (RECOMPILE)
This form happens to produce a runtime estimate of 2110 rows (versus 2076 actual), unless you have TF 2301 on, in which case the more advanced modelling techniques see through the trick and produce exactly the same estimate as before: 468 rows.
One day SQL Server might gain native support for intervals. If that comes with good statistical support, developers might dread tuning query plans like this a little less.
Best Answer
You could categorise your dueDate values and then use PIVOT like this:
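A minimal sketch of the shape such a query can take, assuming a hypothetical dbo.Invoice table with an invoiceID key and a dueDate column; the bucket names and date boundaries are illustrative only:
-- dbo.Invoice, invoiceID and the bucket boundaries are assumptions for illustration
SELECT [Overdue], [DueThisWeek], [DueLater]
FROM
(
    -- categorise each dueDate into a named bucket
    SELECT invoiceID,
           CASE
               WHEN dueDate < CAST(GETDATE() AS DATE) THEN 'Overdue'
               WHEN dueDate < DATEADD(DAY, 7, CAST(GETDATE() AS DATE)) THEN 'DueThisWeek'
               ELSE 'DueLater'
           END AS DueCategory
    FROM dbo.Invoice
) AS src
PIVOT
(
    -- turn the buckets into columns, counting the rows in each
    COUNT(invoiceID) FOR DueCategory IN ([Overdue], [DueThisWeek], [DueLater])
) AS p;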