I am not familiar with accounting, but I have solved similar problems in inventory-type environments. I store the running total in the same row as the transaction, and I use constraints so that the data is never wrong, even under high concurrency. I wrote the following solution back in 2009:
Calculating running totals is notoriously slow, whether you do it with a cursor or with a triangular join. It is very tempting to denormalize and store running totals in a column, especially if you select them frequently. However, as usual when you denormalize, you must guarantee the integrity of the denormalized data. Fortunately, you can guarantee the integrity of running totals with constraints: as long as all your constraints are trusted, all your running totals are correct. This approach also makes it easy to ensure that the current balance (the running total) is never negative, which can be very slow to enforce by other means. The following script demonstrates the technique.
-- the Data schema must already exist: CREATE SCHEMA Data;
CREATE TABLE Data.Inventory(
    InventoryID INT NOT NULL IDENTITY,
    ItemID INT NOT NULL,
    ChangeDate DATETIME NOT NULL,
    ChangeQty INT NOT NULL,
    TotalQty INT NOT NULL,
    PreviousChangeDate DATETIME NULL,
    PreviousTotalQty INT NULL,
    CONSTRAINT PK_Inventory PRIMARY KEY(ItemID, ChangeDate),
    -- superkey of the primary key; gives the self-referencing FK a target
    CONSTRAINT UNQ_Inventory UNIQUE(ItemID, ChangeDate, TotalQty),
    -- at most one successor per row, and at most one first row per item
    -- (a UNIQUE constraint treats NULLs as duplicates)
    CONSTRAINT UNQ_Inventory_Previous_Columns
        UNIQUE(ItemID, PreviousChangeDate, PreviousTotalQty),
    -- every row except the first must point at an existing predecessor
    CONSTRAINT FK_Inventory_Self FOREIGN KEY(ItemID, PreviousChangeDate, PreviousTotalQty)
        REFERENCES Data.Inventory(ItemID, ChangeDate, TotalQty),
    -- the running total is never negative and always adds up
    CONSTRAINT CHK_Inventory_Valid_TotalQty CHECK(
        TotalQty >= 0
        AND (TotalQty = COALESCE(PreviousTotalQty, 0) + ChangeQty)
    ),
    -- each item's rows form a chain in chronological order
    CONSTRAINT CHK_Inventory_Valid_Dates_Sequence CHECK(PreviousChangeDate < ChangeDate),
    -- the Previous* columns are either both NULL (first row) or both populated
    CONSTRAINT CHK_Inventory_Valid_Previous_Columns CHECK(
        (PreviousChangeDate IS NULL AND PreviousTotalQty IS NULL)
        OR (PreviousChangeDate IS NOT NULL AND PreviousTotalQty IS NOT NULL)
    )
);
-- beginning of inventory for item 1
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
VALUES(1, '20090101', 10, 10, NULL, NULL);
-- cannot begin the inventory a second time for the same item 1
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
VALUES(1, '20090102', 10, 10, NULL, NULL);
Msg 2627, Level 14, State 1, Line 10
Violation of UNIQUE KEY constraint 'UNQ_Inventory_Previous_Columns'.
Cannot insert duplicate key in object 'Data.Inventory'.
The statement has been terminated.
-- add more changes for item 1
DECLARE @ChangeQty INT;
SET @ChangeQty = 5;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20090103', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
SET @ChangeQty = 3;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20090104', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
SET @ChangeQty = -4;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20090105', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
-- try to violate chronological order
SET @ChangeQty = 5;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20081231', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
Msg 547, Level 16, State 0, Line 4
The INSERT statement conflicted with the CHECK constraint
"CHK_Inventory_Valid_Dates_Sequence".
The conflict occurred in database "Test", table "Data.Inventory".
The statement has been terminated.
SELECT ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;
ChangeDate ChangeQty TotalQty PreviousChangeDate PreviousTotalQty
----------------------- ----------- ----------- ----------------------- ----------------
2009-01-01 00:00:00.000 10 10 NULL NULL
2009-01-03 00:00:00.000 5 15 2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3 18 2009-01-03 00:00:00.000 15
2009-01-05 00:00:00.000 -4 14 2009-01-04 00:00:00.000 18
-- try to change a single row; both updates must fail,
-- because they break CHK_Inventory_Valid_TotalQty
UPDATE Data.Inventory SET ChangeQty = ChangeQty + 2 WHERE InventoryID = 3;
UPDATE Data.Inventory SET TotalQty = TotalQty + 2 WHERE InventoryID = 3;
-- try to delete a row other than the last one; both deletes must fail,
-- because the next row still references it through FK_Inventory_Self
DELETE FROM Data.Inventory WHERE InventoryID = 1;
DELETE FROM Data.Inventory WHERE InventoryID = 3;
-- the right way to update: change the row and propagate the difference
-- through the running totals of all later rows in one statement
DECLARE @IncreaseQty INT;
SET @IncreaseQty = 2;
UPDATE Data.Inventory
SET
ChangeQty = ChangeQty
+ CASE
WHEN ItemID = 1 AND ChangeDate = '20090103'
THEN @IncreaseQty
ELSE 0
END,
TotalQty = TotalQty + @IncreaseQty,
PreviousTotalQty = PreviousTotalQty +
CASE
WHEN ItemID = 1 AND ChangeDate = '20090103'
THEN 0
ELSE @IncreaseQty
END
WHERE ItemID = 1 AND ChangeDate >= '20090103';
SELECT ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;
ChangeDate ChangeQty TotalQty PreviousChangeDate PreviousTotalQty
----------------------- ----------- ----------- ----------------------- ----------------
2009-01-01 00:00:00.000 10 10 NULL NULL
2009-01-03 00:00:00.000 7 17 2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3 20 2009-01-03 00:00:00.000 17
2009-01-05 00:00:00.000 -4 16 2009-01-04 00:00:00.000 20
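As the failed deletes above suggest, only the most recent row in an item's chain can be removed, because every earlier row is referenced by its successor through FK_Inventory_Self. A minimal sketch of the right way to delete, assuming the demo data above:
-- only the tail of the chain can be deleted; no other row references it
DELETE FROM Data.Inventory
WHERE ItemID = 1
  AND ChangeDate = (SELECT MAX(ChangeDate)
                    FROM Data.Inventory
                    WHERE ItemID = 1);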
This is a hard problem to solve in general, but there are a couple of things we can do to help the optimizer choose a plan. To illustrate, this script creates a table of 10,000 rows with a known pseudo-random distribution:
CREATE TABLE dbo.SomeDateTable
(
Id INTEGER IDENTITY(1, 1) PRIMARY KEY NOT NULL,
StartDate DATETIME NOT NULL,
EndDate DATETIME NOT NULL
);
GO
SET STATISTICS XML OFF;
SET NOCOUNT ON;

DECLARE
    @i INTEGER = 1,
    @s FLOAT = RAND(20120104), -- seed the generator for a repeatable distribution
    @e FLOAT = RAND();

WHILE @i <= 10000
BEGIN
    INSERT dbo.SomeDateTable
    (
        StartDate,
        EndDate
    )
    VALUES
    (
        DATEADD(DAY, @s * 365, {d '2009-01-01'}),
        -- EndDate falls 0-14 days after StartDate
        DATEADD(DAY, @s * 365 + @e * 14, {d '2009-01-01'})
    );

    SELECT
        @s = RAND(),
        @e = RAND(),
        @i += 1;
END;
The first question is how to index this table. One option is to provide two indexes on the DATETIME columns, so the optimizer can at least choose whether to seek on StartDate or EndDate.
CREATE INDEX nc1 ON dbo.SomeDateTable (StartDate, EndDate)
CREATE INDEX nc2 ON dbo.SomeDateTable (EndDate, StartDate)
Naturally, the inequalities on both StartDate and EndDate mean that only one column in each index can support a seek in the example query, but this is about the best we can do. We might consider making the second column in each index an INCLUDE rather than a key, but we might have other queries that can perform an equality seek on the leading column and an inequality seek on the second column. Also, we may get better statistics this way. Anyway...
DECLARE
@StartDateBegin DATETIME = {d '2009-08-01'},
@StartDateEnd DATETIME = {d '2009-10-15'},
@EndDateBegin DATETIME = {d '2009-08-05'},
@EndDateEnd DATETIME = {d '2009-10-22'}
SELECT
COUNT_BIG(*)
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
AND sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd
This query uses variables, so in general the optimizer will guess at selectivity and distribution, arriving at a cardinality estimate of 81 rows. In fact, the query produces 2076 rows, a discrepancy that might be important in a more complex example.
On SQL Server 2008 SP1 CU5 or later (or R2 RTM CU1) we can take advantage of the Parameter Embedding Optimization to get better estimates, simply by adding OPTION (RECOMPILE) to the SELECT query above. This causes a compilation just before the batch executes, allowing SQL Server to 'see' the real parameter values and optimize for those. With this change, the estimate improves to 468 rows (though you do need to check the runtime plan to see this). This estimate is better than 81 rows, but still not all that close. The modelling extensions enabled by trace flag 2301 may help in some cases, but not with this query.
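For reference, the query with the hint applied:
SELECT
    COUNT_BIG(*)
FROM dbo.SomeDateTable AS sdt
WHERE
    sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
    AND sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd
OPTION (RECOMPILE);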
The problem is where the rows qualified by the two range searches overlap. One of the simplifying assumptions made in the optimizer's costing and cardinality estimation component is that predicates are independent (so if both have a selectivity of 50%, the result of applying both is assumed to qualify 50% of 50% = 25% of the rows). Where this sort of correlation is a problem, we can often work around it with multi-column and/or filtered statistics. With two ranges with unknown start and end points, this becomes impractical. This is where we sometimes have to resort to rewriting the query to a form that happens to produce a better estimate:
SELECT COUNT(*) FROM
(
SELECT
sdt.Id
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.StartDate BETWEEN @StartDateBegin AND @StartDateEnd
INTERSECT
SELECT
sdt.Id
FROM dbo.SomeDateTable AS sdt
WHERE
sdt.EndDate BETWEEN @EndDateBegin AND @EndDateEnd
) AS intersected (id)
OPTION (RECOMPILE)
This form happens to produce a runtime estimate of 2110 rows (versus 2076 actual), unless you have TF 2301 on, in which case the more advanced modelling techniques see through the trick and produce exactly the same estimate as before: 468 rows.
One day SQL Server might gain native support for intervals. If that comes with good statistical support, developers might dread tuning query plans like this a little less.
Best Answer
You could categorise your dueDate values and then use PIVOT like this:
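A minimal sketch of the shape such a query can take, assuming a hypothetical dbo.Invoice table with an invoiceID key and a dueDate column; the bucket names and date boundaries are illustrative only:
-- dbo.Invoice, invoiceID and the bucket boundaries are assumptions for illustration
SELECT [Overdue], [DueThisWeek], [DueLater]
FROM
(
    -- categorise each dueDate into a named bucket
    SELECT invoiceID,
           CASE
               WHEN dueDate < CAST(GETDATE() AS DATE) THEN 'Overdue'
               WHEN dueDate < DATEADD(DAY, 7, CAST(GETDATE() AS DATE)) THEN 'DueThisWeek'
               ELSE 'DueLater'
           END AS DueCategory
    FROM dbo.Invoice
) AS src
PIVOT
(
    -- turn the buckets into columns, counting the rows in each
    COUNT(invoiceID) FOR DueCategory IN ([Overdue], [DueThisWeek], [DueLater])
) AS p;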