We have an InventoryActivity
table that holds transactional changes to item quantity:
CREATE TABLE dbo.InventoryActivity(
InventoryActivity_uid int IDENTITY(1,1) NOT NULL PRIMARY KEY,
Organization_uid int NOT NULL,
MasterInventory_uid int NOT NULL,
AdjustmentType_cd varchar(20) NULL,
AdjustmentReason_cd varchar(20) NULL,
Quantity int NULL
)
We would like an InventorySummary
table, which should aggregate to the current quantity for each organization/item pair. The summary counts should always be derivable from the sum of the transactional records, but we have several possible approaches for calculating the aggregates:
- Stored Procedure
- Separate Summary Table
- Indexed View
What performance considerations should tip the scales in favor of a particular strategy?
What best practices exist? (I know "best practices" borders on discussion, but I want to know what considerations come into play to help make the decision.)
Here's a fiddle with some sample code and data
---
Stored Procedure
The simplest option is to perform the SUM operation fresh every time. But it involves no caching, and would likely cause performance issues as the table grows.
CREATE PROCEDURE dbo.GetInventorySummary
AS
SELECT Organization_uid,
       MasterInventory_uid,
       SUM(Quantity) AS Quantity
FROM dbo.InventoryActivity
GROUP BY Organization_uid, MasterInventory_uid
---
Separate Table
We could create a table to store the current quantities. The plus side is that fetching this data would be trivial. The downside is that we would have to maintain it manually and keep the records in sync every time we write to the InventoryActivity table.
CREATE TABLE dbo.InventorySummary(
    Organization_uid int NOT NULL,
    MasterInventory_uid int NOT NULL,
    Quantity int NOT NULL,
    PRIMARY KEY (Organization_uid, MasterInventory_uid)
)
Triggers could help alleviate some of that maintenance.
CREATE TRIGGER dbo.InventoryActivity_I ON dbo.InventoryActivity
AFTER INSERT AS

CREATE TABLE #InsertSummaryTemp (
    Organization_uid int,
    MasterInventory_uid int,
    Quantity int
)

INSERT INTO #InsertSummaryTemp
SELECT Organization_uid, MasterInventory_uid, SUM(Quantity) AS Quantity
FROM INSERTED
GROUP BY Organization_uid, MasterInventory_uid

-- UPDATE EXISTING RECORDS
UPDATE s
SET Quantity = s.Quantity + i.Quantity
FROM InventorySummary s
JOIN #InsertSummaryTemp i
    ON s.Organization_uid = i.Organization_uid
    AND s.MasterInventory_uid = i.MasterInventory_uid

-- INSERT NEW RECORDS
INSERT INTO InventorySummary (Organization_uid, MasterInventory_uid, Quantity)
SELECT i.Organization_uid, i.MasterInventory_uid, i.Quantity
FROM #InsertSummaryTemp i
LEFT JOIN InventorySummary s
    ON i.Organization_uid = s.Organization_uid
    AND i.MasterInventory_uid = s.MasterInventory_uid
WHERE s.MasterInventory_uid IS NULL
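To keep the summary fully in sync, matching UPDATE and DELETE triggers would also be needed. A minimal sketch of the DELETE side (the trigger name mirrors the insert trigger's naming convention and is my own choice):

```sql
-- Hypothetical companion trigger: subtract deleted activity from the summary.
-- ISNULL guards against NULL Quantity values, which SUM would otherwise
-- propagate into the summary.
CREATE TRIGGER dbo.InventoryActivity_D ON dbo.InventoryActivity
AFTER DELETE AS
BEGIN
    UPDATE s
    SET s.Quantity = s.Quantity - d.Quantity
    FROM dbo.InventorySummary s
    JOIN (
        SELECT Organization_uid,
               MasterInventory_uid,
               SUM(ISNULL(Quantity, 0)) AS Quantity
        FROM DELETED
        GROUP BY Organization_uid, MasterInventory_uid
    ) d ON s.Organization_uid = d.Organization_uid
       AND s.MasterInventory_uid = d.MasterInventory_uid;
END
```

An UPDATE trigger would combine both halves, subtracting the DELETED quantities and adding the INSERTED ones.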
---
Indexed View
Borrowed from this and this, we could create an Indexed View, which lowers the maintenance cost associated with option 2. However, there is a concern that we are still aggregating all records through the entire history, as opposed to a simple read from a table. (Note that because Quantity is nullable, an indexed view requires SUM(ISNULL(Quantity, 0)) rather than SUM(Quantity), and Count must be bracketed since it is a reserved word.)

CREATE VIEW dbo.InventorySummaryView WITH SCHEMABINDING AS
SELECT Organization_uid,
       MasterInventory_uid,
       SUM(ISNULL(Quantity, 0)) AS Quantity,
       COUNT_BIG(*) AS [Count]
FROM dbo.InventoryActivity
GROUP BY Organization_uid, MasterInventory_uid
GO
CREATE UNIQUE CLUSTERED INDEX PK_InventorySummaryView
ON dbo.InventorySummaryView (Organization_uid, MasterInventory_uid)
Best Answer
Essentially, like any form of cache, these strategies let you increase read performance at the expense of write performance and disk space.
So you should consider the following:
The first strategy is a baseline: you don't duplicate the data, and writes are as fast as they can be.
The last strategy (indexed view) slows down writes immediately (as they happen), but the aggregated amounts are always up to date.
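One practical detail when reading the indexed view: on editions other than Enterprise, SQL Server will not use the view's index automatically; you have to query the view with the NOEXPAND hint to read the materialized totals instead of re-aggregating the base table:

```sql
-- Read the pre-aggregated totals directly from the view's clustered index.
-- Without WITH (NOEXPAND), non-Enterprise editions expand the view definition
-- and scan/aggregate dbo.InventoryActivity instead.
SELECT Organization_uid, MasterInventory_uid, Quantity
FROM dbo.InventorySummaryView WITH (NOEXPAND);
```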
The second strategy with an explicit summary table gives you greater control over when to perform the aggregation. You can delay the aggregation and perform it when the server is under less load. If you accumulate a bunch of pending changes for calculating the summary and perform these calculations in bulk, it may be more efficient than updating the summary after every single change of the source data.
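For example, a scheduled job could reconcile the summary table in one set-based statement during a quiet window. A sketch using MERGE against the summary table defined above (the exact refresh policy is up to you):

```sql
-- Recompute current totals from the activity log and reconcile the summary
-- table in bulk: update drifted rows, insert missing ones, delete orphans.
MERGE dbo.InventorySummary AS target
USING (
    SELECT Organization_uid,
           MasterInventory_uid,
           SUM(ISNULL(Quantity, 0)) AS Quantity
    FROM dbo.InventoryActivity
    GROUP BY Organization_uid, MasterInventory_uid
) AS source
ON  target.Organization_uid    = source.Organization_uid
AND target.MasterInventory_uid = source.MasterInventory_uid
WHEN MATCHED AND target.Quantity <> source.Quantity
    THEN UPDATE SET target.Quantity = source.Quantity
WHEN NOT MATCHED BY TARGET
    THEN INSERT (Organization_uid, MasterInventory_uid, Quantity)
         VALUES (source.Organization_uid, source.MasterInventory_uid, source.Quantity)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;
```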
On the other hand, the engine that maintains the indexed view during updates should be smart enough to update the summary by applying only the changes, without reading through the whole table each time. For example, if you update only, say, 2 rows in a 10M-row table, then to calculate the new SUM(Quantity) the engine can subtract the two old values of Quantity and add the two new values. I don't have an authoritative source at hand that would confirm the engine indeed works like this, but it should be fairly easy to test and verify by measuring the amount of reads and writes of a few test statements on a large table.

This leads to another thing to consider: not all aggregates are equally cheap to maintain incrementally. It is easy to keep SUM up to date efficiently, and not so easy for MIN or MAX, where a delete can force a rescan to find the new extreme value.
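One way to probe whether the view is maintained incrementally is to compare logical reads for a small update against a large table, for example:

```sql
-- Measure I/O for a 2-row update against a large InventoryActivity table.
-- If the indexed view is maintained incrementally, the logical reads reported
-- for PK_InventorySummaryView should be proportional to the rows touched,
-- not to the size of the base table.
SET STATISTICS IO ON;

UPDATE dbo.InventoryActivity
SET Quantity = Quantity + 1
WHERE InventoryActivity_uid IN (1, 2);

SET STATISTICS IO OFF;
```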