SQL Server – Comparing a running total with GROUP BY

sql server

I am working with two scenarios.

CREATE TABLE [dbo].[Test](
    [LineID] [int] NULL,
    [BusinessUnit] [nvarchar](50) NULL,
    [Value] [nvarchar](50) NULL,
    [Amount] [money] NULL
) ON [PRIMARY]
GO


INSERT INTO dbo.Test
SELECT 1, 'ABC', 'Apple', 20.00 UNION
SELECT 2, 'DEF', 'Apple', 40.00 UNION
SELECT 3, 'ABC', 'Apple', -20.00 UNION
SELECT 4, 'DEF', 'Apple', -40.00 

The scenario above is invalid, and I need lines 1, 2, 3, and 4 returned as output, because the running total of Amount does not reach zero for a business unit before another business unit's lines begin (following the LineID sequence).
ABC's amounts should sum to zero, in LineID order, before a new business unit's line is added.
Here, DEF (line 2) starts before ABC's lines have summed to zero.


INSERT INTO dbo.Test
SELECT 1, 'ABC', 'Apple', 20.00 UNION
SELECT 2, 'ABC', 'Apple', -20.00 UNION
SELECT 3, 'DEF', 'Apple', 40.00 UNION
SELECT 4, 'DEF', 'Apple', -40.00 

This is valid, and the query should return no rows, because this is a valid entry.
DEF (line 3) starts only after ABC's running total has reached zero.

Best Answer

Answer based on clarifications in comments

The question, really, gives two data scenarios. In the first case, all rows should be returned by the select: when you order the rows by LineID and track the cumulative sum of Amount, then any time the BusinessUnit value changes while that cumulative sum is not zero, the rows should be returned. In the second scenario, the cumulative sum of Amount is zero every time the BusinessUnit value changes (ordered by LineID), so no row is returned.

There are two ways to do this.

You can write a cursor that iterates through your rows one by one, uses local variables to track the cumulative sum, notes which rows fail the check, stores those rows in a temporary table, and returns the table's contents at the end.
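As a rough illustration, the cursor approach might look something like this. This is a minimal, untested sketch against the dbo.Test table above; the variable names and the flag-then-return-everything logic are my own choices, not part of the original answer:

```sql
-- Walk the rows in LineID order, tracking a running total per contiguous
-- BusinessUnit group. If the unit changes while the total is non-zero,
-- flag the whole dataset as failing (matching the "return all rows" rule).
DECLARE @lineID int, @bu nvarchar(50), @amount money;
DECLARE @prevBU nvarchar(50) = NULL, @runningTotal money = 0, @failed bit = 0;

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT LineID, BusinessUnit, Amount FROM dbo.Test ORDER BY LineID;
OPEN c;
FETCH NEXT FROM c INTO @lineID, @bu, @amount;
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @prevBU IS NOT NULL AND @bu <> @prevBU AND @runningTotal <> 0
        SET @failed = 1;                 -- unit changed before totalling zero
    IF @prevBU IS NULL OR @bu <> @prevBU
        SET @runningTotal = @amount;     -- start a new running total
    ELSE
        SET @runningTotal = @runningTotal + @amount;
    SET @prevBU = @bu;
    FETCH NEXT FROM c INTO @lineID, @bu, @amount;
END
CLOSE c;
DEALLOCATE c;

-- The last unit must also have totalled zero
IF @runningTotal <> 0 SET @failed = 1;

IF @failed = 1
    SELECT * FROM dbo.Test;              -- return all rows when the rule is broken
```

Against the first data set this returns all four rows; against the second it returns nothing, matching the two scenarios in the question.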

However, as someone noted in comments, the strength of a database lies in set operations - dealing with sets of data at a time. So...

The second way to do this is as a "single" statement. This is really an aggregation of multiple statements - but they are all selects. The approach below makes use of the T-SQL lag function, which can read values from prior rows in an ordered result set. Its OVER clause accepts a partition clause, which lets us create "windows" over our data - but we don't actually need separate windows here; we are happy to treat the whole dataset as a single window. So ultimately this also processes rows one by one, but using T-SQL's native function rather than a hand-written cursor.

One last note before the solution - you have a column [Value] which contains the word "Apple" on every row. It seems irrelevant to the question, so I have ignored it. If this column affects the behaviour you are seeking, you will have to adjust the SQL below to deal with your [Value] column.

Here is the solution - including data setup and tear down for each of your two scenarios.

Important! - The lineID value for the failing rows is in the column priorLineID (and not lineID)

This is because we are "reading behind" with the lag function: we don't know the businessUnit has changed until we reach the next row and look back at the prior one. At that point we know the businessUnit changed, so we test whether the prior cumulative total was zero; if not, we return the current row and provide the priorLineID in its own column. You can expand the SQL to return whatever prior-row data values you need.

delete from test;
go

-- TEST CASE 1 - all rows returned because the cumulative total for Amount is not zero 
--  when BusinessUnit changes, when ordered by LineID 

INSERT INTO dbo.Test
SELECT 1, 'ABC', 'Apple', 20.00 UNION
SELECT 2, 'DEF', 'Apple', 40.00 UNION
SELECT 3, 'ABC', 'Apple', -20.00 UNION
SELECT 4, 'DEF', 'Apple', -40.00 

select * from test;

with EXPANDED_DATA as (
    -- append a sentinel row so the final BusinessUnit change is also checked
    select lineID, BusinessUnit, value, amount from test
    union select 999999999, '', '', 0
),
PARTITIONED_DATA as (
    select
        lineID,
        BusinessUnit,
        Amount,
        lag(BusinessUnit, 1, '') over (partition by 1 order by lineID) as priorBusinessUnit,
        lag(lineID) over (partition by 1 order by lineID) as priorLineID,
        lag(amount) over (partition by 1 order by lineID) as priorAmount
    from EXPANDED_DATA
),
WITH_PRIOR_CUMULATIVE_AMOUNT as (
    select *,
        case when priorBusinessUnit = businessUnit
             then priorAmount + amount
             else amount end as CumulativeBusinessUnitTotal
    from PARTITIONED_DATA
),
WITH_PRIOR_TOTALS as (
    select *,
        lag(CumulativeBusinessUnitTotal) over (partition by 1 order by lineID) as priorCumulativeTotal
    from WITH_PRIOR_CUMULATIVE_AMOUNT
)
select * from WITH_PRIOR_TOTALS
where BusinessUnit <> priorBusinessUnit and priorCumulativeTotal <> 0



delete from test;
go

-- TEST CASE 2 - no rows returned because the cumulative total for Amount is zero 
--  when BusinessUnit changes, when ordered by LineID

INSERT INTO dbo.Test
SELECT 1, 'ABC', 'Apple', 20.00 UNION
SELECT 2, 'ABC', 'Apple', -20.00 UNION
SELECT 3, 'DEF', 'Apple', 40.00 UNION
SELECT 4, 'DEF', 'Apple', -40.00 

select * from test;

with EXPANDED_DATA as (
    -- append a sentinel row so the final BusinessUnit change is also checked
    select lineID, BusinessUnit, value, amount from test
    union select 999999999, '', '', 0
),
PARTITIONED_DATA as (
    select
        lineID,
        BusinessUnit,
        Amount,
        lag(BusinessUnit, 1, '') over (partition by 1 order by lineID) as priorBusinessUnit,
        lag(lineID) over (partition by 1 order by lineID) as priorLineID,
        lag(amount) over (partition by 1 order by lineID) as priorAmount
    from EXPANDED_DATA
),
WITH_PRIOR_CUMULATIVE_AMOUNT as (
    select *,
        case when priorBusinessUnit = businessUnit
             then priorAmount + amount
             else amount end as CumulativeBusinessUnitTotal
    from PARTITIONED_DATA
),
WITH_PRIOR_TOTALS as (
    select *,
        lag(CumulativeBusinessUnitTotal) over (partition by 1 order by lineID) as priorCumulativeTotal
    from WITH_PRIOR_CUMULATIVE_AMOUNT
)
select * from WITH_PRIOR_TOTALS
where BusinessUnit <> priorBusinessUnit and priorCumulativeTotal <> 0

Original answer below

(Originally the question seemed to be about validating the insert statements at the time of insertion and I basically made the comment that you really can't do that. I will leave this part of the answer below).

If I understand your question correctly, you are saying the first attempted insert is invalid because the sequence of select statements fails a business rule, and the second attempt is valid because the sequence of select statements passes that rule. Is that correct?

If so, note that your select statements are effectively joined together using UNION. This means your four rows of data are selected as a single statement - and there are no guarantees about the order in which those select statements will be processed.

The only difference between your first and second insert - as far as the database is concerned - is that the lineID value varies between the two statements, for a given combination of data.

However the more important implication to you is that it seems you want the database to validate something that it is not designed to validate - re-read my comment that there is no real difference between those two inserts, as far as the database is concerned.

Reading between the lines, I wonder whether your list of select statements is being generated by application code? If so, I would suggest the application validate the values it appends to the query. That said, regardless of the order of rows, the net result (as far as the database is concerned) will be the same (notwithstanding the lineID difference).