I am not sure what type of performance you are looking for, but if CLR or external app is not an option, a cursor is all that is left. On my aged laptop I get through 1,000,000 rows in about 100 seconds using the following solution. The nice thing about it is that it scales linearly, so I would be looking at a little about 20 minutes to run through the entire thing. With a decent server you will be faster, but not an order of magnitude, so it would still take several minutes to complete this. If this is a one off process, you probably can afford the slowness. If you need to run this as a report or similar regularly, you might want to store the values in the same table un update them as new rows get added, e.g. in a trigger.
Anyway, here is the code:
IF OBJECT_ID('dbo.MyTable') IS NOT NULL DROP TABLE dbo.MyTable;
CREATE TABLE dbo.MyTable(
Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
v NUMERIC(5,3) DEFAULT ABS(CHECKSUM(NEWID())%100)/100.0
);
MERGE dbo.MyTable T
USING (SELECT TOP(1000000) 1 X FROM sys.system_internals_partition_columns A,sys.system_internals_partition_columns B,sys.system_internals_partition_columns C,sys.system_internals_partition_columns D)X
ON(1=0)
WHEN NOT MATCHED THEN
INSERT DEFAULT VALUES;
--SELECT * FROM dbo.MyTable
DECLARE @st DATETIME2 = SYSUTCDATETIME();
DECLARE cur CURSOR FAST_FORWARD FOR
SELECT Id,v FROM dbo.MyTable
ORDER BY Id;
DECLARE @id INT;
DECLARE @v NUMERIC(5,3);
DECLARE @running_total NUMERIC(6,3) = 0;
DECLARE @bucket INT = 1;
CREATE TABLE #t(
id INT PRIMARY KEY CLUSTERED,
v NUMERIC(5,3),
bucket INT,
running_total NUMERIC(6,3)
);
OPEN cur;
WHILE(1=1)
BEGIN
FETCH NEXT FROM cur INTO @id,@v;
IF(@@FETCH_STATUS <> 0) BREAK;
IF(@running_total + @v > 1)
BEGIN
SET @running_total = 0;
SET @bucket += 1;
END;
SET @running_total += @v;
INSERT INTO #t(id,v,bucket,running_total)
VALUES(@id,@v,@bucket, @running_total);
END;
CLOSE cur;
DEALLOCATE cur;
SELECT DATEDIFF(SECOND,@st,SYSUTCDATETIME());
SELECT * FROM #t;
GO
DROP TABLE #t;
It drops and recreates the table MyTable, fills it with 1000000 rows and then goes to work.
The cursor copies each row into a temp table while running the calculations. At the end the select returns the calculated results. You might be a little faster if you don't copy the data around but do an in-place update instead.
If you have an option to upgrade to SQL 2012 you can look at the new window-spool supported moving window aggregates, that should give you better performance.
On a side note, if you have an assembly installed with permission_set=safe, you can do more bad stuff to a server with standard T-SQL than with the assembly, so I would keep working on removing that barrier - You have a good use case here where CLR really would help you.
Count the number of rows for every distribution and compare that to the number of rows returned for a distribution where you apply your filter.
declare @T table
(
Distributor char(3),
Reseller varchar(10),
State char(2)
);
insert into @T values
('ABC', 'BestBuy', 'CA'),
('ABC', 'RadioShack', 'NJ'),
('XYZ', 'BestBuy', 'CA'),
('XYZ', 'RadioShack', 'MO');
select T1.Distributor
from @T as T1
where (
select count(*)
from @T as T2
where T1.Distributor =T2.Distributor
) =
(
select count(*)
from @T as T3
where T1.Distributor =T3.Distributor and
(
T3.Reseller = 'BestBuy' and T3.State='CA' or
T3.Reseller = 'RadioShack' and T3.State='NJ'
)
);
select T1.Distributor
from @T as T1
where (
select count(*)
from @T as T2
where T1.Distributor =T2.Distributor
) =
(
select count(*)
from @T as T3
where T1.Distributor =T3.Distributor and
(
T3.Reseller = 'BestBuy' and T3.State='MO' or
T3.Reseller = 'RadioShack' and T3.State='NJ'
)
);
Best Answer
This seems like a much simpler solution than introducing a bunch of unions: