The issue of records arriving late makes duplicate-removal more complex, but it is not impossible. Using a view (as proposed in your other question) to dynamically remove duplicates is workable, but queries against that view can produce complex and/or inefficient query plans.
An alternative design is to keep duplicated records in a separate table, in case they are needed to properly process a future late-arriving record. This does add a little complexity to the data import process, but each step is not too hard, and the result is a nice clean duplicate-free History table:
Tables
-- Original table
CREATE TABLE dbo.History
(
Effective datetime NOT NULL, -- When the value became current
Product integer NOT NULL, -- The product
Kind tinyint NOT NULL, -- The price kind (RRP, special, etc)
Price smallmoney NOT NULL, -- The new price
CONSTRAINT [PK dbo.History Effective, Product, Kind]
PRIMARY KEY CLUSTERED
(Effective, Product, Kind)
);
-- Holding area for duplicates that may be needed in future
CREATE TABLE dbo.HistoryDuplicates
(
Effective datetime NOT NULL,
Product integer NOT NULL,
Kind tinyint NOT NULL,
Price smallmoney NOT NULL,
CONSTRAINT [PK dbo.HistoryDuplicates Product, Kind, Effective]
PRIMARY KEY CLUSTERED
(Product, Kind, Effective)
);
Initial Data
CREATE TABLE #NewRows
(
Effective datetime NOT NULL,
Product integer NOT NULL,
Kind tinyint NOT NULL,
Price smallmoney NOT NULL,
[Action] char(1) NOT NULL DEFAULT 'X',
);
CREATE UNIQUE CLUSTERED INDEX cuq
ON #NewRows (Product, Kind, Effective);
INSERT #NewRows
(Effective, Product, Kind, Price)
VALUES
('2013-04-23T00:23:00', 1234, 1, 1.00),
('2013-04-24T00:24:00', 1234, 1, 1.00),
('2013-04-25T00:25:00', 1234, 1, 1.50),
('2013-04-25T00:25:00', 1234, 2, 2.00),
('2013-04-26T00:26:00', 1234, 1, 2.00),
('2013-04-27T00:27:00', 1234, 1, 2.00),
('2013-04-28T00:28:00', 1234, 1, 1.00);
The first step is to remove any redundancies in the input data, storing the removed data in the new holding table:
-- Remove redundancies in the input set
-- and save in the duplicates table
DELETE NR
OUTPUT
DELETED.Effective,
DELETED.Product,
DELETED.Kind,
DELETED.Price
INTO dbo.HistoryDuplicates
(Effective, Product, Kind, Price)
FROM #NewRows AS NR
OUTER APPLY
(
SELECT TOP (1)
NR2.Price
FROM #NewRows AS NR2
WHERE
NR2.Product = NR.Product
AND NR2.Kind = NR.Kind
AND NR2.Effective < NR.Effective
ORDER BY
NR2.Effective DESC
) AS X
WHERE
EXISTS (SELECT X.Price INTERSECT SELECT NR.Price);
Classifying input rows
The next step is to decide whether each row in the input table is redundant (w.r.t the History table) or not. The following query sets the Action
column of the input set data appropriately:
-- Decide what to do with each row
UPDATE NewRows
SET [Action] =
CASE
WHEN NOT EXISTS (SELECT ExistingRow.Price INTERSECT SELECT NewRows.Price)
THEN 'I' -- Insert
WHEN ExistingRow.Price = NewRows.Price
THEN 'D' -- Duplicate
ELSE 'X'
END
FROM #NewRows AS NewRows
OUTER APPLY
(
SELECT TOP (1)
H.Price,
H.Effective
FROM dbo.History AS H
WHERE
H.Product = NewRows.Product
AND H.Kind = NewRows.Kind
AND H.Effective <= NewRows.Effective
ORDER BY
H.Effective DESC
) AS ExistingRow;
CREATE UNIQUE CLUSTERED INDEX cuq
ON #NewRows
([Action], Product, Kind, Effective)
WITH (DROP_EXISTING = ON);
Store Redundant Rows
Now we store the rows identified as redundant to the holding table:
-- Store duplicates
WITH Duplicates AS
(
SELECT NR.*
FROM #NewRows AS NR
WHERE NR.[Action] = 'D'
)
MERGE dbo.HistoryDuplicates AS HD
USING Duplicates AS NR
ON NR.Product = HD.Product
AND NR.Kind = HD.Kind
AND NR.Effective = HD.Effective
WHEN NOT MATCHED BY TARGET THEN
INSERT (Product, Kind, Effective, Price)
VALUES (Product, Kind, Effective, Price);
New History rows
The non-redundant rows are added to the History table:
-- Inserts
WITH Inserts AS
(
SELECT NR.*
FROM #NewRows AS NR
WHERE NR.[Action] = 'I'
)
MERGE dbo.History AS H
USING Inserts AS NR
ON NR.Product = H.Product
AND NR.Kind = H.Kind
AND NR.Effective = H.Effective
AND NR.Price = H.Price
WHEN NOT MATCHED BY TARGET THEN
INSERT (Effective, Product, Kind, Price)
VALUES (Effective, Product, Kind, Price);
Reinstating redundant records
Adding new records can result in redundant rows needing to be reinstated. The following query identifies qualifying redundant rows and moves them to the History table:
DELETE NextDuplicate
OUTPUT DELETED.Product,
DELETED.Kind,
DELETED.Price,
DELETED.Effective
INTO dbo.History
(Product, Kind, Price, Effective)
FROM #NewRows AS NR
CROSS APPLY
(
SELECT TOP (1)
H.Effective,
H.Price
FROM dbo.History AS H
WHERE
H.Product = NR.Product
AND H.Kind = NR.Kind
AND H.Effective > NR.Effective
ORDER BY
H.Effective ASC
) AS NextHistory
CROSS APPLY
(
SELECT TOP (1)
HD.Effective,
HD.Product,
HD.Kind,
HD.Price
FROM dbo.HistoryDuplicates AS HD
WHERE
HD.Product = NR.Product
AND HD.Kind = NR.Kind
AND HD.Effective > NR.Effective
AND HD.Effective < NextHistory.Effective
ORDER BY
HD.Effective ASC
) AS NextDuplicate
WHERE
NR.[Action] = 'I'
AND NOT EXISTS (SELECT NextDuplicate.Price INTERSECT SELECT NR.Price);
Results
History table
╔═════════════════════════╦═════════╦══════╦═══════╗
║ Effective ║ Product ║ Kind ║ Price ║
╠═════════════════════════╬═════════╬══════╬═══════╣
║ 2013-04-23 00:23:00.000 ║ 1234 ║ 1 ║ 1.00 ║
║ 2013-04-25 00:25:00.000 ║ 1234 ║ 1 ║ 1.50 ║
║ 2013-04-25 00:25:00.000 ║ 1234 ║ 2 ║ 2.00 ║
║ 2013-04-26 00:26:00.000 ║ 1234 ║ 1 ║ 2.00 ║
║ 2013-04-28 00:28:00.000 ║ 1234 ║ 1 ║ 1.00 ║
╚═════════════════════════╩═════════╩══════╩═══════╝
History Duplicates table
╔═════════════════════════╦═════════╦══════╦═══════╗
║ Effective ║ Product ║ Kind ║ Price ║
╠═════════════════════════╬═════════╬══════╬═══════╣
║ 2013-04-24 00:24:00.000 ║ 1234 ║ 1 ║ 1.00 ║
║ 2013-04-27 00:27:00.000 ║ 1234 ║ 1 ║ 2.00 ║
╚═════════════════════════╩═════════╩══════╩═══════╝
Processing new data
The preceding steps are quite general. We can process a new batch of rows using exactly the same code. The next script loads the input table with two sample rows, one of which is a duplicate, and the other an example of needing to reinstate a previously-redundant row:
-- If needed
DROP TABLE #NewRows;
CREATE TABLE #NewRows
(
Effective datetime NOT NULL,
Product integer NOT NULL,
Kind tinyint NOT NULL,
Price smallmoney NOT NULL,
[Action] char(1) NOT NULL DEFAULT 'X',
);
CREATE UNIQUE CLUSTERED INDEX cuq
ON #NewRows (Product, Kind, Effective);
INSERT #NewRows
(Effective, Product, Kind, Price)
VALUES
('2013-04-24 00:24:01.000', 1234, 1, 1.00), -- New duplicate
('2013-04-26 00:26:30.000', 1234, 1, 5.00); -- Complex new row
Running the rest of the general script produces this final state:
History:
╔═════════════════════════╦═════════╦══════╦═══════╗
║ Effective ║ Product ║ Kind ║ Price ║
╠═════════════════════════╬═════════╬══════╬═══════╣
║ 2013-04-23 00:23:00.000 ║ 1234 ║ 1 ║ 1.00 ║
║ 2013-04-25 00:25:00.000 ║ 1234 ║ 1 ║ 1.50 ║
║ 2013-04-25 00:25:00.000 ║ 1234 ║ 2 ║ 2.00 ║
║ 2013-04-26 00:26:00.000 ║ 1234 ║ 1 ║ 2.00 ║
║ 2013-04-26 00:26:30.000 ║ 1234 ║ 1 ║ 5.00 ║ <- New
║ 2013-04-27 00:27:00.000 ║ 1234 ║ 1 ║ 2.00 ║ <- Recovered
║ 2013-04-28 00:28:00.000 ║ 1234 ║ 1 ║ 1.00 ║
╚═════════════════════════╩═════════╩══════╩═══════╝
History Duplicate:
╔═════════════════════════╦═════════╦══════╦═══════╗
║ Effective ║ Product ║ Kind ║ Price ║
╠═════════════════════════╬═════════╬══════╬═══════╣
║ 2013-04-24 00:24:00.000 ║ 1234 ║ 1 ║ 1.00 ║
║ 2013-04-24 00:24:01.000 ║ 1234 ║ 1 ║ 1.00 ║ <- New duplicate
╚═════════════════════════╩═════════╩══════╩═══════╝
Best Answer
You are in the right track just a few things to keep in mind.
FROM DIMDATE D LEFT JOIN ACNTHEADER A
Which make sure that no days are omitted.
Since you are trying to add the values based on date, you need an Aggregate function in the final columns.
SUM
in your case.To split the columns based on the value you can use the below.
SUM(CASE WHEN VALUE>0 THEN VALUE ELSE 0 END) AS SUMPOSITIVE
Try it and see if face any roadblocks if any Please comment.
ANSWER
I hope your date filter condition is correct, so I am not editing it.
Explanation
Use of
LEFT JOIN
This is to make sure that you need all the values for June month for 2013. If you do an
INNER JOIN
it will only show the values in matching in both table.Doing Pre-Aggregation before doing a
LEFT JOIN
For the first answer, I posted was getting duplicated records. I was in the pre-assumption that the date was joined multiple times and I was wrong.The best way to understand this is by comparing both the result set. Sometimes it hard to visualize so break down your query and see how the
JOIN
works.First Answer
before this will check the one without the filtering clause on this one.
In which we are getting the expected result set, One thing to note is the account was not shown on all rows but only on the valid dates on the right-hand table (#ACNTHEADER) and the requirement is to show account in all rows.CROSS JOIN
Why I used
DISTINCT
inSUB QUERY
forCROSS JOIN
is to minimize the rows to process. As it can lead to multiplying the rows based on the number of rows in each table. ifDISTINCT
is not used this will lead to wrong results as it will have duplicates. Also, be careful when doingCROSS JOIN
as it can be a resource consuming process.Now all we have to do the aggregation on the value that matches with the value in
#ACNTHEADER
for which we use aLEFT JOIN
on the#ACNTHEADER
tableWhich is the required result set. Now, what went wrong on the first query. Even the pre-aggregation is not required! And all this was because of the filter clause on
DATE
value which resulted in duplicates.I am leaving this to the reader to see how the filter works by running the query for one row and see how its getting changed which will be a good exercise for you analyzing SQL code.
NB Also, note I have only considered the June month for 2013, change
dimdate
table according to your need.