There are a few things you want to factor into your design:
1. Measurements Need a Timestamp
Make sure all of your measurements have an indication of:
- Scalar Value
- Unit of Measurement
- Date and Time the Measurement was Taken
This will allow you to work with measurements that need time-dependent conversion calculations.
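As a minimal sketch of the above (table and column names here are illustrative, not prescriptive), a measurement that carries all three pieces of information might look like:

```sql
-- Hypothetical measurement table: every reading stores its value,
-- its unit, and the moment it was taken.
CREATE TABLE Measurement (
    MeasurementID int IDENTITY(1, 1) PRIMARY KEY,
    ScalarValue   decimal(19, 6) NOT NULL, -- the number itself
    UOMCode       nvarchar(10)   NOT NULL, -- points at the unit-of-measure table
    MeasuredAt    datetime2      NOT NULL  -- enables time-dependent conversions
);
```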
2. Units of Measure Have Attributes
Each unit of measure has a few different attributes. The obvious ones are identifying, like a code and maybe a descriptive name. There are also two other critical attributes to keep for each unit of measure: (i) the Unit Type and (ii) the Conversion Factor to the Base Unit.
The first tells you whether your unit of measure is a length, a weight, an energy, a power, a currency, and so on. It should also tell you what the base unit of measure is; pick exactly one base unit for each unit type. You can use things like kWh if you like, but I'd stick to the base SI units (as applicable) if I were you.
The second tells you what your unit of measure needs to be multiplied by to convert it to the base unit. I mentioned that this is an attribute of your UOM, but in fact it needs to be in a child table. The business key of the child table that holds this base conversion factor is the combination of the UOM, its base unit type, and a date/time. I would keep both an effective and an expiry date/time on the base conversion factor table. This allows you to quickly find the right rate that applies at any particular point in time. If it happens to be a rate that doesn't change, that's OK; just use a min-collating effective date and a max-collating expiry date for the one record.
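A rough sketch of these ideas, with illustrative names only (a hypothetical design, not a finished one):

```sql
-- One base unit per unit type (e.g. metre for LENGTH, kilogram for MASS).
CREATE TABLE UnitType (
    UnitTypeCode nvarchar(10) PRIMARY KEY, -- e.g. 'LENGTH', 'MASS', 'CURRENCY'
    BaseUOMCode  nvarchar(10) NOT NULL     -- exactly one base unit per type
);

CREATE TABLE UnitOfMeasure (
    UOMCode      nvarchar(10)  PRIMARY KEY,
    UOMName      nvarchar(100) NOT NULL,
    UnitTypeCode nvarchar(10)  NOT NULL REFERENCES UnitType (UnitTypeCode)
);

-- Child table: the factor to multiply by to reach the base unit,
-- effective-dated so that time-varying rates (currency, say) work too.
CREATE TABLE UOMBaseConversion (
    UOMCode       nvarchar(10)    NOT NULL REFERENCES UnitOfMeasure (UOMCode),
    UnitTypeCode  nvarchar(10)    NOT NULL REFERENCES UnitType (UnitTypeCode),
    EffectiveFrom datetime2       NOT NULL,
    ExpiresAt     datetime2       NOT NULL,
    FactorToBase  decimal(28, 10) NOT NULL,
    PRIMARY KEY (UOMCode, UnitTypeCode, EffectiveFrom)
);
```

Finding the factor in force when a measurement was taken is then a range lookup (`WHERE @asOf >= EffectiveFrom AND @asOf < ExpiresAt`), and a never-changing factor is a single row with a min-collating `EffectiveFrom` ('0001-01-01') and a max-collating `ExpiresAt` ('9999-12-31').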
3. Trying to Table-Drive Everything Will Make You Nuts
The last piece of the puzzle is determining the calculation for moving from one kind of unit to another. You could try to table-drive this kind of calculation, but in the end the tricky ones will make the design so general (read: complicated and slow) that it will be impractical. Instead, create a code table of conversion calculations and use it to link one kind of Unit Type to another kind of Unit Type. Perform the actual calculations in some code somewhere. Which piece of code you use for any given conversion is what the code table tells you; how the calculation is performed lives only in the code. You can have one calculation each for the various easy things (area needs two lengths; volume needs three) as well as the harder ones (work needs power and time).
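For instance, the code table might do no more than name the calculation (illustrative names again); the application code then dispatches on `CalcCode`:

```sql
-- Hypothetical code table: maps a pair of unit types to a named calculation.
-- The calculation itself lives in application code keyed by CalcCode.
CREATE TABLE UnitTypeConversion (
    FromUnitTypeCode nvarchar(10) NOT NULL,
    ToUnitTypeCode   nvarchar(10) NOT NULL,
    CalcCode         nvarchar(30) NOT NULL, -- e.g. 'AREA_FROM_TWO_LENGTHS'
    PRIMARY KEY (FromUnitTypeCode, ToUnitTypeCode)
);
```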
When you get the details of your design figured out you should blog it and come back here to post a link!
I'm submitting this as an answer, as I am currently avoiding the out-of-memory errors and have significantly reduced the run time (it was 4+ hours, ending in failure; it is now 1.25 hours, ending in success). However, I'm sure that after about 10 million more records this problem may recur, so I would appreciate any additional comments or answers aimed at making this more memory-efficient for the future.
The "solution" to this point was to drop unneeded fields and especially indices from the design of the temp tables. Additionally, index creation for non-constraint keys was deferred until after the table was filled.
To address the issue (pointed out first by @MartinSmith) of the partitioning not matching the later `JOIN`, I created a field in a subquery which I then used for both `PARTITION`ing and `JOIN`ing.
Code for the Q
set nocount on;
SET XACT_ABORT ON;
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'started' as reason; --last run 2015-12-16 18:22:02
GO
CREATE TABLE ##ttOrder_det (
AnimalID nvarchar(50) not null,
DD DateTime not null,
ReceiverID int NOT NULL,
Location nvarchar(255) NOT NULL,
GLocation nvarchar(255) NULL,
DetID int NOT NULL,
BinIncORNum int NULL, -- first pass=Inc, second pass=Num
CONSTRAINT [PK_ttRDA] PRIMARY KEY CLUSTERED
([AnimalID] ASC, [DD] ASC, ReceiverID ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
GO
CREATE UNIQUE INDEX UX_CTR ON ##ttOrder_det (AnimalID ASC, DetID ASC);
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'created first temporary table' as reason; --last run 2015-12-16 18:22:02
GO
WITH s AS (
SELECT
AnimalID, DetectDate,
ReceiverID, Location, GLocation,
ROW_NUMBER() OVER (
PARTITION BY AnimalID ORDER BY DetectDate ASC, ReceiverID ASC
) as DetID
FROM (
SELECT
COALESCE (
TF.AnimalID,
Det.Technology+'-'+cast(Det.XmitID AS nvarchar(10)),
Det.AnimalID,
N'BSVALUE999'
) as AnimalID,
DetectDate,
ReceiverID,
COALESCE (
Det.location,
N'Unknown Location'
) as Location,
COALESCE (
ML.General_Location,
N'Invalid General Location - Orphaned Receiver'
) as GLocation
FROM
Detections_with_Location as Det LEFT OUTER JOIN
Receiver_Locations as ML ON Det.Location=ML.Location LEFT OUTER JOIN
Tagged_Animal as TF ON Det.AnimalID=TF.AnimalID
) AS T
)
INSERT INTO ##ttOrder_det (AnimalID, DD, ReceiverID, Location, GLocation, DetID, BinIncORNum)
SELECT
s1.AnimalID,
s1.DetectDate, s1.ReceiverID, s1.Location, s1.GLocation,
s1.DetID,
CASE WHEN sLg.DetectDate IS NULL
THEN 1
ELSE CASE WHEN sLg.GLocation = s1.GLocation
THEN 0
ELSE 1
END
END AS BinInc
FROM s as s1
LEFT OUTER JOIN s AS sLg ON
s1.AnimalID= sLg.AnimalID AND
s1.DetID = sLg.DetID + 1
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'filled first temp table' as reason,
COUNT(*) as SizeOfFirstTemp FROM ##ttOrder_det; --2015-12-16 18:43:03, 46627879
GO
CREATE NONCLUSTERED INDEX NIX_F on ##ttOrder_det (AnimalID ASC);
CREATE NONCLUSTERED INDEX NIX_R ON ##ttOrder_det (DetID ASC);
--dropped several additional indices: `NIX_`s VTC, TCD, A, DD, Bi
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'indexed first temp table' as reason; --2015-12-16 18:44:12
GO
Quick comments
I switched to using `ROW_NUMBER` instead of `RANK`. This is non-deterministic, but at least it will result in no "ties", which would break the `LAG` implementation around that point. Ties shouldn't exist, but this is just future-proofing against `General_Location`s that are very close together and co-detect the same transmission.
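To illustrate the difference on made-up data: `RANK` gives tied rows the same value and then skips ahead, which would break the `s1.DetID = sLg.DetID + 1` self-join, while `ROW_NUMBER` always produces a gap-free sequence:

```sql
-- If two receivers logged the same transmission with the same DetectDate,
-- ordering by DetectDate alone would tie:
-- RANK()       -> 1, 1, 3 (tie, then a gap; DetID + 1 finds no DetID = 2 row)
-- ROW_NUMBER() -> 1, 2, 3 (arbitrary within the tie, but gap-free)
SELECT AnimalID, DetectDate,
       RANK()       OVER (PARTITION BY AnimalID ORDER BY DetectDate) AS RankID,
       ROW_NUMBER() OVER (PARTITION BY AnimalID ORDER BY DetectDate) AS DetID
FROM Detections_with_Location;
```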
Again, as two users pointed out in the comments above, I'm not using the full PK of the `Tagged_Animal` table, so there's a possibility of a `JOIN` with an ambiguous `AnimalID`. However, at present, both `AnimalID` and `TagSN` are `UNIQUE`, albeit unconstrained.
I was planning on dropping the `ReceiverID` field in favor of using the `Location`, but I had a period in which there were two receivers deployed at the same location (one receiver was presumed lost but later found) which did indeed detect the same animal at the same time.
Follow-up code to complete the task
CREATE TABLE ##ttOrder_det2 (
AnimalID nvarchar(50) not null,
DetID int NOT NULL,
BinNum int NULL,
CONSTRAINT [PK_ttRDA2] PRIMARY KEY CLUSTERED
([AnimalID] ASC, [DetID] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
GO
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'second temp table created' as reason; --2015-12-16 18:44:15
GO
-- SET XACT_ABORT ON will cause the transaction to be uncommittable
-- when the constraint violation occurs.
BEGIN TRANSACTION
BEGIN TRY
DECLARE @AnimalID as nvarchar(50);
DECLARE @DetID as int;
DECLARE @BinInc as int;
DECLARE @BinNum as int;
DECLARE @AnimalVisit as CURSOR;
SET @AnimalVisit = CURSOR FOR SELECT AnimalID, DetID, BinIncORNum FROM ##ttOrder_det ORDER BY AnimalID, DetID;
OPEN @AnimalVisit;
FETCH NEXT FROM @AnimalVisit INTO @AnimalID, @DetID, @BinInc;
WHILE @@FETCH_STATUS = 0
BEGIN
IF (@DetID <= 1) SET @BinNum = 0;
SET @BinNum += @BinInc;
INSERT INTO ##ttOrder_det2 (AnimalID, DetID, BinNum) VALUES (@AnimalID, @DetID, @BinNum);
FETCH NEXT FROM @AnimalVisit INTO @AnimalID, @DetID, @BinInc;
END
CLOSE @AnimalVisit;
DEALLOCATE @AnimalVisit;
END TRY
BEGIN CATCH
exec sp_lock; -- added to display the open locks after the timeout
-- exec sp_who2; -- shows the active processes
EXECUTE usp_GetErrorInfo;
--RETURN -- ignoring this error for brevity
-- Test XACT_STATE:
-- If 1, the transaction is committable.
-- If -1, the transaction is uncommittable and should
-- be rolled back.
-- XACT_STATE = 0 means that there is no transaction and
-- a commit or rollback operation would generate an error.
-- Test whether the transaction is uncommittable.
IF (XACT_STATE()) = -1
BEGIN
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'The transaction is in an uncommittable state. Rolling back transaction.' as reason
ROLLBACK TRANSACTION;
SET XACT_ABORT off
RETURN -- 1 --error
END;
-- Test whether the transaction is committable.
IF (XACT_STATE()) = 1
BEGIN
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'The transaction is committable. Committing transaction.' as reason
COMMIT TRANSACTION;
END;
END CATCH;
IF @@TRANCOUNT > 0
COMMIT TRANSACTION;
GO
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'filled second temp table' as reason,
COUNT(*) as Table2Size from ##ttOrder_det2; --2015-12-16 19:11:17, 46627879
GO
CREATE NONCLUSTERED INDEX NIX_CT2 ON ##ttOrder_det2 (AnimalID ASC);
CREATE NONCLUSTERED INDEX NIX_R2 ON ##ttOrder_det2 (DetID ASC);
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'indexed second temp table' as reason;
--EXEC tempdb.dbo.usp_PruneTemp;
GO
BEGIN TRANSACTION
BEGIN TRY
UPDATE a
SET a.BinIncORNum=b.BinNum
FROM ##ttOrder_det AS a
INNER JOIN ##ttOrder_det2 AS b ON
a.AnimalID=b.AnimalID AND a.DetID=b.DetID;
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'backfilled first temp table with info from second table' as reason,
@@ROWCOUNT as EntriesAffected; --2015-12-16 19:19:54, 46627879
DROP TABLE ##ttOrder_det2;
COMMIT TRANSACTION
END TRY
BEGIN CATCH
SELECT name, log_reuse_wait_desc FROM sys.databases
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'ERROR: backfilling first temp table, see sys.databases info' as reason;
EXECUTE usp_GetErrorInfo;
IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
SET XACT_ABORT off
RETURN --1 --error
END CATCH
--IF @@TRANCOUNT > 0 COMMIT TRANSACTION
GO
CREATE TABLE derived__AnimalVisits (
AnimalID nvarchar(50) not null,
DetectDate DateTime not null,
ReceiverID int NOT NULL,
Location nvarchar(255) NOT NULL,
GeneralLocation nvarchar(255) NULL,
DetOrder int NOT NULL,
VisitNum int NOT NULL,
CONSTRAINT [PK_dFV] PRIMARY KEY CLUSTERED
(AnimalID ASC, DetectDate ASC, ReceiverID ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
GO
CREATE UNIQUE INDEX UX_CTR ON derived__AnimalVisits (AnimalID ASC, DetOrder ASC);
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'created persistent derived table' as reason; --2015-12-16 19:20:53
GO
--DECLARE @FVError as int = 0;
BEGIN TRY
INSERT INTO derived__AnimalVisits (AnimalID, DetectDate, ReceiverID, Location, GeneralLocation, DetOrder, VisitNum)
SELECT AnimalID, DD, ReceiverID, Location, GLocation, COALESCE(DetID,0), COALESCE(BinIncORNum,0) FROM ##ttOrder_det;
END TRY
BEGIN CATCH
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'ERROR: filling persistent derived table' as reason;
EXECUTE usp_GetErrorInfo;
SET XACT_ABORT off
RETURN --1 --error
END CATCH
CREATE NONCLUSTERED INDEX NIX_CT ON derived__AnimalVisits (AnimalID ASC);
CREATE NONCLUSTERED INDEX NIX_VTC ON derived__AnimalVisits (ReceiverID ASC, AnimalID ASC);
CREATE NONCLUSTERED INDEX NIX_TCD ON derived__AnimalVisits (AnimalID ASC, DetectDate ASC);
CREATE NONCLUSTERED INDEX NIX_R ON derived__AnimalVisits (DetOrder ASC);
CREATE NONCLUSTERED INDEX NIX_Bi ON derived__AnimalVisits (VisitNum ASC);
SELECT
cast(cast(current_timestamp as datetime2) as varchar(19)) as time,
'indexed / now cleaning up' as reason; --2015-12-16 19:31:18
GO
IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects WHERE id = OBJECT_ID(N'[tempdb].[dbo].[##ttOrder_det2]') AND xtype=N'U')
DROP TABLE tempdb.[dbo].[##ttOrder_det2];
IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects WHERE id = OBJECT_ID(N'[tempdb].[dbo].[##ttOrder_det]') AND xtype=N'U')
DROP TABLE tempdb.[dbo].[##ttOrder_det];
SET XACT_ABORT off
--cleaning up of transaction logs, etc done at 2015-12-16 19:39:07
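As an aside on the cursor above: it computes a per-animal running sum of `BinIncORNum` in `DetID` order. On SQL Server 2012 and later, a windowed `SUM` is a possible set-based sketch of the same accumulation (untested against this schema):

```sql
-- Running total of the 0/1 increments per animal, without a cursor.
INSERT INTO ##ttOrder_det2 (AnimalID, DetID, BinNum)
SELECT AnimalID, DetID,
       SUM(BinIncORNum) OVER (PARTITION BY AnimalID
                              ORDER BY DetID
                              ROWS UNBOUNDED PRECEDING) AS BinNum
FROM ##ttOrder_det;
```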
Best Answer
The best I/O numbers you can get come from turning on statistics (`SET STATISTICS IO ON`) and then running your query via Management Studio. The SET option applies to your current connection, and the information will appear in the Messages tab.
You'll then get information on how many pages were read by your query.
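Concretely (the query below is just a placeholder to run against your own tables):

```sql
SET STATISTICS IO ON;   -- per-table logical/physical page reads in the Messages tab
SET STATISTICS TIME ON; -- optional: CPU and elapsed times as well
SELECT COUNT(*) FROM dbo.YourTable; -- placeholder query
```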
Also for ease of parsing complex/large query statistics, I use this handy little webpage: http://www.statisticsparser.com/