Sql-server – SQL Trigger- Updating count

Tags: sql-server, sql-server-2014, trigger

I need to create a trigger that generates a new evidence number within each case. For instance, case #1 can have evidence #1, 2, 3, 4, and so on; case #2 can also have evidence #1, 2, 3, etc.

I have a "Case" table (CaseID as PK) and an "Evidence" table (EvidenceNum as PK, CaseID as FK, plus associated attributes).

Every time I search for a specific CaseID, I want a new "EvidenceID" column to be populated with evidence item #1, #2, and so on. These numbers repeat for each case, which is why this column is not the primary key. EvidenceNum is the primary key, but that won't be seen by the end user. Any help with this?

Best Answer

I would implement this using a stored procedure instead of a trigger. Use a separate key table to store the last used evidence number for each case.

I mocked up a minimal, complete example.

Drop the objects from tempdb if they already exist, so we can modify the code as required.

USE tempdb;

IF OBJECT_ID(N'dbo.AddCase', N'P') IS NOT NULL
DROP PROCEDURE dbo.AddCase;
IF OBJECT_ID(N'dbo.AddEvidence', N'P') IS NOT NULL
DROP PROCEDURE dbo.AddEvidence;
IF OBJECT_ID(N'dbo.EvidenceKeys', N'U') IS NOT NULL
DROP TABLE dbo.EvidenceKeys;
IF OBJECT_ID(N'dbo.Evidence', N'U') IS NOT NULL
DROP TABLE dbo.Evidence;
IF OBJECT_ID(N'dbo.Cases', N'U') IS NOT NULL
DROP TABLE dbo.Cases;
GO

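Create the Cases and Evidence tables, along with an EvidenceKeys table to store the last-used evidence number for each case. Note that the column names are swapped relative to the question: here EvidenceID is the surrogate primary key and EvidenceNum is the per-case number shown to the user.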

CREATE TABLE dbo.Cases
(
    CaseID int NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_Cases
        PRIMARY KEY CLUSTERED
) ON [PRIMARY];

CREATE TABLE dbo.Evidence
(
    EvidenceID int NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_Evidence
        PRIMARY KEY CLUSTERED
    , CaseID int NOT NULL
        CONSTRAINT FK_Evidence_CaseID
        FOREIGN KEY 
        REFERENCES dbo.Cases(CaseID)
    , EvidenceNum int NOT NULL
    , CONSTRAINT UQ_EvidenceNum
        UNIQUE (CaseID, EvidenceNum)
);

CREATE TABLE dbo.EvidenceKeys
(
    CaseID int NOT NULL
        CONSTRAINT PK_EvidenceKeys
        PRIMARY KEY CLUSTERED
    , MaxEvidenceNum int NOT NULL
);
GO

Create a procedure used to add a new Case. You'd need to add parameters to this such as the Case Name, date, etc.

CREATE PROCEDURE dbo.AddCase
(
    @CaseID int OUTPUT
)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Cases TABLE
    (
        CaseID int NOT NULL
    );
    INSERT INTO dbo.Cases 
    OUTPUT inserted.CaseID 
    INTO @Cases (CaseID)
    DEFAULT VALUES;
    SELECT @CaseID = CaseID
    FROM @Cases;
END
GO

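Create a procedure to add Evidence. Again, this is only a proof of concept, so you'd need to add parameters to deal with the actual evidence item details. Note that the @EvidenceID output parameter returns the per-case number stored in EvidenceNum, not the identity value of the EvidenceID column.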

CREATE PROCEDURE dbo.AddEvidence
(
    @CaseID int
    , @EvidenceID int OUTPUT
)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @MaxEvidences TABLE
    (
        MaxEvidenceNum int NOT NULL
    );
    SET @EvidenceID = NULL;
    UPDATE dbo.EvidenceKeys
    SET MaxEvidenceNum += 1
    OUTPUT inserted.MaxEvidenceNum
    INTO @MaxEvidences(MaxEvidenceNum)
    WHERE dbo.EvidenceKeys.CaseID = @CaseID;
    SELECT @EvidenceID = MaxEvidenceNum
    FROM @MaxEvidences;
    IF @EvidenceID IS NULL
    BEGIN
        INSERT INTO dbo.EvidenceKeys (CaseID, MaxEvidenceNum)
        VALUES (@CaseID, 1);
        SET @EvidenceID = 1;
    END
    INSERT INTO dbo.Evidence (CaseID, EvidenceNum)
    VALUES (@CaseID, @EvidenceID);
END;
GO
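
One caveat about dbo.AddEvidence as written: the UPDATE is atomic, but when no key row exists yet, the follow-up INSERT into dbo.EvidenceKeys is a separate statement. Two sessions adding the first evidence item for the same case at the same moment could both attempt that INSERT, and one would fail with a primary key violation. The counter update and the Evidence insert also do not share a transaction. The following is only a sketch of one way to close both gaps, and is not part of the original answer: seed the counter row when the case is created, and wrap each procedure body in a transaction. The names dbo.AddCaseSeeded and dbo.AddEvidenceTx are hypothetical.

```sql
-- Hypothetical variant of dbo.AddCase: also seeds the counter row.
CREATE PROCEDURE dbo.AddCaseSeeded
(
    @CaseID int OUTPUT
)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- a run-time error rolls back the whole transaction

    BEGIN TRANSACTION;

    INSERT INTO dbo.Cases DEFAULT VALUES;
    SET @CaseID = SCOPE_IDENTITY();

    -- Seed the counter at 0 so the evidence procedure never has to insert it.
    INSERT INTO dbo.EvidenceKeys (CaseID, MaxEvidenceNum)
    VALUES (@CaseID, 0);

    COMMIT TRANSACTION;
END;
GO

-- Hypothetical variant of dbo.AddEvidence: counter and row in one transaction.
CREATE PROCEDURE dbo.AddEvidenceTx
(
    @CaseID int
    , @EvidenceNum int OUTPUT
)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    SET @EvidenceNum = NULL;

    BEGIN TRANSACTION;

    -- Increment the counter and capture the new value in a single statement.
    UPDATE dbo.EvidenceKeys
    SET @EvidenceNum = MaxEvidenceNum = MaxEvidenceNum + 1
    WHERE CaseID = @CaseID;

    IF @EvidenceNum IS NULL
    BEGIN
        -- No counter row matched, so nothing was modified.
        COMMIT TRANSACTION;
        RAISERROR(N'No EvidenceKeys row exists for CaseID %d.', 16, 1, @CaseID);
        RETURN;
    END;

    INSERT INTO dbo.Evidence (CaseID, EvidenceNum)
    VALUES (@CaseID, @EvidenceNum);

    COMMIT TRANSACTION;
END;
GO
```

With this variant, a failed insert into dbo.Evidence also undoes the counter increment, so no evidence numbers are skipped. The trade-off is that the lock on the counter row is held until the transaction commits rather than for a single statement.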

Insert some sample data:

DECLARE @CaseID int;
DECLARE @EvidenceID int;

EXEC dbo.AddCase @CaseID OUT;
EXEC dbo.AddEvidence @CaseID, @EvidenceID OUT;
SELECT @EvidenceID;
EXEC dbo.AddEvidence @CaseID, @EvidenceID OUT;
SELECT @EvidenceID;
EXEC dbo.AddEvidence @CaseID, @EvidenceID OUT;
SELECT @EvidenceID;

EXEC dbo.AddCase @CaseID OUT;
EXEC dbo.AddEvidence @CaseID, @EvidenceID OUT;
SELECT @EvidenceID;
EXEC dbo.AddEvidence @CaseID, @EvidenceID OUT;
SELECT @EvidenceID;
EXEC dbo.AddEvidence @CaseID, @EvidenceID OUT;
SELECT @EvidenceID;

Each execution of dbo.AddEvidence increments the value in the dbo.EvidenceKeys table for the given @CaseID in a single atomic UPDATE statement, reducing the chance that locking becomes a problem.

SELECT *
FROM dbo.Cases c
    INNER JOIN dbo.Evidence e ON c.CaseID = e.CaseID

Results from the select above:

╔════════╦════════════╦════════╦═════════════╗
║ CaseID ║ EvidenceID ║ CaseID ║ EvidenceNum ║
╠════════╬════════════╬════════╬═════════════╣
║      1 ║          1 ║      1 ║           1 ║
║      1 ║          2 ║      1 ║           2 ║
║      1 ║          3 ║      1 ║           3 ║
║      2 ║          4 ║      2 ║           1 ║
║      2 ║          5 ║      2 ║           2 ║
║      2 ║          6 ║      2 ║           3 ║
╚════════╩════════════╩════════╩═════════════╝
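
To tie this back to the question, a search for one case can present the per-case number as a label. This is a minimal sketch; the variable @SearchCaseID and the EvidenceLabel alias are illustrative names, not part of the schema above.

```sql
-- @SearchCaseID is set by hand here purely for illustration.
DECLARE @SearchCaseID int = 1;

SELECT e.CaseID
    , N'Evidence item #' + CAST(e.EvidenceNum AS nvarchar(11)) AS EvidenceLabel
FROM dbo.Evidence AS e
WHERE e.CaseID = @SearchCaseID
ORDER BY e.EvidenceNum;
```

The identity column EvidenceID stays out of the result, so the end user only ever sees the number that restarts at 1 for each case.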

Because reading the current MaxEvidenceNum for a given CaseID and updating the dbo.EvidenceKeys table happen in a single atomic statement, the opportunity for deadlocks is vastly reduced, without the need for locking hints.

To test this design, I ran the following code. The first piece creates 100 "cases", each with 3 rows of "Evidence". The second piece, run in 3 separate sessions, inserts 100,000 rows per session into the Evidence table, assigning each row to a randomly chosen case. No deadlocks occurred, and the process took under 1 minute on my old, slow dev workstation.

DECLARE @loop int = 0;
DECLARE @CaseID int;
DECLARE @EvidenceID int;
WHILE @loop < 100
BEGIN
    EXEC dbo.AddCase @CaseID OUT;
    EXEC dbo.AddEvidence @CaseID, @EvidenceID OUT;
    EXEC dbo.AddEvidence @CaseID, @EvidenceID OUT;
    EXEC dbo.AddEvidence @CaseID, @EvidenceID OUT;
    SET @loop += 1;
END
GO

This piece should run in 3 (or more) separate sessions:

DECLARE @loop int = 0;
DECLARE @CaseID int;
DECLARE @EvidenceID int;
WHILE @loop < 100000
BEGIN
    SET @CaseID = (SELECT TOP (1) CaseID FROM dbo.Cases ORDER BY CRYPT_GEN_RANDOM(10));
    EXEC dbo.AddEvidence @CaseID, @EvidenceID OUT;
    SET @loop += 1;
END
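
Once every session has finished, a consistency check along these lines can confirm that each case is numbered 1 through N with no gaps. This is a sketch that assumes the tables defined above; the column aliases are illustrative.

```sql
-- Returns one row per case whose numbering is inconsistent.
-- An empty result means every case is numbered 1..N and matches its counter.
SELECT e.CaseID
    , COUNT(*) AS EvidenceRows
    , MIN(e.EvidenceNum) AS MinEvidenceNum
    , MAX(e.EvidenceNum) AS MaxEvidenceNum
    , k.MaxEvidenceNum AS KeyTableValue
FROM dbo.Evidence AS e
    INNER JOIN dbo.EvidenceKeys AS k ON k.CaseID = e.CaseID
GROUP BY e.CaseID, k.MaxEvidenceNum
HAVING MIN(e.EvidenceNum) <> 1
    OR COUNT(*) <> MAX(e.EvidenceNum)
    OR MAX(e.EvidenceNum) <> k.MaxEvidenceNum;
```

Because the UQ_EvidenceNum constraint already rules out duplicates within a case, a row count equal to the maximum number, together with a minimum of 1, implies the sequence is contiguous.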

Related solution: How to Efficiently Update a Table Using JOIN in SQL Server

Query Plan Analysis

The query you have now is:

UPDATE P
SET HHID = H.HHID
FROM dbo.households AS H
JOIN dbo.persons AS P
    ON P.tempId = H.tempId
    AND P.n = H.n;

This generates the rather inefficient plan:

[Image: Default plan]

The main problems in this plan are the hash join and sort. Both require a memory grant (the hash join needs to build a hash table, and the sort needs room to store the rows while sorting progresses). Plan Explorer shows this query was granted 765 MB:

[Image: Memory Grant]

This is quite a lot of server memory to dedicate to one query! More to the point, this memory grant is fixed before execution starts based on row count and size estimates.
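
If Plan Explorer is not to hand, the grant can also be observed while the UPDATE is running by querying the sys.dm_exec_query_memory_grants view from a second session. A minimal sketch, assuming the login has VIEW SERVER STATE permission:

```sql
-- Run from a second session while the UPDATE is executing.
SELECT mg.session_id
    , mg.requested_memory_kb
    , mg.granted_memory_kb
    , mg.used_memory_kb
    , mg.max_used_memory_kb
FROM sys.dm_exec_query_memory_grants AS mg
WHERE mg.session_id <> @@SPID;  -- exclude this monitoring session
```

Comparing granted_memory_kb with max_used_memory_kb gives a rough idea of how far the pre-execution estimate was from what the query actually needed.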

If the memory turns out to be insufficient at execution time, at least some data for the hash and/or sort will be written to physical tempdb disk. This is known as a 'spill' and it can be a very slow operation. You can trace these spills (in SQL Server 2008) using the Profiler events Hash Warning and Sort Warnings.
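
On SQL Server 2012 and later, the same warnings are also exposed through Extended Events. The following is a sketch rather than a tuned monitoring setup; the session name SpillWarnings is hypothetical, and the ring buffer target is chosen only because it needs no file path.

```sql
-- Hypothetical session name; captures hash and sort spill warnings.
CREATE EVENT SESSION SpillWarnings ON SERVER
ADD EVENT sqlserver.hash_warning
    (ACTION (sqlserver.session_id, sqlserver.sql_text)),
ADD EVENT sqlserver.sort_warning
    (ACTION (sqlserver.session_id, sqlserver.sql_text))
ADD TARGET package0.ring_buffer;
GO

ALTER EVENT SESSION SpillWarnings ON SERVER STATE = START;
GO

-- After running the UPDATE, read the captured events as XML.
SELECT CAST(t.target_data AS xml) AS SpillEvents
FROM sys.dm_xe_sessions AS s
    INNER JOIN sys.dm_xe_session_targets AS t
        ON t.event_session_address = s.address
WHERE s.name = N'SpillWarnings'
    AND t.target_name = N'ring_buffer';
GO

-- Clean up when finished.
DROP EVENT SESSION SpillWarnings ON SERVER;
GO
```

An empty event list after the UPDATE completes suggests that neither the hash join nor the sort spilled to tempdb during that run.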

The estimate for the hash table's build input is very good:

[Image: Hash Input]

The estimate for the sort input is less accurate:

[Image: Sort Input]

You would have to use Profiler to check, but I suspect the sort will spill to tempdb in this case. It is also possible that the hash table spills too, but that is less clear-cut.

Note that the memory reserved for this query is split between the hash table and sort, because they run concurrently. The Memory Fractions plan property shows the relative amount of the memory grant expected to be used by each operation.

Why Sort and Hash?

The sort is introduced by the query optimizer to ensure that rows arrive at the Clustered Index Update operator in clustered key order. This promotes sequential access to the table, which is often much more efficient than random access.

The hash join is a less obvious choice, because its inputs are similar sizes (to a first approximation, anyway). Hash join is best where one input (the one that builds the hash table) is relatively small.

In this case, the optimizer's costing model determines that hash join is the cheapest of the three options (hash, merge, nested loops).

Improving Performance

The cost model does not always get it right. It tends to over-estimate the cost of parallel merge join, especially as the number of threads increases. We can force a merge join with a query hint:

UPDATE P
SET HHID = H.HHID
FROM dbo.households AS H
JOIN dbo.persons AS P
    ON P.tempId = H.tempId
    AND P.n = H.n
OPTION (MERGE JOIN);

This produces a plan that does not require as much memory (because merge join does not need a hash table):

[Image: Merge Plan]

The problematic sort is still there, because merge join only preserves the order of its join keys (tempId, n) but the clustered keys are (tempId, n, sporder). You may find the merge join plan performs no better than the hash join plan.

Nested Loops Join

We can also try a nested loops join:

UPDATE P
SET HHID = H.HHID
FROM dbo.households AS H
JOIN dbo.persons AS P
    ON P.tempId = H.tempId
    AND P.n = H.n
OPTION (LOOP JOIN);

The plan for this query is:

[Image: Serial Nested Loops Plan]

This query plan is considered the worst by the optimizer's costing model, but it does have some very desirable features. First, nested loops join does not require a memory grant. Second, it can preserve the key order from the Persons table so that an explicit sort is not needed. You may find this plan performs relatively well, perhaps even well enough.

Parallel Nested Loops

The big drawback with the nested loops plan is that it runs on a single thread. It is likely this query benefits from parallelism, but the optimizer decides there is no advantage in doing that here. This is not necessarily correct either. Unfortunately, there is no built-in query hint to get a parallel plan, but there is an undocumented way:

UPDATE t1
  SET t1.HHID = t2.HHID
  FROM dbo.persons AS t1
  INNER JOIN dbo.households AS t2
  ON t1.tempId = t2.tempId AND t1.n = t2.n
OPTION (LOOP JOIN, QUERYTRACEON 8649);

Enabling trace flag 8649 with the QUERYTRACEON hint produces this plan:

[Image: Parallel Nested Loops Plan]

Now we have a plan that avoids the sort, requires no extra memory for the join, and uses parallelism effectively. You should find this query performs much better than the alternatives.
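
Two practical notes on the hint above. QUERYTRACEON requires sysadmin-level permissions unless it is applied through a plan guide. Later versions (reportedly SQL Server 2016 SP1 CU2 onward) accept an equivalent, still undocumented, USE HINT name that does not need a trace flag. The following sketch assumes such a version and will raise an error on earlier builds:

```sql
-- Assumes a build that recognizes this undocumented hint name.
UPDATE t1
  SET t1.HHID = t2.HHID
  FROM dbo.persons AS t1
  INNER JOIN dbo.households AS t2
  ON t1.tempId = t2.tempId AND t1.n = t2.n
OPTION (LOOP JOIN, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'));
```

Like the trace flag, this only asks the optimizer to prefer a parallel plan where one is possible; because it is undocumented, it is best kept to testing and investigation.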

More information on parallelism is in my article Forcing a Parallel Query Execution Plan.
