I'd like some input on an issue I'm having. We have a section of code that we repeat throughout our stored procedures, and every time it takes quite some time to process, and when combined the number of reads runs into the hundreds of millions on a set of hundreds of thousands of Items. Basically we have Items, and Items can have up to 12 machines, each with their own state.
These are the (simplified) table structures:
CREATE TABLE dbo.ItemMachineState
(
[itemID] [int],
[machineID] [int],
[stateID] [int]
)
CREATE TABLE dbo.Transition
(
[machineID] [int] NOT NULL,
[eventID] [int] NOT NULL,
[stateID] [int] NOT NULL,
[nextStateID] [int] NOT NULL
)
What happens is, during processing we create a #temp table which we work against, and it eventually has an eventID per Item. That temp table is then joined back to ItemState and Transition, like so:
UPDATE dbo.ItemState
SET stateID = tr.nextStateID
FROM #temp t
JOIN dbo.ItemMachineState ist ON ist.itemID = t.itemID
JOIN Transition tr ON tr.stateID = ist.stateID AND
tr.machineID = ist.machineID AND
tr.eventID = t.eventID
So the eventID, which we calculate, determines what will happen to a given Item's Machines, depending on what State they're each in. The issue is that an event can manipulate zero or more machine states in one movement, if that event is relevant to that particular combo of state and machine.
Here's an example of one of these state shifts:
ItemID 3468489 first looks like this in ItemMachineState…
itemID machineID stateID
----------- ----------- -----------
3468489 12 4
3468489 14 113
3468489 15 157
3468489 16 165
3468489 18 169
3468489 19 165
3468489 20 157
3468489 21 165
3468489 23 173
3468489 24 173
3468489 26 9
3468489 36 9
We do some work, and eventually have a #temp table that has an ItemID and an EventID…
itemID eventID
----------- -----------
3468489 64
Then we join both of these tables to Transition, which looks like this for this particular eventID:
machineID eventID stateID nextStateID
----------- ----------- ----------- -----------
13 64 73 79
13 64 74 79
13 64 75 79
13 64 76 79
13 64 77 79
13 64 78 79
13 64 187 79
13 64 188 79
13 64 189 79
13 64 190 79
13 64 191 79
36 64 9 79
36 64 194 79
36 64 196 79
36 64 208 79
36 64 210 79
36 64 213 79
36 64 218 79
46 64 73 79
47 64 73 79
70 64 73 79
70 64 75 79
70 64 76 79
70 64 77 79
70 64 78 79
Putting it all together:
SELECT t.itemID, t.eventID, ist.machineID, ist.stateID, tr.nextStateID
FROM #temp t
JOIN dbo.ItemMachineState ist ON ist.itemID = t.itemID
JOIN Transition tr ON tr.stateID = ist.stateID AND
tr.machineID = ist.machineID AND
tr.eventID = t.eventID
itemID eventID machineID stateID nextStateID
----------- ----------- ----------- ----------- -----------
3468489 64 36 9 79
So in this particular example, this Event was only relevant to one Machine for this Item. It's stateID is updated from 9 to 79 on machineID 36, and everything else stays the same for this Item.
I'd like suggestions on how to approach this differently. We can't move away from the table structure, but we can alter how we go about setting stateID to nextStateID during transitions/events. As you can see above this works by elimination; we need the current state and machine to figure out what the next state is, for that machine, for that event. In some cases this won't update anything, other times it will update multiple machines in one go, and we like that ability. I don't think the leanest solution to this problem will be found by simply changing indexes or adding query hints, and that we need a new approach that limits the number of reads and processing time, but gives us the same functionality.
I wanted to avoid bringing indexes and such into this discussion because I'd have to then use real examples, which pollute the essence of what I'm trying to ask here, I changed the name of columns and tables to simplify my question. In any case, here you go:
Query plan
http://pastebin.com/xhPa4t8d, create and index scripts
http://pastebin.com/sp70QuEJ
Note that in the query plan we force an INNER LOOP JOIN. If left to a simple JOIN, the query takes exponentially longer to process.
Using @wBob UNIQUE CLUSTERED index, before:
Using OPTION (MERGE JOIN, HASH JOIN)
resulted in this execution plan and results:
Will update with other information shortly
Best Answer
I would consider not updating all the rows at once and instead running through batches of machines so the volume of records decrease per update. You can keep the same code, just batch it.