Sql-server – Select statements during long update

sql serversql-server-2008update

I've a table

MESSAGES 
id (long) | attach_data (image) | ins_date | flag

this table contains all messages of my application. During night, my job cleans old attach_data with query

update messages set attach_data=0x 
where ins_date< DATEADD(DAY, -120, GETDATE())  
and datalength(image)> 0 

this is very long query (about 2 hours)

Other processes are always querying same table with

select id FROM MESSAGES WHERE Flag = 0 ORDER BY ins_date ASC

But during this 2 night hours, this query is not function (timeout expired).
There are a tricks?
I'm running Sql server 2005

Best Answer

[Note: I got to the end of writing this and noticed that sometimes you refer to the data field as image and sometimes as attach_data. You'll need to take that into account as you read this.]

You might try breaking up the update statement into multiple batches with TOP:

UPDATE TOP (5) messages SET attach_data = 0x 
WHERE ins_date < DATEADD(DAY, -120, GETDATE())  
AND DATALENGTH(image) > 0;

Then you'll need to modify your maintenance window to execute this query repeatedly until no records are returned. You also might want to use a static date parameter specified by the application since the dates will create a moving target as you re-run this query.

Note that this method will cause the maintenance process to take longer to process overall, but each batch should complete more quickly and therefore you shouldn't be blocking reads as long. You can make the whole process faster by increasing the size of the TOP, but each transaction will lock longer. It's a tradeoff you'll have to play with.

Note also that you can't easily specify an order the records will be updated here since UPDATE doesn't support ORDER BY; you'll just have to keep repeating it until it finishes.


If that doesn't work for you, then there are some optimizations you could make. I would guess that part of the issue is that DATALENGTH() is not a very fast function since it has to read the data. Do you really need to test data length here? Is what you're doing significantly different than this:

UPDATE messages SET attach_data = 0x 
WHERE ins_date < DATEADD(DAY, -120, GETDATE())  
AND attach_data <> 0x;

If you must use DATALENGTH() or the above doesn't work, its worth noticing that it is deterministic. That means you can index it. It may be worthwhile to create an indexed view, or a computed column in the table which has DATALENGTH() that you can then index.

Something like this:

CREATE VIEW messages_maintenance
WITH SCHEMABINDING
AS
SELECT id,
    ins_date,
    DATALENGTH(image) AS "image_datalength"
FROM messages;

Followed by a unique clustered index on id, and then a nonclustered index on image_datalength, ins_date. Then you can run:

UPDATE messages SET attach_data = 0x
WHERE EXISTS (SELECT 1 FROM messages_maintenance
        WHERE messages_maintenance.id = messages.id
        AND messages_maintenance.image_datalength > 0
        AND messages_maintenance.ins_date < DATEADD(DAY, -120, GETDATE()));

Adding a computed column and an index on that would be similar.

Note that this method does add a number of indexes that now have to be maintained by the DB, so it isn't free.