Azure SQL Database – How to Fix High DTU Usage on Specific Query

Tags: azure-sql-database, performance, performance-tuning, query-performance

I have a query that runs every few seconds on a Microsoft Azure SQL Database. It comes from a web job that checks whether there are actions that need to be done based on rows in a specific table (it's a queueing table). Based on the attached image, does anyone have a suggestion for how I could reduce the DTU usage of the query in question? It is causing the database to reach 100% CPU.

[Image: screenshot not included]

Best Answer

Not having the full table definition, I'm going to take a stab at what it might be. Based on the screenshot in your question, the table could look something like this:

CREATE TABLE dbo.TaskSchedulerItem (
    TaskSchedulerItemID int IDENTITY(1,1),
    TaskSchedulerID int,
    ItemStatus varchar(20),   -- wide enough for 'IN PROGRESS' (11 characters)
    DateLastUpdated datetime2(0),
    CONSTRAINT PK_TaskSchedulerItem PRIMARY KEY CLUSTERED (TaskSchedulerItemID)
);

And just for funsies, let's throw some random sample data in there:

INSERT INTO dbo.TaskSchedulerItem (ItemStatus, TaskSchedulerID, DateLastUpdated)
SELECT TOP 10 'Pending', column_id, GETDATE()
FROM sys.columns
UNION ALL
SELECT TOP 10000 'Complete', column_id, GETDATE()-1
FROM sys.columns;

And now we can run our query against that data:

DECLARE @CurrentDateTime datetime2(0)   = GETDATE(),
        @TaskSchedulerID int            = 4,
        @TaskSchedulerItemID int;

SELECT TOP 1 @TaskSchedulerItemID = TaskSchedulerItemID
FROM dbo.TaskSchedulerItem
WHERE (
    ItemStatus IN ('Pending','Failed')
    OR
    ItemStatus = 'IN PROGRESS' AND DateLastUpdated < DATEADD(MINUTE, -1, @CurrentDateTime)
)
AND TaskSchedulerID = @TaskSchedulerID
ORDER BY DateLastUpdated;

SELECT @TaskSchedulerItemID;

And we can see that this query does a Clustered Index Scan:

[Image: Clustered Index Scan]

I don't know for sure that your query is doing a Clustered Index Scan, but that is going to be my guess. We're only looking for one row, so ideally we would be able to seek straight to that one row (or at least to far fewer rows). The ORed criteria make that trickier, so building a single index that always gets us the exact row is a bit challenging. Instead, we can just try to get close.
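If you want to check how much work the query does on your own table, one option is to turn on `SET STATISTICS IO` and `SET STATISTICS TIME` and read the Messages output. This is only a sketch against my guessed `dbo.TaskSchedulerItem` table from above; the column names and the `@TaskSchedulerID` value are assumptions, so swap in your real ones.

```sql
-- Report logical reads and CPU time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @CurrentDateTime datetime2(0)   = GETDATE(),
        @TaskSchedulerID int            = 4,   -- assumed sample value
        @TaskSchedulerItemID int;

-- Same query as above; run it before and after adding an index
-- and compare the "logical reads" figure in the Messages tab.
SELECT TOP 1 @TaskSchedulerItemID = TaskSchedulerItemID
FROM dbo.TaskSchedulerItem
WHERE (
    ItemStatus IN ('Pending','Failed')
    OR
    ItemStatus = 'IN PROGRESS' AND DateLastUpdated < DATEADD(MINUTE, -1, @CurrentDateTime)
)
AND TaskSchedulerID = @TaskSchedulerID
ORDER BY DateLastUpdated;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

A scan will typically show logical reads roughly proportional to the size of the table, while a seek should read only a handful of pages.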

We can create an index on TaskSchedulerID (we always filter on that!), and DateLastUpdated (we always sort on that!), and ItemStatus (we filter on that, but in the OR, so it's a little trickier for SQL to search on that):

CREATE INDEX ix_TaskSchedulerId_ItemStatus 
    ON dbo.TaskSchedulerItem(TaskSchedulerID, DateLastUpdated, ItemStatus);

Huzzah! It's doing a seek!

[Image: Index Seek]

That should make the query pretty darned fast... but what's that little warning symbol?

[Image: Seek Warning]

That's the "get close" part I was talking about. In my random sampling, the index seek was able to seek to 280 rows, then did some residual I/O to find the one row it needed that fit the criteria. Based on the index I created, it is going to seek to the supplied @TaskSchedulerID, then start running along the DateLastUpdated column (remember, that's the ORDER BY on the query), then do "residual I/O" to check the ItemStatus criteria.

Depending on your data, that might be "good enough" to fix your performance issues. Or you might need to be a bit more clever. Since I don't know your actual schema, indexing, or data distribution, I can't hazard a better guess.
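As one example of "a bit more clever", and purely as a sketch against my guessed table, you could split the ORed criteria into separate branches that each have simple equality predicates, and pair that with an index that leads on the equality columns. The index name below is made up, and whether this beats the simpler index above depends entirely on your data, so test it before trusting it.

```sql
-- Hypothetical index: equality columns first, then the ORDER BY column.
CREATE INDEX ix_TaskSchedulerItem_Scheduler_Status_Date
    ON dbo.TaskSchedulerItem (TaskSchedulerID, ItemStatus, DateLastUpdated);

DECLARE @CurrentDateTime datetime2(0)   = GETDATE(),
        @TaskSchedulerID int            = 4,   -- assumed sample value
        @TaskSchedulerItemID int;

-- Each branch can seek to its oldest matching row;
-- the outer TOP (1) then picks the oldest of (at most) three candidates.
SELECT TOP (1) @TaskSchedulerItemID = c.TaskSchedulerItemID
FROM (
    SELECT p.TaskSchedulerItemID, p.DateLastUpdated
    FROM (SELECT TOP (1) TaskSchedulerItemID, DateLastUpdated
          FROM dbo.TaskSchedulerItem
          WHERE TaskSchedulerID = @TaskSchedulerID
            AND ItemStatus = 'Pending'
          ORDER BY DateLastUpdated) AS p
    UNION ALL
    SELECT f.TaskSchedulerItemID, f.DateLastUpdated
    FROM (SELECT TOP (1) TaskSchedulerItemID, DateLastUpdated
          FROM dbo.TaskSchedulerItem
          WHERE TaskSchedulerID = @TaskSchedulerID
            AND ItemStatus = 'Failed'
          ORDER BY DateLastUpdated) AS f
    UNION ALL
    SELECT i.TaskSchedulerItemID, i.DateLastUpdated
    FROM (SELECT TOP (1) TaskSchedulerItemID, DateLastUpdated
          FROM dbo.TaskSchedulerItem
          WHERE TaskSchedulerID = @TaskSchedulerID
            AND ItemStatus = 'IN PROGRESS'
            AND DateLastUpdated < DATEADD(MINUTE, -1, @CurrentDateTime)
          ORDER BY DateLastUpdated) AS i
) AS c
ORDER BY c.DateLastUpdated;

SELECT @TaskSchedulerItemID;
```

The derived tables are there because T-SQL does not allow an ORDER BY directly inside a UNION ALL branch. The result should match the original query, apart from how ties on DateLastUpdated are broken.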

But really, you just need an index that will make your query go a lot faster, so that you use fewer DTUs.
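To see whether an index change actually moved the needle, you can look at recent resource usage in `sys.dm_db_resource_stats`, which Azure SQL Database exposes in each user database with roughly one row per 15 seconds for about the last hour. A minimal sketch; run it before and after the change and compare:

```sql
-- Most recent resource usage samples for this database.
-- Each value is a percentage of the limit for the current service tier.
SELECT TOP (20)
       end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
```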