SQL Server – Implementing a Queue with sp_getapplock

lockingsql serversql-server-2008t-sql

I've been reading a series of posts by Paul White about SQL Server Isolation Levels and came across a phrase:

To emphasise the point, pseudo-constraints written in T-SQL have to
perform correctly no matter what concurrent modifications might be
occurring. An application developer might protect a sensitive
operation like that with a lock statement. The closest thing T-SQL
programmers have to that facility for at-risk stored procedure and
trigger code is the comparatively rarely-used sp_getapplock system
stored procedure. That is not to say it is the only, or even preferred
option, just that it exists and can be the right choice in some
circumstances.

I am using sp_getapplock and this made me wonder if I'm using it correctly, or there is a better way to get the desired effect.

I have a C++ application that processes so called "Building servers" in a loop 24/7. There is a table with the list of these Building Servers (about 200 rows). New rows can be added at any time, but it doesn't happen often. Rows are never deleted, but they can be marked as inactive. Processing a server may take from few seconds to dozens of minutes, each server is different, some are "small", some are "large". Once a server is processed, application has to wait at least 20 minutes before processing it again (the servers should not be polled too often). Application starts 10 threads that perform the processing in parallel, but I must guarantee that no two threads attempt to process the same server at the same time. Two different servers can and should be processed simultaneously, but each server can be processed not more often than once in 20 minutes.

Here is the definition of a table:

CREATE TABLE [dbo].[PortalBuildingServers](
    [InternalIP] [varchar](64) NOT NULL,
    [LastCheckStarted] [datetime] NOT NULL,
    [LastCheckCompleted] [datetime] NOT NULL,
    [IsActiveAndNotDisabled] [bit] NOT NULL,
    [MaxBSMonitoringEventLogItemID] [bigint] NOT NULL,
CONSTRAINT [PK_PortalBuildingServers] PRIMARY KEY CLUSTERED 
(
    [InternalIP] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_LastCheckCompleted] ON [dbo].[PortalBuildingServers]
(
    [LastCheckCompleted] ASC
)
INCLUDE 
(
    [LastCheckStarted],
    [IsActiveAndNotDisabled],
    [MaxBSMonitoringEventLogItemID]
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

The main loop of a worker thread in an application looks like this:

for(;;)
{
    // Choose building server for checking
    std::vector<SBuildingServer> vecBS = GetNextBSToCheck();
    if (vecBS.size() == 1)
    {
        // do the check and don't go to sleep afterwards
        SBuildingServer & bs = vecBS[0];
        DoCheck(bs);
        SetCheckComplete(bs);
    }
    else
    {
        // Sleep for a while
        ...
    }
}

Two functions here GetNextBSToCheck and SetCheckComplete are calling corresponding stored procedures.

GetNextBSToCheck returns 0 or 1 row with details of the server that should be processed next. It is a server that hasn't been processed for a longest time. If this "oldest" server has been processed less than 20 minutes ago, no rows are returned and thread will wait for a minute.

SetCheckComplete sets the time when processing was completed, thus making it possible to choose this server for processing again after 20 minutes.

Finally, the code of stored procedures:

GetNextToCheck:

CREATE PROCEDURE [dbo].[GetNextToCheck]
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRANSACTION;
    BEGIN TRY
        DECLARE @VarInternalIP varchar(64) = NULL;
        DECLARE @VarMaxBSMonitoringEventLogItemID bigint = NULL;

        DECLARE @VarLockResult int;
        EXEC @VarLockResult = sp_getapplock
            @Resource = 'PortalBSChecking_app_lock',
            @LockMode = 'Exclusive',
            @LockOwner = 'Transaction',
            @LockTimeout = 60000,
            @DbPrincipal = 'public';

        IF @VarLockResult >= 0
        BEGIN
            -- Acquired the lock
            -- Find BS that wasn't checked for the longest period
            SELECT TOP 1
                @VarInternalIP = InternalIP
                ,@VarMaxBSMonitoringEventLogItemID = MaxBSMonitoringEventLogItemID
            FROM
                dbo.PortalBuildingServers
            WHERE
                LastCheckStarted <= LastCheckCompleted
                -- this BS is not being checked right now
                AND LastCheckCompleted < DATEADD(minute, -20, GETDATE())
                -- last check was done more than 20 minutes ago
                AND IsActiveAndNotDisabled = 1
            ORDER BY LastCheckCompleted
            ;

            -- Start checking the found BS
            UPDATE dbo.PortalBuildingServers
            SET LastCheckStarted = GETDATE()
            WHERE InternalIP = @VarInternalIP;
            -- There is no need to explicitly verify if we found anything.
            -- If @VarInternalIP is null, no rows will be updated
        END;

        -- Return found BS, 
        -- or no rows if nothing was found, or failed to acquire the lock
        SELECT
            @VarInternalIP AS InternalIP
            ,@VarMaxBSMonitoringEventLogItemID AS MaxBSMonitoringEventLogItemID
        WHERE
            @VarInternalIP IS NOT NULL
            AND @VarMaxBSMonitoringEventLogItemID IS NOT NULL
        ;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
    END CATCH;

END

SetCheckComplete:

CREATE PROCEDURE [dbo].[SetCheckComplete]
    @ParamInternalIP varchar(64)
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRANSACTION;
    BEGIN TRY

        DECLARE @VarLockResult int;
        EXEC @VarLockResult = sp_getapplock
            @Resource = 'PortalBSChecking_app_lock',
            @LockMode = 'Exclusive',
            @LockOwner = 'Transaction',
            @LockTimeout = 60000,
            @DbPrincipal = 'public';

        IF @VarLockResult >= 0
        BEGIN
            -- Acquired the lock
            -- Completed checking the given BS
            UPDATE dbo.PortalBuildingServers
            SET LastCheckCompleted = GETDATE()
            WHERE InternalIP = @ParamInternalIP;
        END;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
    END CATCH;

END

As you can see, I use sp_getapplock to guarantee that only one instance of both of these stored procedures are running at any given time. I think I need to use sp_getapplock in both procedures, because the query that chooses the "oldest" server uses the LastCheckCompleted time, which is updated by SetCheckComplete.

I think that this code does guarantee that no two threads attempt to process the same server at the same time, but I'd be grateful if you could point to any issues with this code and the overall approach. So, the first question: Is this approach correct?

Also, I'd like to know if the same effect could be achieved without using sp_getapplock. The second question: Is there a better way?

Best Answer

Is this approach correct?

Yes. It meets all the objectives stated in the question.

A comment in the procedures to explain the strategy and note the related procedure name might be helpful for future maintenance by others.

Is there a better way?

Not to my mind, no.

Taking a single lock is an extremely fast operation, and results in very clear logic. It is not clear to me that taking the lock in the second procedure is redundant, but even if it is, what do you really gain by omitting it? The simplicity and safety of your implementation appeal to me.

The alternatives are much more complex, and may leave you wondering if you've truly covered all cases, or if there might be a change in internal engine details in the future that would break (perhaps subtle and unstated) assumptions.

If you ever need a more traditional queueing implementation, the following reference is very useful:

Using Tables as Queues by Remus Rusanu

Related Solutions

Sql-server – Inserting and updating and selecting at thousands of times per second

Change the updates to use WITH (ROWLOCK). That will reduce the locks from page level to row level. You can try the same on the inserts as well.

SQL Server – How to Determine if Queries Run in Memory or Disk

You can use one of these two queries to see the total logical reads and the total physical reads.

SELECT  DB_NAME(st.dbid) Db,
        OBJECT_NAME(st.objectid, st.dbid) Prc,
        qs.execution_count,
        qs.total_logical_reads,
        qs.total_physical_reads,
        qs.statement_start_offset,
        qs.statement_end_offset,
        st.text
FROM    sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st;

SELECT  DB_NAME(database_id) Db,
        OBJECT_NAME(object_id, database_id) Prc,
        execution_count,
        total_logical_reads,
        total_physical_reads
FROM    sys.dm_exec_procedure_stats ps;

The first one breaks this down by statement, the second one counts in the entire procedure.

Physical reads are reads against the disk, logical reads are against memory. You can use this to figure out which procedures or statements are the most expensive ones in your system and try to tune those.

Keep in mind, that while logical reads are significantly cheaper that physical reads, they are still expensive, so reducing the number of them (for example by adding an appropriate index) can make your queries run a lot faster.

There are a lot of additional columns in the DMVs above that you might find interesting too.

How does an index help to reduce logical reads?

In SQL Server all data is organized in blocks, 8KB in size. These blocks are called "pages".

Every table contains "meta" pages that contain information about the struktur of the table as well as pata pages. If no index exists and you run a query like SELECT * FROM tbl WHERE Id = 7 SQL Server has to go looking for this or these rows in the entire table. So it reads in one page at a time, loops through all rows in each page to determine the rows that fit the WHERE clause. So if the table requires 1,000,000 pages to be stored, this query will take 1,000,000 logical reads to execute.

If you have an index, SQL Server sorts the data logically within the pages and establishes a linked list between the pages. This allows for running queries with an ORDER BY to be executed without an expensive sort operation. But mor importantly that the sorting, SQL Server adds a B+Tree to the table. A B+Tree is a structure comparable to the index in a book, where looking for a specific keyword allows me to directly jump to the page that contains the keyword. The typical book has only one index level whereas a B+Tree can have multiple. Just think of a large book, where the index itself is multiple pages long. In a case like that it makes sense the add an additional index layer that tells us on wich page the index words starting with S are to be found.

B+Trees are optimized to have as few levels as possible while providing the property that any record in the index can be found by reading one page per index level. So assume the above WHERE Id = 7 query when you have an index sorted by Id. Let's say that index has 5 levels. Now, to find all records that match this query I have to read one page per index level (that is 5 pages).This is called an "Index Seek". If there are multiple records that fit the bill I might have to follow the sorted index for a while to retrieve all of them. But lets assume there is only one record.

So, without the index running that query required 1,000,000 reads, with indes it required 5 reads. Even though a logical read is an in-memory operation there is stil a substantial cost - in fact it is the most expensive operation in a trivial query like the one above. So reducing the amount of logical reads needed by a factor of 200,000 will speed up your query by a similar factor.

So, a logical read is not equivalent to a table scan, but a table scan causes a whole lot more logical reads than an index seek.

Best Answer

Related Solutions

Sql-server – Inserting and updating and selecting at thousands of times per second

SQL Server – How to Determine if Queries Run in Memory or Disk

Related Question