SQL Server – Do Multiple OLE DB Connections Require Table Lock?

sql serversql-server-2008-r2ssis

SSIS is not really my forte. I noticed you have a table lock option on OLE DB destinations where the entire table becomes locked in what I assume is during the transaction of inserting data.

What happens if you have a flat file data source with a conditional split that is parsing out data into 5 or more OLE DB destinations that are targeting the same table with table lock on? Would each OLE DB destination get blocked by one another in this scenario if data is being fast loaded (inserted) into the destinations themselves?

I removed the table lock in my instance and everything seemed fine. It was splitting the data and inserting records at about 1 million records per minute.

Best Answer

The table lock option facilitates minimal logging, reducing transaction log space requirements under certain circumstances. Below are relevant excerpts from the Prerequisites for Minimal Logging topic in the SQL Server Books Online.

The table is not being replicated.

Table locking is specified (using TABLOCK).

Table is not a memory-optimized table.

If the table has no indexes, data pages are minimally logged.

If the table has no clustered index but has one or more nonclustered indexes, data pages are always minimally logged. How index pages are logged, however, depends on whether the table is empty:

If the table is empty, index pages are minimally logged.

If table is non-empty, index pages are fully logged.

If you start with an empty table and bulk import the data in multiple batches, both index and data pages are minimally logged for the first batch, but beginning with the second batch, only data pages are minimally logged.

If the table has a clustered index and is empty, both data and index pages are minimally logged. In contrast, if a table has a clustered index and is non-empty, data pages and index pages are both fully logged regardless of the recovery model.

The table lock will block other processes using the table, including the other OLEDB Destinations within the SSIS package. You could set the Maximum Insert Commit Size to improve concurrency but with the consideration that file will be partially imported in the event of a failure.

Related Solutions

Sql-server – Duplicating some rows of a data flow in SSIS

I see no glaring issue with that data flow. I always suggest to do as much work as possible in your source queries, so if you can create a dataset at the outset that allows you to populate both tables via a simple split, it would certainly use less memory. But things like that are not always possible depending on the data source and data format.

Also, the trash destination is good for development/debugging but does no good in production. Remove it. Let the 'Unwanted Data'expire at the split. SSIS can figure out the rest.

As long as you avoid blocking components (UNION, MERGE, etc.) there isn't any reason this process would take days. I frequently process many millions of rows in SSIS without issue. SSIS is only as slow as the person who designed the process.

Is it currently having performance issues?

SQL Server Locking – Strategies for Checking Out Records for Processing

I am not a huge fan of either the extra "lock" table or the idea of locking the whole table to grab the next record. I get why it is being done, but that also hurts concurrency for operations that are updating to release a locked record (surely two processes can't be fighting over that when it is not possible for two processes to have locked the same record at the same time).

My preference would be to add a ProcessStatusID (typically TINYINT) column to the table with the data being processed. And is there a field for LastModifiedDate? If not, then it should be added. If yes, then do these records get updated outside of this processing? If records can be updated outside of this particular process, then another field should be added to track StatusModifiedDate (or something like that). For the rest of this answer I will just use "StatusModifiedDate" as it is clear in its meaning (and in fact, could be used as the field name even if there is currently no "LastModifiedDate" field).

The values for ProcessStatusID (which should be placed into a new lookup table called "ProcessStatus" and Foreign Keyed to this table) could be:

Completed (or even "Pending" in this case as both mean "ready to be processed")
In Process (or "Processing")
Error (or "WTF?")

At this point it seems safe to assume that from the application, it just wants to grab the next record to process and won't be passing anything in to help make that decision. So we want to grab the oldest (at least in terms of StatusModifiedDate) record that is set to "Completed" / "Pending". Something along the lines of:

SELECT TOP 1 pt.RecordID
FROM   ProcessTable pt
WHERE  pt.StatusID = 1
ORDER BY pt.StatusModifiedDate ASC;

We also want to update that record to "In Process" at the same time to prevent the other process from grabbing it. We could use the OUTPUT clause to let us do the UPDATE and SELECT in the same transaction:

UPDATE TOP (1) pt
SET    pt.StatusID = 2,
       pt.StatusModifiedDate = GETDATE() -- or GETUTCDATE()
OUTPUT INSERTED.RecordID
FROM   ProcessTable pt
WHERE  pt.StatusID = 1;

The main problem here is that while we can do a TOP (1) in an UPDATE operation, there is no way to do an ORDER BY. But, we can wrap it in a CTE to combine those two concepts:

;WITH cte AS
(
   SELECT TOP 1 pt.RecordID
   FROM   ProcessTable pt (READPAST, ROWLOCK, UPDLOCK)
   WHERE  pt.StatusID = 1
   ORDER BY pt.StatusModifiedDate ASC;
)
UPDATE cte
SET    cte.StatusID = 2,
       cte.StatusModifiedDate = GETDATE() -- or GETUTCDATE()
OUTPUT INSERTED.RecordID;

The obvious question is whether or not two processes doing the SELECT at the same time can grab the same record. I am pretty sure that the UPDATE with OUTPUT clause, especially combined with the READPAST and UPDLOCK hints (see below for more details), will be fine. However, I have not tested this exact scenario. If for some reason the above query does not take care of the race condition, then adding the following will: application locks.

The CTE query above can be wrapped in sp_getapplock and sp_releaseapplock to create a "gate keeper" for the process. In doing so, only one process at a time will be able to enter in order to run the query above. The other process(es) will be blocked until the process with the applock releases it. And since this step of the overall process is just to grab the RecordID, it is fairly quick and won't be blocking the other process(es) for very long. And, just as with the CTE query, we are not blocking the entire table, thereby allowing other updates to other rows (to set their status to either "Completed" or "Error"). Essentially:

BEGIN TRANSACTION;
EXEC sp_getapplock @Resource = 'GetNextRecordToProcess', @LockMode = 'Exclusive';

   {CTE UPDATE query shown above}

EXEC sp_releaseapplock @Resource = 'GetNextRecordToProcess';
COMMIT TRANSACTION;

Application locks are very nice but should be used sparingly.

Lastly, you just need a stored procedure to handle setting the status to either "Completed" or "Error". And that can be a simple:

CREATE PROCEDURE ProcessTable_SetProcessStatusID
(
   @RecordID INT,
   @ProcessStatusID TINYINT
)
AS
SET NOCOUNT ON;

UPDATE pt
SET    pt.ProcessStatusID = @ProcessStatusID,
       pt.StatusModifiedDate = GETDATE() -- or GETUTCDATE()
FROM   ProcessTable pt
WHERE  pt.RecordID = @RecordID;

Table Hints (found at Hints (Transact-SQL) - Table):

READPAST (seems to fit this exact scenario)

Specifies that the Database Engine not read rows that are locked by other transactions. When READPAST is specified, row-level locks are skipped. That is, the Database Engine skips past the rows instead of blocking the current transaction until the locks are released...READPAST is primarily used to reduce locking contention when implementing a work queue that uses a SQL Server table. A queue reader that uses READPAST skips past queue entries locked by other transactions to the next available queue entry, without having to wait until the other transactions release their locks.
ROWLOCK (just to be safe)

Specifies that row locks are taken when page or table locks are ordinarily taken.
UPDLOCK

Specifies that update locks are to be taken and held until the transaction completes. UPDLOCK takes update locks for read operations only at the row-level or page-level.

Best Answer

Related Solutions

Sql-server – Duplicating some rows of a data flow in SSIS

SQL Server Locking – Strategies for Checking Out Records for Processing

Related Question