SQL Server – SQL 2016 SQL Server Assertion: File: , line=951 Failed Assertion

sql-server, sql-server-2016

I'm currently upgrading our data warehouse from SQL 2012 to SQL 2016, and I have both the old and new DWs running side by side in parallel.

My ETL process (a framework developed in SSIS by a third party) has run successfully for more than two years on 2012 but is failing on 2016. So far the databases and the ETL process are identical.

Both servers are virtual machines running on VMware.
Old server: Win 2008 with 24 GB of RAM, SQL 2012 Std, max memory set to 16 GB.
New server: Win 2012 with 64 GB of RAM, SQL 2016 Dev, max memory set to 50 GB.
The new DW is running v13.0.1601.5 RTM Developer Edition (64-bit).

While running my ETL process, the load steps that use a SQL MERGE into either a dimension or a fact table fail with the following error.

Full text:

DESCRIPTION: SQL Server Assertion: File: , line=951
Failed Assertion = 'IS_OFF (BUF_MINLOGGED, m_buf->bstat) ||
pageModifyType != PageModifyType_Contents || GetPagePtr ()->IsTextPage
()'. This error may be timing-related. If the error persists after
rerunning the statement, use DBCC CHECKDB to check the database for
structural integrity, or restart the server to ensure in-memory data
structures are not corrupted.

As recommended, I ran DBCC CHECKDB and no errors were found. I also restarted SQL Server, then restarted the ETL process and got the same error.
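
For reference, the check recommended by the error message amounts to something along these lines (the database name here is a placeholder, not our real one):

    -- Structural integrity check suggested by the assertion message
    -- 'DW' is an illustrative database name
    DBCC CHECKDB ('DW') WITH NO_INFOMSGS, ALL_ERRORMSGS;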

My searches for this error show that it was a known error in SQL 2008, 2012 and 2014, fixed in subsequent hotfixes and cumulative updates, so I'm a bit surprised to see it reappear in 2016.

The links I've found say it affects SSIS when trying to do inserts while the database is in the Simple or Bulk-Logged recovery model. (I'm running in the Simple recovery model.)

A suggested workaround is to change the database recovery model to FULL. I have tried this and it works, but it's not much of a solution for a data warehouse.
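
For anyone wanting to try it, the workaround amounts to something like the following; the database name and backup path are placeholders, and the log backup would need to run on a regular schedule (e.g. from a SQL Agent job) to keep the log from growing without bound:

    -- Switch the warehouse to the FULL recovery model ('DW' is illustrative)
    ALTER DATABASE [DW] SET RECOVERY FULL;

    -- FULL recovery is only workable with frequent log backups,
    -- e.g. every 15 minutes (path is illustrative)
    BACKUP LOG [DW] TO DISK = N'D:\Backups\DW_log.trn';

    -- Once a fix is available, switch back:
    -- ALTER DATABASE [DW] SET RECOVERY SIMPLE;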

Has anybody else encountered this with 2016?

Can anyone suggest alternative workarounds?

Updates:

26/7/2016: I applied the Critical Update KB3164398 (v13.0.1708.0) and the problem still exists.

27/7/2016: I have applied Cumulative Update CU1 KB3164674 (v13.0.2149.0).

3/8/2016: The error occurred overnight on our smallest cube, so CU1 did not fix the issue. Today I reported the bug on MS Connect, and I have also logged a support call with Microsoft.

12/8/2016: MS Support responded initially, but the response was "We don't have a fix for that". The support engineer was going to discuss it with his colleagues and get back to me. Eight days later I still haven't heard from him.

Although I don't have a 'fix', we did find a workaround that suited us. See my posted answer.

29/9/2016: I applied CU2 last week. On Thursday we accidentally ran an old version of the merge, which failed again with the same error. So CU2 hasn't fixed it either.

23/1/2017: I applied 2016 SP1 CU1 (specifically KB3205964) and I believe this has resolved the issue.

Best Answer

Looking at the KB, you have a couple of options/workarounds:

  1. Switch to the FULL recovery model. You say "this isn't much of an option for a warehouse", but it's really just a matter of setting up transaction log backups on a regular schedule, e.g. every 15 minutes, and then disposing of them; SSIS and Maintenance Plans have stock tasks for doing this. You will lose minimally logged (bulk-logged) operations, but I've never found these make a great difference to runtimes, just to log size. You can even back the log up to NUL, which I won't describe here. If you're not sure what to do, ask your local DBA. Disk space and transaction log backup retention are easier problems to solve than fatal errors. When this problem eventually gets solved you can switch back.
  2. The KB mentions "multiple BULK INSERT statements in a single distributed transaction". It's not clear from your question how your bulk inserts are set up. Are you using SSIS to run 'Execute SQL' tasks that use the MERGE command? What does 'multiple BULK INSERTs' mean here? Is there a way to convert your approach to single BULK INSERTs, one at a time? In SSIS you could temporarily set 'MaxConcurrentExecutables' to 1, for example, and see if that helps; tie it to a configuration variable so you can change it back later. Obviously it will slow things down, but you would rather your ETL finished than failed quickly. Doing things in parallel is a good pattern and a real strength of SSIS, but you can only go as fast as your slowest component: say you have 10 dimensions that take a minute each and one fact that takes an hour; your ETL finishes in an hour running in parallel, or in 1 hour 10 minutes running serially.
  3. MERGE is nice but does have a few issues. You could consider converting back to INSERT/UPDATE. You should also be using HOLDLOCK with MERGE, according to here (a minimal sketch follows this list). Do you use that hint? Does it make any difference to this issue if you do? We had an issue in an early SQL 2014 build where using MERGE with composable DML (an OUTPUT clause) into a columnstore caused this kind of assertion - I made them take the columnstore indexes off the dimensions, which they had added without telling me.
  4. What kind of transaction handling are you doing? Sometimes with ETL the position is repeatable by simply re-running. Sometimes you need it to fail and roll back. How have you implemented this? Can it be altered so it's not a "single distributed transaction"?
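
For point 3, a minimal sketch of the HOLDLOCK hint on a dimension upsert; the table and column names are made up for illustration and would need to match your own dimension and staging tables:

    -- Illustrative dimension upsert with the HOLDLOCK hint applied to the target
    MERGE dbo.DimCustomer WITH (HOLDLOCK) AS tgt
    USING staging.Customer AS src
        ON tgt.CustomerKey = src.CustomerKey
    WHEN MATCHED THEN
        UPDATE SET CustomerName = src.CustomerName
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerKey, CustomerName)
        VALUES (src.CustomerKey, src.CustomerName);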

Good luck.