SQL Server – Excessive Memory Grants from Indexed Foreign Key Cascade Delete

memory-grantsql serversql server 2014t-sql

Each day, stores can enter sales information into our OLTP application. The app calls a stored procedure in SQL to save this information. Based on the users's activity, the application sends a code indicating whether the proc should perform an insert, update, or delete.

This save stored procedure is receiving memory grants of 60 GB for deletes to one row. To duplicate the issue, I ran an ad-hoc delete query between a begin tran and rollback and captured the actual plan below:

https://www.brentozar.com/pastetheplan/?id=r189liBI4

The schema is like follows:

Daily_Item_Sales_Headers -- ~100 million rows on this system
========================
DlyItmSlsHdr_Key decimal(15,0) primary key nonclustered,
DlyItmSlsHdr_PaperworkBatch_Key decimal(15,0), -- FK to parent batch of data that contains other types of data
UK_DlyItmSlsHdr_PaperworkBatchKey_Key clustered, unique, unique key located on PRIMARY (DlyItmSlsHdr_PaperworkBatch_Key, DlyItmSlsHdr_Key)

Daily_Item_Sales -- ~790 million rows on this system
================
DlyItmSls_Key decimal(15,0) primary key nonclustered,
DlyItmSls_DlyItmSlsHdr_Key decimal(15,0), -- FK to header table, cascade delete,
UK_DlyItmSls_DlyItmSlsHdrKey_Key clustered unique constraint on (DlyItmSls_DlyItmSlsHdr_Key, DlyItmSls_Key)
[columns about sales data]

The query I ran is simple:

delete Daily_Item_Sales_Headers where DlyItmSlsHdr_Key = 1

The plan shows the header deletion is correctly cascading to the child sales rows. The plan also shows an index seek on the clustered index of the child table. However, this clustered index seek to the child table has an estimated 790 million rows. The actual number of rows is ~100. The high estimated rows is causing a memory grant of ~60 GB.

Using dbcc show_statistics on the child table indexes, I was able to see that the statistics were updated last night with a 2% sample size. The histogram shows between 1 and ~33,000 rows estimated per parent key. So the statistics appear to show the estimate should be much lower.

Why is this delete query generating such a large memory grant?

I saw this question about excessive sort memory grants that appear to be caused by a bug, but it looks different to me, because there are no sorts in this plan. Maybe it is the same bug applying to the table spools when cascading to the child table?

Excessive sort memory grant

Because of the foreign key cascading, I do not think I can work around the memory grant by deleting the child rows first before deleting the parent rows. This is an OLTP system with up to 10,000 stores working at once, so I cannot drop the foreign keys on demand for a single delete.

EDIT 2/28/2019 1:13 PM CST

The SQL instance has about 400 GB memory allocated to it.

The application has the following trace flags enabled:

  • 1222: deadlock tracing
  • 4199: query processor fixes
  • 2312: use 2014 cardinality estimator
  • 2453: @Table variable cardinality

Disabling trace flags yields different estimates for the child table cascade delete:

Trace Flag 2312    Trace Flag 4199   Row Estimate
===============    ===============   =============
      on                 on            790 million rows
      on                 off           608 rows (very accurate)
      off                on            1 row
      off                off           1 row

Adding a querytraceon for the flag 9130 mentioned in the linked question makes no difference.

A coworker found this interesting article about a memory bug in SQL 2014. The linked resolution was to add option (MAX_GRANT_PERCENT = 1) to the query.

https://www.theregister.co.uk/2016/02/09/microsoft_sql_server_2014_bug/

EDIT

The exact SQL Server version is SQL Server 2014 SP2 CU 12.

The database is compatibility level 110 – SQL Server 2012. We are unable to change the compatibility level for a while.

Best Answer

I believe you're encountering the same bug in the new Cardinality Estimator that I did. I wrote about the problem here, with a more-detailed look here.

In short, the new Cardinality Estimator has a bug when estimating rows to be deleted on a cascading delete. If the driving value on the parent table is far enough outside its histogram, the CE will assume that every single row of the child table gets deleted.

Lucky you though, my server took the high estimates and decided to start scanning instead :(

Also luckily for you, you're dealing with a proc instead of ORM madness, so there are viable solutions. One would be to add DBCC TRACEOFF( 4199) to your procedure (potentially causing permissions issues). That doesn't future-proof you though, since in 2016+ 4199 status doesn't matter for the CE bug. Another option might be to add OPTION(OPTIMIZE FOR (@1 = <some middle value>)).

Since you're not getting a scanning plan, you could also just strongarm the memory grant with MAX_GRANT_PERCENT.