SQL Master Data Services – How to Purge Old Records from Huge Leaf Member Staging Table

master-data-services · sql-server · sql-server-2012

One of our SQL instances has a large MDS database, and virtually 100% of the size is due to a 300+ million row table named stg.Stg_Project_Leaf. It recently started blowing up our weekly index rebuilds, so I need to figure this out.

From what I can tell, this is a Leaf Member Staging Table for a "Project" entity, set up and automated (by a prior consultant) as a part of an ETL process that builds BI cubes/reports.

Looks like it has been processing about 10k rows per half hour for the past several years and has never been purged. When I check ImportStatus_ID, though, I see 0 rows waiting to be processed (status 0); most rows are 1 (succeeded) and fewer than 1% are 2 (failed).
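For anyone wanting to check the same thing, a quick status breakdown looks like this (ImportStatus_ID is a standard column on MDS leaf staging tables; 0 = waiting, 1 = succeeded, 2 = failed):

```sql
-- Count rows in the staging table by import status.
-- NOLOCK is used because the table is 300M+ rows and an active ETL target.
SELECT ImportStatus_ID, COUNT_BIG(*) AS RowsInStatus
FROM stg.Stg_Project_Leaf WITH (NOLOCK)
GROUP BY ImportStatus_ID
ORDER BY ImportStatus_ID;
```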

Can I just TRUNCATE this table?

This system is used in an automated fashion; it's not like people are logging into the MDS console and individually monitoring and rolling back changes. I found references to stored procedures that purge these tables (mdm.udpStagingClear or maybe mdm.udpDeletedMembersPurge), but I'm not sure which of those I'd need, nor whether they would blow up the log if they tried to delete 300 million rows at once.

Best Answer

In short: The official answer is to use the new proc mdm.udpEntityStagingBatchTableCleanup to clean up entity staging tables, but in practice, DELETE or TRUNCATE should work just fine (or might be necessary), as long as you understand your batch details.


I found reference to a SQL Server 2012 update that adds 3 stored procedures to MDS, including the one I need:

--Cleanup entity-based staging tables for a model (example values are illustrative)
DECLARE @ModelID int = 1;                          -- your model's ID, from mdm.tblModel
DECLARE @CleanupOlderThanDate date = '2019-01-01'; -- remove completed batches older than this
EXEC mdm.udpEntityStagingBatchTableCleanup @ModelID, @CleanupOlderThanDate;

See also this article for scheduled cleanup options utilizing the same procs.

For my purpose, though, I was still concerned about blowing up the tran log with the 300+ million rows that had to be deleted, so I looked in detail at the inner workings of mdm.udpEntityStagingBatchTableCleanup. This stored proc does the following:

  1. Populates a temp table with all completed batches that match your deletion criteria (the given @ModelID, older than @CleanupOlderThanDate).
  2. Builds dynamic DELETE statements for the corresponding Leaf, Consolidated, and Relationship staging tables for each DISTINCT Entity_ID in the temp table.
  3. Executes each of those DELETE statements, filtered via a join to the temp table.
  4. Deletes from mdm.tblStgBatch itself, again filtered by a join to the temp table.

There is no attempt to limit the number of rows deleted at once, and in my case the 300+ million rows all belong to a single Entity_ID, so a single DELETE would likely blow up the transaction log.
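If you do need a log-friendly delete rather than a truncate (say, to keep recent batches), a chunked loop along these lines keeps each transaction small. This is a sketch only; the chunk size and the WHERE clause are illustrative and should match your own retention rule:

```sql
-- Illustrative batched purge: delete processed rows in 50k-row chunks so each
-- transaction (and its log reservation) stays small.
DECLARE @ChunkSize int = 50000, @Rows int = 1;
WHILE @Rows > 0
BEGIN
    DELETE TOP (@ChunkSize)
    FROM stg.Stg_Project_Leaf
    WHERE ImportStatus_ID <> 0;  -- never touch rows still waiting to be processed
    SET @Rows = @@ROWCOUNT;
    CHECKPOINT;                  -- under SIMPLE recovery, lets the log truncate between chunks
END
```

Under FULL recovery you'd rely on your regular log backups instead of CHECKPOINT to keep the log from growing.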

The good news, though, is that if I intend to remove all Entity_IDs and all dates anyway (which I do), the cleanup is functionally equivalent to a TRUNCATE, as long as I'm careful not to attempt it in the middle of an active batch.
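In practice that reduces to something like the following, guarding on ImportStatus_ID = 0 (rows not yet processed) so I don't truncate mid-batch:

```sql
-- Only truncate if no staged rows are still waiting to be processed.
IF NOT EXISTS (SELECT 1 FROM stg.Stg_Project_Leaf WHERE ImportStatus_ID = 0)
    TRUNCATE TABLE stg.Stg_Project_Leaf;
ELSE
    RAISERROR('Unprocessed rows found in stg.Stg_Project_Leaf; skipping truncate.', 16, 1);
```

TRUNCATE deallocates pages rather than logging individual row deletes, so the 300+ million rows are not a transaction-log concern here.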

Finally, while I couldn't find any official MS sources that explicitly recommended truncating these tables, I did find several articles where DBAs do a routine TRUNCATE at either the beginning or the end of their import process.

So my plan is to do an initial TRUNCATE, and then schedule mdm.udpEntityStagingBatchTableCleanup to run periodically.