One of our SQL instances has a large MDS database, and virtually 100% of the size is due to a 300+ million row table named stg.Stg_Project_Leaf
. It recently started blowing up our weekly index rebuilds, so I need to figure this out.
From what I can tell, this is a Leaf Member Staging Table for a "Project" entity, set up and automated (by a prior consultant) as a part of an ETL process that builds BI cubes/reports.
Looks like it has been processing about 10k rows per half hour for the past several years and has never been purged. When I check the ImportStatus_ID
, though, I see 0 rows waiting to be processed. (Most are 1 – succeeded, less than 1% are 2 – failed.)
Can I just TRUNCATE
this table?
This system is being used in an automated fashion, its not like people are logging into the MDS console and individually monitoring and rolling back changes. I found some references to stored procedures that purge these tables (mdm.udpStagingClear
or maybe mdm.udpDeletedMembersPurge
), but I'm not sure which of those is the one I'd need, nor do I know if they would blow up the log if they tried to do 300m rows at once.
Best Answer
In short: The official answer is to use the new proc
mdm.udpEntityStagingBatchTableCleanup
to clean up entity staging tables, but in practice,DELETE
orTRUNCATE
should work just fine (or might be necessary), as long as you understand your batch detail.I found reference to a SQL 2012 patch that adds 3 stored procedures to MDS, including the one I need:
See also this article for scheduled cleanup options utilizing the same procs.
For my purpose, though, I was still concerned about blowing up the tran log with the 300+ million rows that had to be deleted, so I looked in detail at the inner workings of
mdm.udpEntityStagingBatchTableCleanup
. This stored proc does the following:@Model_ID
, older than@CleanupOlderThanDate
).DELETE
statements for the corresponding Leaf, Consolidated, and Relationship tables for eachDISTINCT Entity_ID
in the temp table.DELETE
statements for each entity table, filtered via a join to the temp table.mdm.tblStgBatch
itself, again filtered by join to the temp table.There is no attempt to limit the amount of rows deleted at once, and in my case, the 300+ million rows are all for a single
Entity_ID
, which would likely blow up the transaction log.The good news, though, is that if I intend to remove all
Entity_IDs
and all dates anyway (which I do), then that's functionally equivalent to aTRUNCATE
, as long as I am careful to not attempt it right in the middle of an active batch.Finally, while I couldn't find any official MS sources that explicitly recommended truncating these tables, I did find several articles where DBAs do a routine
TRUNCATE
at either the beginning or the end of their import process:So my plan is to do an initial
TRUNCATE
, and then schedulemdm.udpEntityStagingBatchTableCleanup
to run on a periodic basis.