Sql-server – SQL Server Updated Record on Read-Only Filegroup

filegroupspartitioningsql server

I have a very large database in our data warehouse where we have implemented partitioning to manage maintenance and backups. Records of a certain age are eventually migrated to a read-only file group once a month.

Occasionally our ETL process attempts to update older records that have already been migrated to the archive and we expect these to fail. However, I have at least two recent examples where the record in test is updated even when it appears to be in a partition on the read-only file group in our test environment (querying sys.partition_functions and sys.partition_range_values).

An identical record in production causes the expected failure when attempting to update the record. The two times we have caught this so far the update fails in production but succeeds in test (never the other way around).

Relevant environment facts:

  • SQL Server 2012 SP3 CU3 (build 11.0.6537.0)
  • Test is developer edition, production is enterprise
  • Can provide others as requested: Seriously stumped right now…

UPDATE 2016-08-19

Had new records updated somehow overnight. Confirmed it was on the read-only file group. Found that I can update records that were inserted at the same time (i.e. are also on the same partition on the read-only filegroup). I identified a single record on the same partition and have been able to update the record multiple times. Attempts to update the record that updated overnight results in the expected failure.

UPDATE 2016-08-11

Updates continue to occur during the nightly processing in test on the read-only partition. Attempting to update the same records from the process fails. Attempting to update the same records while logged in as the user that updated it previously failed. I am also unable to duplicate the issue by updating a similar record that has not yet been touched by the nightly process.

UPDATE 2016-08-04

Discovered today that it is not limited to that single table as I discovered another occurrence of the same behavior on a different table using the the same partition scheme.

UPDATE 2016-08-03

Running the script from this MSDN script confirms what I get when using Kendra Little's partition helper views ph.FilegroupDetail and ph.ObjectDetail from this demo. The record in question lives in partition #2 (partition column value for the record in question is is 2015-03-18)

Filegroup     Low Boundary     UpperBoundary
Archive  (RO) NULL             1900-01-01
Archive  (RO) 1900-01-01       2015-04-01
ActiveFG (RW) 2015-04-01       2015-07-01
ActiveFG (RW) 2015-07-01       2015-10-01
ActiveFG (RW) 2015-10-01       2015-01-01
ActiveFG (RW) 2016-01-01       2016-04-01
ActiveFG (RW) 2016-04-01       2016-07-01
ActiveFG (RW) 2016-07-01       2016-10-01
ActiveFG (RW) 2016-10-01       2017-01-01
ActiveFG (RW) 2017-01-01       2115-01-01
ActiveFG (RW) 2115-01-01       NULL

Code to put table on partition (there are no other indexes)

ALTER TABLE [dbo].[TABLE_NAME] ADD  CONSTRAINT [pk_TABLE_NAME] PRIMARY KEY CLUSTERED 
(
    [ETL_VERS_START_DTM] ASC,
    [ACCT_NO] ASC,
    [ACCT_TYPE] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON ps_SmallTablesDate(ETL_VERS_START_DTM)

The update statement that should fail (via Informatica):

UPDATE TABLE_NAME SET ETL_JOB_SEQ_NUM = ?, ETL_IUD_CD = ?, ETL_UPD_DTM = ?, ETL_DEL_DTM = ? WHERE ETL_VERS_START_DTM = ? AND ACCT_NO = ? AND ACCT_TYPE = ?
ETL_VERS_START_DTM (ETL_VERS_START_DTM:Date:): "03/17/2015 23:30:02.140000000"
ETL_JOB_SEQ_NUM (ETL_JOB_SEQ_NUM:Int:): "1173651"
ETL_IUD_CD (ETL_IUD_CD:Char.1:): "D"
ETL_UPD_DTM (ETL_UPD_DTM:Date:): "08/02/2016 02:32:45.000000000"
ETL_DEL_DTM (ETL_DEL_DTM:Date:): "08/02/2016 00:10:03.567000000"
ACCT_NO (ACCT_NO:Char.12:): "1234567890"
ACCT_TYPE (ACCT_TYPE:Char.3:): "OLN"

UPDATE 2017-02-21

So after all this time we've discovered that somehow when the oldest active partition was being merged down into the archive, a section of records were not moved on disk from the active file group to the archive file group. The following query shows that records from partition 2 were mapped to ActiveFG while inspecting the actual partitioning scheme shows that those same records should be sorted into the Archive file group by the partition function.

SELECT  OBJECT_NAME(P.[object_id]) ,
    P.index_id ,
    P.partition_number ,
    F.name ,
    F.filegroup_guid
FROM    sys.allocation_units AU
    JOIN sys.partitions P ON P.partition_id = AU.container_id
    JOIN sys.filegroups F ON F.data_space_id = AU.data_space_id
WHERE   P.partition_number IN ( 1, 2, 3 )
    AND P.[object_id] = OBJECT_ID('TABLE_NAME')
ORDER BY P.partition_number;

I backed out all the partitioning in the actual in use databases and kept a version of the one that was broken to work the Microsoft ticket with. I need to revise the partitioning plan with our DW team but I will admit to being a but gun shy about attempting it again.

Microsoft has been unable to duplicate this behavior and so is done with the ticket at this time. They seem ready to just shrug it off and assume it's not present in 2014/2016? They can't seem to replicate it in their labs despite my ability to have it continue to exist in the database even after I restore it from back up in my system.

Best Answer

I have a confession to make.

Once, when I was young, I built an ETL process started with changing read-only filegroups to read-write, doing its ETL work, and then setting them back to read-only.

So just in case you have a coworker who was diabolical like I was (I was young, I needed the money), you can test by:

  1. Change the name of the read-only filegroup - that way, if someone has hard-coded scripts that alter the filegroup by name, their scripts will fail, and you'll catch the culprit. Or, a little harder:

  2. Use Profiler or Extended Events to track anyone who does an ALTER DATABASE.