Sql-server – Change Data Capture and the __$update_mask binary

change-data-capturesql-server-2008

We're using CDC to capture changes made to a production table. The changed rows are being exported out to a data warehouse (informatica). I know that the __$update_mask column stores what columns were updated in a varbinary form. I also know that I can use a variety of CDC functions to find out from that mask what those columns were.

My question is this. Can anyone define for me the logic behind that mask so that we can identify the columns that were changed over in the warehouse? Since we're processing outside of the server we don't have easy access to those MSSQL CDC functions. I would rather just break down the mask myself in code. Performance of the cdc functions on the SQL end is problematic for this solution.

In short, I'd like to identify changed columns by hand from the __$update_mask field.

Update:

As an alternate sending a human readable list of changed columns over to the warehouse was also accepatable. We found this could be performed with performance far greater than our original approach.

The CLR answer to this question below meets this alternative and includes details of interpreting the mask for future visitors. However the accepted answer using XML PATH is the fastest yet for the same final result.

Best Answer

And the moral of the story is... test, try other things, think big, then small, always assume there is a better way.

As scientifically interesting as my last answer was. I decided to try one other approach. I remembered I could do concat with the XML PATH('') trick. Since I knew how to get the ordinal of each changed column from the captured_column list from the previous answer I thought it would be worth testing if the MS bit function would work better that way for what we needed.

SELECT __$update_mask ,
        ( SELECT    CC.column_name + ','
          FROM      cdc.captured_columns CC
                    INNER JOIN cdc.change_tables CT ON CC.[object_id] = CT.[object_id]
          WHERE     capture_instance = 'dbo_OurTableName'
                    AND sys.fn_cdc_is_bit_set(CC.column_ordinal,
                                              PD.__$update_mask) = 1
        FOR
          XML PATH('')
        ) AS changedcolumns
FROM    cdc.dbo_MyTableName PD

It's way cleaner than (though not as fun as) all that CLR, returns the approach back to native SQL code only. And, drum roll.... returns the same results in less than a second. Since the production data is 100 times bigger every second counts.

I'm leaving the other answer up for scientific purposes - but for now, this is our correct answer.