SQL Server – Managing Bitemporal Tables with Changing Schema

database-designsql serversql-server-2008

I have to design a database covering business assumptions using MS SQL-Server 2008 R2. The current data of the tables will be exported to text files and used as input for a certain application. I would like to implement the tables as bitemporal tables as described by R. Snodgrass in order to track valid and transaction times.

However, it is possible and quite likely that the logical schema of the tables could change, i.e. new columns could be added or old columns could be removed.
How would I reflect this in the design of the tables?

Obviously, I couldn't really delete a column, as all old data would disappear. Should I 'flag' not available columns in rows with a NULL entry? Likewise, how would I keep track of columns which are currently in use? This would be crucial for the export of the data as only 'active' columns should be exported.

And last, but not least, how could I log that a new column was added or an old one removed?

Best Answer

Im not sure if this answers all of your questions but this sounds very similar to what I do with our data warehouse. However the way i interpret your post you would want to have 1 table with your data, and another with your table definition meta data.

For the data table add a column to flag is_current. (Either an int or bit datatype) Also have 2 date columns valid_dt_from & valid_dt_to.

If a row changes you update the old row is_current = 0, and valid_dt_to =getdate ().

Insert the new row with is_current =1 and valid_dt_from = getdate ().

You can do the above with a single merge query.

To get the current data select * from table where is_current = 1.

Add new columns as they are needed. But don't delete old columns. In theory if you have to reconstruct the table to previous point in time all rows from that time should have null values at that time.

Create a similar table to define your data table definition. Add a record to show when each field gets added, removed and if it is current.

Your select query could be built dynamically. Where is current =1, or reconstructed as at a point in time.