SQL Server – Managing Bitemporal Tables with Changing Schema

database-designsql serversql-server-2008

I have to design a database covering business assumptions using MS SQL-Server 2008 R2. The current data of the tables will be exported to text files and used as input for a certain application. I would like to implement the tables as bitemporal tables as described by R. Snodgrass in order to track valid and transaction times.

However, it is possible and quite likely that the logical schema of the tables could change, i.e. new columns could be added or old columns could be removed.
How would I reflect this in the design of the tables?

Obviously, I couldn't really delete a column, as all old data would disappear. Should I 'flag' not available columns in rows with a NULL entry? Likewise, how would I keep track of columns which are currently in use? This would be crucial for the export of the data as only 'active' columns should be exported.

And last, but not least, how could I log that a new column was added or an old one removed?

Best Answer

Im not sure if this answers all of your questions but this sounds very similar to what I do with our data warehouse. However the way i interpret your post you would want to have 1 table with your data, and another with your table definition meta data.

For the data table add a column to flag is_current. (Either an int or bit datatype) Also have 2 date columns valid_dt_from & valid_dt_to.

If a row changes you update the old row is_current = 0, and valid_dt_to =getdate ().

Insert the new row with is_current =1 and valid_dt_from = getdate ().

You can do the above with a single merge query.

To get the current data select * from table where is_current = 1.

Add new columns as they are needed. But don't delete old columns. In theory if you have to reconstruct the table to previous point in time all rows from that time should have null values at that time.

Create a similar table to define your data table definition. Add a record to show when each field gets added, removed and if it is current.

Your select query could be built dynamically. Where is current =1, or reconstructed as at a point in time.

Related Solutions

Optimised Schema for City-to-City Distance Table

Most database management systems have a hard limit on either the number of columns you're allowed to use, the number of bytes in a row, or both. So your single table won't work in the general case, because you'll eventually either end up with either too many columns or too many bytes in a row.

To find out what the limitations are for your platform, Google something like

SQL Server limitations
PostgreSQL limitations
Oracle limitations

All you need to store your data is a single table of distances. A table of cities can be used to validate the values in the table of distances, but it's not essential. (I'd use one, though.)

But this won't work in the general case either, because it's subject to the same limitations as a base table. To see how it works for relatively small data sets, look at this example. And look closely at the CHECK() constraints.

create table distances (
  city_from varchar(25) not null,
  city_to varchar(25) not null,
  check (city_from < city_to),
  distance integer not null check (distance >= 0),
  primary key (city_from, city_to)
);

insert into distances values
('Bangalore', 'Chennai', 100),
('Bangalore', 'Hyderabad', 200),
('Bangalore', 'Mumbai', 300),
('Bangalore', 'Delhi', 400),
('Bangalore', 'Kolkata', 500),
('Chennai', 'Hyderabad', 150),
('Chennai', 'Mumbai', 250),
('Chennai', 'Delhi', 450),
('Chennai', 'Kolkata', 550);

Now you can create a pivot table with a SELECT statement. (SQL Server. Other dbms have different ways of dealing with pivot tables.)

SELECT *
FROM (
  SELECT city_from, city_to, distance
  FROM distances
  union all 
  SELECT city_to, city_from, distance
  FROM distances
) AS t1
PIVOT
(
  max(distance)
  FOR [city_to]
  IN (
    [Bangalore], [Chennai], [Delhi], [Hyderabad], [Kolkata], [Mumbai]
  )
) AS t2
ORDER BY city_from;

Because of the limitations on the number of columns and the number of bytes per row, I think you're better off returning the base table (with or without the UNION ALL that you see in the query above) to the application, and letting the application format it for display. Application code doesn't generally have limits on the number of columns or the number of bytes.

Schema design for storing item combination with order

The design doesn't seem awful, but I think you could solve your update issue by changing how you are assigning a priority number. Instead of using 1, 2, 3, 4, ..., you could use a higher range between the values, such as 10, 20, 30, 40, .... You might find you need an even larger set like 50, 100, 150, 200, ....

By increasing the range, you have some flexibility when a user changes the ordering of a given item. Instead of having to update all of the records, you could now use an available number between the existing two and save yourself the headache. For example:

AAA -> 10
BBB -> 20
CCC -> 30

We want to move CCC to be the second item (in between AAA and BBB), so we change 30 to 15. So now:

AAA -> 10
CCC -> 15
BBB -> 20

Of course, you'll still have to re-assign all the priorities if the user changes too many of the items, as you'll eventually run out of empty numbers. When you're doing your selection, just remember to order by the priority.

Best Answer

Related Solutions

Optimised Schema for City-to-City Distance Table

Schema design for storing item combination with order

Related Question