How to design a database that easily lets me recreate historical snapshots

database-design

I'm designing a database for an accounting system. In the system there are companies, and each company has many contracts.

Each contract gives a company access to products, data types, or both. Clients use the different products to access different data types.

The initial contract may give the company Products A and B with access to data types X and Y. A second contract may come along and give the company access to data type Z within Product A. And so on.

It is important for the users of this system to be able to move through the contracts and see the state of the company's products and data type access, as of different contracts.

I am trying to think of a good approach to the architecture, and I've come up with two options:

  1. Store the latest state of a company's products and data types in a products and data_types table. These would belong directly to the company. Each new contract would update these tables appropriately. So, querying the current state of a company's access would be a simple SQL query.

    Then, in a separate table, I could store the contracts, and each contract would have many contract_items. This is where I could record which product or data type the contract is adding, updating or removing.

    I could then write the logic in my software layer to recreate historical snapshots, using this contract_items table. I would simply pull all the items for a company up to the selected contract, then use the items to build up the state of the company as of that contract.

  2. Store a snapshot of the company as of each contract. I am actually having trouble thinking through how I could do this in the database layer. One way would be to create a new rows in the companies, products and data_types tables for each contract.

    This would make querying the state of the company as of any particular contract a simple SQL query. I would just select out of those three tables where contract_id equaled the current selected id. However, it seems like a lot of data duplication. Perhaps this is not a problem.

Any thoughts on this? Are there best practices in this area, or something I'm missing? Thanks in advance.

Best Answer

Versioning. Not a new problem - used thousands of times per day by developers using any version control system.

Mark any item with 2 fields - ValidFromNo and ValidToNo. The last can be 0.

Keep a table versions that has the version numbers. ValidFromNo is from which version number an item is valid, ValidToNo - which can be null - is the last valid version. Null marks "Currently in use" so you do not have to update all valid items on every new versin.

Simple like that.