We have a database with versioned data and we are trying to find out a faster to query data.
There is a Model
table and a few child tables (eg: Template
Table) having references to Model
. When there is a change in a Child table, it creates a new Model
version and creates a new record in the child table.
In the following example there are 3 changes for the First
Model ( so 3 versions). In the Template
Table there are 4 different templates belongs to the First
model.
Model Table
+----+--------+-----------+
| ID | Name | VersionID |
+----+--------+-----------+
| 1 | First | 1 |
+----+--------+-----------+
| 2 | Second | 1 |
+----+--------+-----------+
| 3 | Third | 1 |
+----+--------+-----------+
| 1 | First | 2 |
+----+--------+-----------+
| 1 | First | 3 |
+----+--------+-----------+
When there is template change for a model, it creates a new record in the Template table just for that Template.
Template Table
+----+---------+----------------+------+
| ID | ModelID | ModelVersionID | Data |
+----+---------+----------------+------+
| 1 | 1 | 1 | xxx |
+----+---------+----------------+------+
| 2 | 1 | 1 | xxx |
+----+---------+----------------+------+
| 3 | 1 | 1 | xxx |
+----+---------+----------------+------+
| 4 | 1 | 1 | xxx |
+----+---------+----------------+------+
| 5 | 2 | 1 | xxx |
+----+---------+----------------+------+
| 6 | 3 | 1 | xxx |
+----+---------+----------------+------+
| 1 | 1 | 2 | xxx |
+----+---------+----------------+------+
| 2 | 1 | 2 | xxx |
+----+---------+----------------+------+
| 6 | 1 | 2 | xxx |
+----+---------+----------------+------+
| 1 | 1 | 3 | xxx |
+----+---------+----------------+------+
When I want to load the latest Templates for model version 3. I have to write a query to get the latest version of Template for ModelVersionId <= 3
(I am not sure how to write that query properly in a single sql statement)
Following templates should be the result for Model version 3.
+----+---------+----------------+------+
| ID | ModelID | ModelVersionID | Data |
+----+---------+----------------+------+
| 3 | 1 | 1 | xxx |
+----+---------+----------------+------+
| 4 | 1 | 1 | xxx |
+----+---------+----------------+------+
| 2 | 1 | 2 | xxx |
+----+---------+----------------+------+
| 6 | 1 | 2 | xxx |
+----+---------+----------------+------+
| 1 | 1 | 3 | xxx |
+----+---------+----------------+------+
This approach will be slow when there are 100s of versions for every Model
and thousands of records in the Template
table. So there is another approach we are considering.
Adding FromDateTime and ToDateTime in child tables.
Model Table
+----+--------+-----------+-----------------------|
| ID | Name | VersionID | CreatedDateTime |
+----+--------+-----------+-----------------------+
| 1 | First | 1 | ...... |
+----+--------+-----------+-----------------------+
| . | ..... | .... | ...... |
+----+--------+-----------+-----------------------+
| 1 | First | 3 |2018-04-18 15:07:28.920|
+----+--------+-----------+-----------------------+
Template Table
+----+---------+----------------+------+--------------+------------+
| ID | ModelID | ModelVersionID | Data | FromDateTime | ToDateTime |
+----+---------+----------------+------+--------------+------------+
| 1 | 1 | 1 | xxx | | |
+----+---------+----------------+------+--------------+------------+
| . | ..... | ..... | ... | ...... | ...... |
+----+---------+----------------+------+--------------+------------+
ToDateTime is NULL
for the current(latest) record.
So, if I want to load the latest templates for the latest model version. It is easy, just load the Templates for ModelId
1 where ToDateTime
is NULL
.
When I want to load templates for an old Model version. The query will be
FromDateTime >= CreatedDateTime AND ToDateTime <= CreatedDateTime
If we index FromDateTime
and ToDateTime
, above query should be much fast than the previous case. Before going ahead with that direction, I am wondering if there are other common patterns to solve this type of versioning?
Best Answer
For your date ranges, always use the max collating value (typically 9999-12-31) for the end time to represent "current" or "until further notice". This means that your queries for finding a record that is current at a particular point in time (including
GETDATE()
) will be consistent.See also my answer to a similar question on Stack Overflow, with additional details and tips.