Mysql – Table structure for entity versioning

database-designMySQL

For an assignment I have to build an application that is able to store tests. Every test has to be validated by a a different user when created or updated. When it has been validated by an user, the test gets an expiry date of the last time it was validated plus 2 years.

Everytime a test gets updated, it creates a new version: v1, v2, v3, etc. This versioning information is stored in the test_versions table.

The tests itself are all stored in the same tests table with a foreign key to the test_versions table (version_id)table. Thetest` table consists of multiple many-to-many relations with joining-tables. For the sake of simplicity, I didn't include these in my table definition.

create table tests
(
    id                  int auto_increment   primary key,
    name                varchar(255)         not null,
    description         varchar(2048)        not null,
    version_id          int                  not null,
    is_active           tinyint(1) default 1 null comment,
    created_at          timestamp            null,
    updated_at          timestamp            null
)
    collate = utf8_unicode_ci;

Every new test is made with a new revision. The tests table has an version_id column. This is referring to the test_versions table which has the following columns:

version_id
description
original_version_id (The test where this version is made off)
tag (v1, v2, v3, etc.)
is_current_version (An indicator if it's the test's current version)
expiry_date (The expiry date of the current version, after this date, the test would not be showing).

It's possible to have unlimited versions for every test.

Now the problem is that when a new test is created, a new version should also be created in the test_versions table. This is not possible in the current structure since every new record in the test_versions table needs an test_id.

Dropping, or not working with foreign keys would not be an option.

Questions

How can I create the table structure in a way that I can still have unlimited versions without having the foreign key problems? How does the StackExchange site do it for example?
In my structure, every new test needs a version. Can this be done any better or is it efficient to split data over multiple tables?
Could there be another table structure for not working with these FK's in both directions?
Is it efficiënt to store more versions in the same tests table? Is there any other/better way?

Best Answer

My preference is to handle this with a stored procedure rather than a trigger, but that's an option if you have a lot of legacy code running unwrapped INSERTs.

DELIMITER //
CREATE PROCEDURE CreateTest (Name VARCHAR(255), Description VARCHAR(2048))
    BEGIN

    DECLARE EXIT HANDLER FOR SQLEXCEPTION
        BEGIN
        ROLLBACK;
        END;

    START TRANSACTION;

    INSERT INTO test (name, description, version_id) VALUES (Name, Description, 0);
    DECLARE TestID INT;
    SET TestID = LAST_INSERT_ID();

    INSERT INTO test_versions (test_id) VALUES (TestID);
    DECLARE TestVersionID INT;
    SET TestVersionID = LAST_INSERT_ID();

    UPDATE test SET version_id = TestVersionID WHERE id = TestID;

    COMMIT;
    END//
DELIMITER ;

Apologies if I screwed up the MySQL syntax, I've been working in MS SQL 90% of the time for the past ten years.

Note that this code temporarily sets the value of version_id to zero, to satisfy the FK. You'd need a dummy record in test_versions with such an ID. However, the field is updated before the transaction commits, so no queries should ever see this value.

Edit: if you are really concerned about the INSERT performance of this sproc, you could simplify it by giving test_version a composite PK on test_id and version_id, where version_id is a sequence (starting at 1) for each test. You would need to retrieve only one AUTO_INCREMENT value:

DELIMITER //
CREATE PROCEDURE CreateTest (Name VARCHAR(255), Description VARCHAR(2048))
    BEGIN

    DECLARE EXIT HANDLER FOR SQLEXCEPTION
        BEGIN
        ROLLBACK;
        END;

    START TRANSACTION;

    INSERT INTO test (name, description, version_id) VALUES (Name, Description, 1);
    DECLARE TestID INT;
    SET TestID = LAST_INSERT_ID();

    INSERT INTO test_versions (test_id, version_id) VALUES (TestID, 1);

    COMMIT;
    END//
DELIMITER ;

As you can see, version_id can be assumed to be 1 for the first record, so there's no need to UPDATE test. You also don't need a dummy record #0 in test_version. However, the broader PK would make indices on test_version slightly bigger. Using a narrow integer for version_id would help.

Related Solutions

Mysql – Best structure for user-level versioning of data in thesql

It is possible to use one-database-per-user or one-table-per-user models… But these will start to become unwieldy as the number of users in your application grows… MySQL (and, really, no RDBMS) handles hundreds of thousands of tables or databases very well.

Having an user_id (or similar) column on each table that contains user data is totally standard, and the "right" way to do what you're describing.

Mysql – Finding rows for a specified date range

Another way to get the result is this. It finds first all groups that the teacher has surely taught (or is going to) by checking that she has started within the month and then in another subquery it finds - for every group - the last teacher that started at the first day of the month or earlier.

With the unique index you have on the table, the second subquery should be quite efficient. The first subquery would benefit from an index on (teacherid, startdate, groupid):

SELECT groupid
FROM pupilgroupteacher
WHERE teacherid = @teacher 
  AND startdate >= @month + INTERVAL 1 DAY
  AND startdate < @month + INTERVAL 1 MONTH

UNION DISTINCT

SELECT gg.groupid
FROM 
    ( SELECT DISTINCT groupid
      FROM pupilgroupteacher
    ) AS gd
  JOIN pupilgroupteacher AS gg
    ON  gg.groupid = gd.groupid
    AND gg.startdate =
        ( SELECT MAX(gi.startdate)
          FROM pupilgroupteacher AS gi 
          WHERE gi.groupid = gd.groupid
            AND gi.startdate < @month + INTERVAL 1 DAY
        )
WHERE gg.teacherid = @teacher ;

Best Answer

Related Solutions

Mysql – Best structure for user-level versioning of data in thesql

Mysql – Finding rows for a specified date range

Related Question