MySQL – How to Track Entity Updates (By Who and When)

MySQL

I need some advice on how to store information about who created/updated an "entity". I keep going back and forth between Option1, 2 and 3 (see below):

Background

  • This is a legacy system and there are currently two types of users: e.g. employees and customers.
  • The users have different userId (saved in different tables…)
  • The employee key is a char of 50 characters and the customerId is a char of 30 characters.
  • Also, the employee is working at an office (might be several), so we want to track from which office the update/insert came from
  • The office key is a char of 50 characters.
  • The database is quite large: 200+ tables

Option 1

Store the "meta data" together with the entity.

Create table order(
    orderId int not null primary key auto_increment
    ...
    insertedByEmployee CHAR(50),
    insertedFromOffice CHAR(50),
    insertedByCustumer CHAR(30),
    insertedDate dateTime,
    updatedByEmployee CHAR(50),
    updatedFromOffice CHAR(50),
    updatedByCustomer CHAR(30),
    updatedDate timestamp
);

The primary advantage of this method is that it is straight forward fo find the updated field. The downside is that there are redundant columns.

Option 2

In this scenario I've moved the "meta data" to a separate table. Adds the possibility to have history. Don't know how relevant history is in this case though.

Create table order(
    orderId int not null primary key auto_increment
    ...
);

CREATE TABLE order_meta (
    order_metaId INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    orderId INT,
    customerId CHAR(30),
    employeeId CHAR(50),
    officeId CHAR(50),
    action ENUM('create','update'),
    dateTime` TIMESTAMP
);

What I like about this approach is that some redundancy is removed. Also, if we want to track "reads" we could simply updated the "action" column. This design makes it harder to find the last updated record though. Perhaps there should be updatedDate in the entity table. Also, I'm going to need one meta-table per entity, which i really don't like.

Option 3

Instead of using multiple columns for user id:s, create a "virtual user" and assign an id which will be used regardless of type.

A generic table (meta) is used to track changes to all entities. The downside is that this table might get used A LOT and makes querying for changes more cumbersome. I'm worried it might become a bottleneck.

This design doesn't require creating a separate table for each entity. Finding the actual user performing the update/insert is even more complicated than the other options. On the other hand I think this one might scale better.

Pseudo code (vuserId currently doesn't exist):

Create table order(
    orderId int not null primary key auto_increment,
    updated TIMESTAMP NOT NULL
    ...
);

CREATE TABLE virtual_user(
    vuserId INT NOT NULL PRIMARY KEY AUTO_INCREMENT),
    ...
);

CREATE TABLE customer(
    vuserId INT NOT NULL PRIMARY KEY,
    customerId CHAR(30),
    ...
);

CREATE table employee(
    vuserId INT NOT NULL PRIMARY KEY,
    employeeId char(50)
    ...
);
    
CREATE TABLE meta (
  metaId INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  entityName VARCHAR(30),
  entityId VARCHAR(50),
  vuserId INT,
  officeId CHAR(50),
  action ENUM('create','update'),
  dateTime TIMESTAMP NOT NULL
);

Best Answer

Option 1 is a no; this level of un-normalized data will mean many extra data pages occupying more space on disk, more data pages having to be read/written on operations, ...

Between Option 2 and Option 3 I'd go with something closer to Option 3 - In here you will save many bytes related to users (an Int goes for 4 bytes instead of the 30/50 of the Chars). This will save you again considerable disk space (considering users appear more than once) and get you smaller indexes which will mean better performance.

Now, on Option 3, I might be missing something for I don't understand what entityName and entityId stand for. If this is user information, it will better stay on the virtual_user table so you won't get this info repeated in every meta. Also, on the meta table I'd keep the orderId, it will allow you proper and fast joining of the meta and order tables, providing adequate indexing is present. I don't know the specifics of your requirements but consider if the updated column is necessary on the Order table or if for displaying the desired info on the application you'll be joining with the latest registry from the meta table all of the time.

If you want to keep track of order history, depending on requirements you might want to consider having a separate table (orderHistory or 'orderChanges'), or partitioning the order table. Now, especially for this last case, the order table could have a composite key composed by (orderId, metaId), making it Unique and a candidate index for joining with the meta table.

-- After the explaining comment: The meta table will be the most heavily used one, so keep everything as short as possible on this one:

  1. Clustered index on metaId

  2. Add a table entityTypes which relates entityName CHAR to entityType int. This will also get you a nice list of entities for anyone new looking at the database to know which entities relate to meta

  3. Try having entityId as int for all entities (If you can't change/add column to the existing tables, add an intermediary table) - This will reduce the field to 4bytes and allow one single data type for all entities.

  4. Add an index to meta table on entityType (now int) and entityId (now int). Your rows on this heavily used table are now so short you should be doing everything lightning fast.