I need some advice on how to store information about who created/updated an "entity". I keep going back and forth between Option1, 2 and 3 (see below):
Background
- This is a legacy system and there are currently two types of users: e.g. employees and customers.
- The users have different userId (saved in different tables…)
- The employee key is a char of 50 characters and the customerId is a char of 30 characters.
- Also, the employee is working at an office (might be several), so we want to track from which office the update/insert came from
- The office key is a char of 50 characters.
- The database is quite large: 200+ tables
Option 1
Store the "meta data" together with the entity.
Create table order(
orderId int not null primary key auto_increment
...
insertedByEmployee CHAR(50),
insertedFromOffice CHAR(50),
insertedByCustumer CHAR(30),
insertedDate dateTime,
updatedByEmployee CHAR(50),
updatedFromOffice CHAR(50),
updatedByCustomer CHAR(30),
updatedDate timestamp
);
The primary advantage of this method is that it is straight forward fo find the updated field. The downside is that there are redundant columns.
Option 2
In this scenario I've moved the "meta data" to a separate table. Adds the possibility to have history. Don't know how relevant history is in this case though.
Create table order(
orderId int not null primary key auto_increment
...
);
CREATE TABLE order_meta (
order_metaId INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
orderId INT,
customerId CHAR(30),
employeeId CHAR(50),
officeId CHAR(50),
action ENUM('create','update'),
dateTime` TIMESTAMP
);
What I like about this approach is that some redundancy is removed. Also, if we want to track "reads" we could simply updated the "action" column. This design makes it harder to find the last updated record though. Perhaps there should be updatedDate in the entity table. Also, I'm going to need one meta-table per entity, which i really don't like.
Option 3
Instead of using multiple columns for user id:s, create a "virtual user" and assign an id which will be used regardless of type.
A generic table (meta) is used to track changes to all entities. The downside is that this table might get used A LOT and makes querying for changes more cumbersome. I'm worried it might become a bottleneck.
This design doesn't require creating a separate table for each entity. Finding the actual user performing the update/insert is even more complicated than the other options. On the other hand I think this one might scale better.
Pseudo code (vuserId currently doesn't exist):
Create table order(
orderId int not null primary key auto_increment,
updated TIMESTAMP NOT NULL
...
);
CREATE TABLE virtual_user(
vuserId INT NOT NULL PRIMARY KEY AUTO_INCREMENT),
...
);
CREATE TABLE customer(
vuserId INT NOT NULL PRIMARY KEY,
customerId CHAR(30),
...
);
CREATE table employee(
vuserId INT NOT NULL PRIMARY KEY,
employeeId char(50)
...
);
CREATE TABLE meta (
metaId INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
entityName VARCHAR(30),
entityId VARCHAR(50),
vuserId INT,
officeId CHAR(50),
action ENUM('create','update'),
dateTime TIMESTAMP NOT NULL
);
Best Answer
Option 1 is a no; this level of un-normalized data will mean many extra data pages occupying more space on disk, more data pages having to be read/written on operations, ...
Between Option 2 and Option 3 I'd go with something closer to Option 3 - In here you will save many bytes related to users (an
Int
goes for 4 bytes instead of the 30/50 of theChars
). This will save you again considerable disk space (considering users appear more than once) and get you smaller indexes which will mean better performance.Now, on Option 3, I might be missing something for I don't understand what entityName and entityId stand for. If this is user information, it will better stay on the
virtual_user
table so you won't get this info repeated in every meta. Also, on themeta
table I'd keep theorderId
, it will allow you proper and fast joining of themeta
andorder
tables, providing adequate indexing is present. I don't know the specifics of your requirements but consider if theupdated
column is necessary on theOrder
table or if for displaying the desired info on the application you'll be joining with the latest registry from themeta
table all of the time.If you want to keep track of order history, depending on requirements you might want to consider having a separate table (
orderHistory
or 'orderChanges'), or partitioning theorder
table. Now, especially for this last case, theorder
table could have a composite key composed by(orderId, metaId)
, making it Unique and a candidate index for joining with themeta
table.-- After the explaining comment: The
meta
table will be the most heavily used one, so keep everything as short as possible on this one:Clustered index on
metaId
Add a table
entityTypes
which relatesentityName CHAR
toentityType int
. This will also get you a nice list of entities for anyone new looking at the database to know which entities relate to metaTry having
entityId
asint
for all entities (If you can't change/add column to the existing tables, add an intermediary table) - This will reduce the field to 4bytes and allow one single data type for all entities.Add an index to
meta
table onentityType
(now int) andentityId
(now int). Your rows on this heavily used table are now so short you should be doing everything lightning fast.