Mysql – What are the different ways to keep track of active and archived data

MySQL

I'm looking for different ways to keep track of both active and archived data so I can pro and con them.

The system:
I have a computer with a database on it. The database has several tables in it; one of which contains a list of users that can use the computer; and several tables for auditing (user 1 did this, user 2 did that, etc). This database is a slave of a master database in which a Content Management System is used to say, add a new user and see reports on what user did what.

Example:
As stated above, I have a table (lets call it users) that keeps track of all the users that are allowed to use the computer. As time goes by users will be added and removed. The problem is the audit tables keep track of a user ID so if the user is removed I lose the user information because the rows can't be joined. One idea I had was to use MySql's triggers so that if a user is added, an insert trigger is triggered and inserts a copy of the data to an 'archived' user table (lets call it users_archive). That way the computer can use users to determine if the user has permission to use it and reports can use users_archive for reports.

This seems like the easiest and most simple way to do it, but I can't find any other ways via google search to see if there are any other ways to do something like this.

Best Answer

The two options that spring to mind avoiding triggers are:

  • Flag the deleted rows as such with an extra field, and update all your existing code to deal with it. This is "simple" but will touch a lot of your code so can be a bug source. Delete is not delete, mearly "set is_achived flag".
  • Flag the deleted rows in the table, and have views for active and archived data. This avoids changing existing code as you can give the combined table a new name (users_all) and the view of active users (SELECT <fields> FROM users_all WHERE is_archived=0) the name of the old table. Code such as audit reports simple need to query users_all directly. You will need to run a few tests to make sure mysql's query planner handles this efficiently though - I've had it work well is MSSQL but don't use mysql myself so have no expectations one way or the other about it's QP's intelligence.
  • Separate tables for active and archived. The delete operation then becomes an insert into archived and a delete from active (be sure to wrap this in an explicit transaction so if there is an error half way through you don't lose records completely or end up duplicating them). For code that needs to see both subsets, either UNION directly in the views/procs or have a view that is a UNION of the two tables and use that. This is likely to be less efficient than the above for those code paths that do need to deal with both subsets, but will not affect other areas of the codebase so may be preferable. One key disadvantage with this option is not being able to use the two tables in FK relationships - you can get around this by adding a third table containing just the identifiers for each row, then all tables that contain links to users (including the active users and archived users tables) would use that as the FK source - the only extra work in your business logic as a result would be to create that row before creating the row in the users table.

Triggers can be very powerful, but you need to design them very carefully as they can be a performance problem if not, and of course silent things happening can be a problem when bugs to arise as it can be difficult to see everything that is going on at first glance. they can greatly simplify some processes though, creating easier to maintain structures as a result, and sometimes having a bit of structural logic in the DB is far more efficient than having the next layer up deal with it, so never discount them out of hand. In this instance though, I'd stick with one of the simple boolean flag options.