Mysql – avoiding unique key collisions across tables

data-warehousedatabase-designMySQLmysql-5.5

This might be related to the idea of data warehousing, but i'm not sure, though it is extreme theory.

We have a table where records get written to first, let's call it tbl_recording which looks like:

CREATE TABLE tbl_recording (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(45) NULL,
registration VARCHAR(45) NULL,
legacyid VARCHAR(45) NOT NULL,
UNIQUE INDEX legacyid_UNIQUE (legacyid ASC),
PRIMARY KEY (id));

Then, we have yearly tables based off of tbl_recording so like tbl_recording_2012, tbl_recording_2013 etc etc, their structure is the same.

We also have tbl_recording data off to the yearly table every hour (insert to yearly, delete from recording). Now, the problem I see is that the uniqueness of legacyid is not remembered across all these tables. So, while we generate a hash for the legacyid, there could be a collision (even though terribly unlikely) at some point:

  1. After data is removed from tbl_recording the same hash could be generated and re-entered into tbl_recording, thus causing a blockage further down the line when trying to insert into the yearly table
  2. Between years, the same hash could be generated for 2013 and 2014 EDIT: This would never happen if i append the year to the hash

The best way I've come up with to mitigate against this is to keep a hashtable, one that stores all hashes created, and is checked against before a record is inserted, and then that fresh hash inserted into the hashtable.

Are there better solutions to this or is that the only real solution?

edit: I've created an SQL Fiddle of an example of how my system works

http://sqlfiddle.com/#!2/231fa/1

pretend that: tbl_data_2012 will never be written to again. tbl_recording is constantly being written to but every hour or so, it's the contents it's being emptied to tbl_data_2013. tbl_recording has already dumped it's contents to tbl_data_2013 so when the 4th record was written there was nothing in there previously, thus it has created a new legacyid, however that legacyid already exists in tbl_data_2013

Best Answer

You should separate the table on year to get faster insert's, updates or deletes because we expect lots lots lots lots lots lots off records in this table..

**But... your problem is precisely the reason why separating tables on meta data (year in your question) is an SQL anti pattern, you could not have giving us an better use case*

If you want an better solution you should read about mysql partitioning see http://dev.mysql.com/doc/refman/5.5/en/partitioning.html