Mysql – Relational modeled design to store versioned files

data-versioningforeign keyMySQLschema

The scenario

I have images in a database table and some of these need to be altered then re-saved(not overwriting the original) whilst providing a reference back to the original. I also save the directory path which the images are stored, if a original image is stored in dir : 1, the altered version of this exact image would be dir : 1-1.(This I am thinking has room for improvement but I am unsure of anyway of doing it(BLOB and storing within the db is out of the question).

Current model

 Files
+---------+-----------+-----------+--------+
| file_id | file_name | file_hash | folder |
+---------+-----------+-----------+--------+
|    1    |  bmw.png  | d29djkj38 |    1   |
|    2    |  bmw.jpeg | e9sljd551 |    1   |
+---------+-----------+-----------+--------+

Altered_Files
+---------+------------------+
| file_id | original_file_id |
+---------+------------------+
|    2    |         1        |
+---------+------------------+

file_id and original_file_id here are both foreign keys to Table : Files Column : file_id.

file_id is also the Primary Key

I have never used two columns as foreign keys where both reference the same one column in another table, this makes me wonder if its a incorrect design(or perhaps its not even allowed architecturally).

As for the altered images directory folder, we can generate this by joining both tables and then using contact('-') with the results.

Overall concern

I am still new to Database modelling and wondering if this design is ok(won't need refactoring at a later date as the pros say(Edgar Codd, C Date)), perhaps it can be improved even more?

Also what do you suggest for the image directory issue, integrity is not enforced in the database but with this idea it seems possible to maintain with application code and permitting access to the main directory to only the db.

Best Answer

Your Altered_Files design is absolutely fine. In fact I would award bonus points for separating the files per se from the hierarchy. @chillworld's suggestion is workable and very common but yours is better because of the NULL issues you mention. I can't think of any update which is better in one schema or the other. Change a single row - either the file data or the files' relationship - and you have a single update on a single table. No risk to integrity there. Altered_files contains only primary keys (PK) and pointers to PKs. By definition these should never change. On the off chance you should have to change one then, yes, you have to be very, very careful to maintain consistency. For INSERTs and DELETEs wrap the statements in a transaction. This is precisely why transactions exist.

I would suggest you store the full directory path for an altered image, not just its new sub-folder. What happens if you derive image 4 from image 3, from image2, from image 1? Will contact('-') handle that? Will the continuous constructing of image 4's ancestry be a burden on the system? You know your data, workflow and hardware best. If the volumes are small this may not be an issue.

The file-in-database-or folder is an age-old one. SQL Server has functionality called FILESTREAM where the image passes through the database on the way to the file system. It is held on disk as a JPG or PNG or whatever and can be read by normal applications. The database retains ownership of it and its related data. The best of both worlds. I don't think MySQL has anything similar, however. I think you may want to grant other applications read access to the image filestore and reserve write access for the DB.