I have a table of millions of records (user contribution). sometimes, our content team find duplicate content (as it is almost the same context, this represents about 0.1%) and they asked to make redirect to the original content/article.
Now:
- I can add a column to the same table to point to the original article to be redirected to
create table article (id int primary key auto_increment, body text, title varchar(128), ....., redirect_to int, FOREIGN KEY (`redirect_to`) REFERENCES article (id));
- Or to have this 1-M relationship in separate table as below
create table article (id int primary key auto_increment, body text, title varchar(128), .....,); create table article_redirect (article_id int, redirect_to int, FOREIGN KEY (`article_id`) REFERENCES article (id), FOREIGN KEY (`redirect_to`) REFERENCES article (id));
From Database Administration concepts and standards, which option is better and why?
Best Answer
Even though only 0.1% of the articles have a duplicate, the cost (in terms of wasted space) of a NULL int value (for a
redirect_to
column) in thearticle
table is small. (In fact, if using the InnoDB storage engine, you will waste no space at all if the value is NULL. (See e.g. Bill Karwin's SO answer here as well as the MySQL documentation about InnoDB field contents.) However, if theredirect_to
column needs to be indexed, then there is a performance penalty.A separate table (
article_redirect
) is the more normalized approach. This will add another join to your queries, but again the cost of that (in terms of extra memory and processing time) is small because the table is small, especially if best-practice indexing is used. You should therefore have a compound index (article_id, redirect_to) on the article_redirect table.For the separate table approach (
article_redirect
), it's good practice to define an explicit primary key (in every table). Assuming an article can only be a duplicate of one other article (but many articles can duplicate the same article), then the primary key should bearticle_id
.As an aside, I will mention that it's usually not good practice to have
TEXT
columns (TEXT
is a type ofBLOB
) together with other "meta data" columns such as title, timestamps and so forth that you might retrieve in queries. Instead, consider moving this to a separate table with a foreign key pointing back to your article table. See the MySQL documentation about optimizing for BLOB types for details.