Mysql – one to many relationship DB structure

foreign keyMySQL

I have a table of millions of records (user contribution). sometimes, our content team find duplicate content (as it is almost the same context, this represents about 0.1%) and they asked to make redirect to the original content/article.
Now:

  1. I can add a column to the same table to point to the original article to be redirected to
create table article (id int primary key auto_increment, body text,
title varchar(128), ....., redirect_to int, 
FOREIGN KEY (`redirect_to`) REFERENCES article (id));
  1. Or to have this 1-M relationship in separate table as below
create table article (id int primary key auto_increment, body text,
title varchar(128), .....,);

create table article_redirect (article_id int, redirect_to int, 
FOREIGN KEY (`article_id`) REFERENCES article (id),
FOREIGN KEY (`redirect_to`) REFERENCES article (id));

From Database Administration concepts and standards, which option is better and why?

Best Answer

Even though only 0.1% of the articles have a duplicate, the cost (in terms of wasted space) of a NULL int value (for a redirect_to column) in the article table is small. (In fact, if using the InnoDB storage engine, you will waste no space at all if the value is NULL. (See e.g. Bill Karwin's SO answer here as well as the MySQL documentation about InnoDB field contents.) However, if the redirect_to column needs to be indexed, then there is a performance penalty.

A separate table (article_redirect) is the more normalized approach. This will add another join to your queries, but again the cost of that (in terms of extra memory and processing time) is small because the table is small, especially if best-practice indexing is used. You should therefore have a compound index (article_id, redirect_to) on the article_redirect table.

For the separate table approach (article_redirect), it's good practice to define an explicit primary key (in every table). Assuming an article can only be a duplicate of one other article (but many articles can duplicate the same article), then the primary key should be article_id.

As an aside, I will mention that it's usually not good practice to have TEXT columns (TEXT is a type of BLOB) together with other "meta data" columns such as title, timestamps and so forth that you might retrieve in queries. Instead, consider moving this to a separate table with a foreign key pointing back to your article table. See the MySQL documentation about optimizing for BLOB types for details.