Mysql – Duplicate data or Relationships

database-designdenormalizationMySQL

Given two tables:

Product

int    id
int    operator_id
string name
string description
int    price

Operator

int    id
string logo_path
string tos_path
int    tax_number

I need to call multiple Product a few times. Most of the times I call a Product, I need to get the Operator's name and logo.

I know that we should avoid duplicating data when possible, and be using more elegant ways to bind data, but in this case, isn't it more efficient to put int operator_name and int operator_logo into Product table instead of calling at each request, one more request to find the Operator?

Or is it the really point of relationships, DBs being OK with dealing with one more request instead of duplicating the data?

A bit beginner into optimization and not really knowing what and where to search into to find answers, thanks for yours.

Best Answer

Let's assume your tables are defined like this (by converting your descriptions to SQL Data Definition Language - DDL):

CREATE TABLE operator
(
    id integer NOT NULL PRIMARY KEY,
    logo_path varchar(255),
    tos_path varchar(255),
    tax_number integer
) ;

CREATE TABLE product
(
    id integer NOT NULL PRIMARY KEY,
    operator_id integer NOT NULL REFERENCES operator(id),  /* This is actually ignored by MySQL, but not by "well-behaved" databases */
    name varchar(100),
    description varchar(255),
    price decimal(12,2)
) ;

Let's put some sample data:

INSERT INTO 
    operator
    (id, logo_path, tos_path, tax_number)
VALUES
    (1000, '/path/to/logo/1000', '/path/to/tos/1000', 1234),
    (1001, '/path/to/logo/1001', '/path/to/tos/1001', 5678) 
;

INSERT INTO
    product
    (id, operator_id, name, description, price)
VALUES
    (1, 1000, 'Product Name', 'Product Description', 1234.56),
    (2, 1001, 'Product 2', 'Description 2', 2345.67)
;

... and now we can perform a SELECT with a JOIN

SELECT
    product.id, product.name, product.price, operator.logo_path,
    operator.tos_path, operator.tax_number
FROM
    product
    JOIN operator ON operator.id = product.operator_id ;

This is the result you would get, you retrieve the data from both product and operator with just one query. The database will handle how to fetch it from the tables. One of the objectives of relational databases is to just let them do this kind of things

id | name         |   price | logo_path          | tos_path          | tax_number
-: | :----------- | ------: | :----------------- | :---------------- | ---------:
 1 | Product Name | 1234.56 | /path/to/logo/1000 | /path/to/tos/1000 |       1234
 2 | Product 2    | 2345.67 | /path/to/logo/1001 | /path/to/tos/1001 |       5678

You can see the full example to play with at dbfiddle here


You always want to use relationships, and JOIN, and not repeat. This is formally called Normalizing your database

Of course, all rules tend to have exceptions. Having denormalized data is left to a few specific cases (normally, data which is read only, and where a need for speed is of the utmost concern; this is typical of data warehouses and for analytics (OLAP)).

If in doubt: normalize. Always.