MySQL – Duplicate data or Relationships

database-design denormalization mysql

Given two tables:

Product

int    id
int    operator_id
string name
string description
int    price

Operator

int    id
string logo_path
string tos_path
int    tax_number

I need to fetch multiple Products, several times over. Most of the time when I fetch a Product, I also need the Operator's name and logo.

I know that we should avoid duplicating data when possible and use more elegant ways to link data, but in this case, isn't it more efficient to put string operator_name and string operator_logo into the Product table, instead of making one more request on each call to find the Operator?

Or is that really the point of relationships: databases are fine with handling one more request instead of duplicating the data?

I'm a bit of a beginner at optimization and don't really know what to search for or where, so thanks for your answers.

Best Answer

Let's assume your tables are defined like this (by converting your descriptions to SQL Data Definition Language - DDL):

CREATE TABLE operator
(
    id integer NOT NULL PRIMARY KEY,
    logo_path varchar(255),
    tos_path varchar(255),
    tax_number integer
) ;

CREATE TABLE product
(
    id integer NOT NULL PRIMARY KEY,
    operator_id integer NOT NULL REFERENCES operator(id),  /* This is actually ignored by MySQL, but not by "well-behaved" databases */
    name varchar(100),
    description varchar(255),
    price decimal(12,2)
) ;
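As the comment in the DDL notes, the inline REFERENCES clause is parsed but not enforced by MySQL. If you want the relationship enforced, a table-level FOREIGN KEY clause on an InnoDB table is the usual approach. The following is a sketch of an alternative product definition that could replace the one above; the constraint name fk_product_operator is an illustrative assumption, not part of the original answer:

```sql
-- Alternative definition of product with an enforced foreign key.
-- fk_product_operator is a hypothetical constraint name.
CREATE TABLE product
(
    id integer NOT NULL PRIMARY KEY,
    operator_id integer NOT NULL,
    name varchar(100),
    description varchar(255),
    price decimal(12,2),
    CONSTRAINT fk_product_operator
        FOREIGN KEY (operator_id) REFERENCES operator(id)
) ENGINE = InnoDB ;
```

With this in place, inserting a product whose operator_id has no matching operator row should be rejected.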

Let's insert some sample data:

INSERT INTO 
    operator
    (id, logo_path, tos_path, tax_number)
VALUES
    (1000, '/path/to/logo/1000', '/path/to/tos/1000', 1234),
    (1001, '/path/to/logo/1001', '/path/to/tos/1001', 5678) 
;

INSERT INTO
    product
    (id, operator_id, name, description, price)
VALUES
    (1, 1000, 'Product Name', 'Product Description', 1234.56),
    (2, 1001, 'Product 2', 'Description 2', 2345.67)
;

... and now we can perform a SELECT with a JOIN:

SELECT
    product.id, product.name, product.price, operator.logo_path,
    operator.tos_path, operator.tax_number
FROM
    product
    JOIN operator ON operator.id = product.operator_id ;

This is the result you would get. You retrieve the data from both product and operator with just one query, and the database decides how to fetch it from the tables. One of the objectives of relational databases is to let them handle this kind of thing:

id | name         |   price | logo_path          | tos_path          | tax_number
-: | :----------- | ------: | :----------------- | :---------------- | ---------:
 1 | Product Name | 1234.56 | /path/to/logo/1000 | /path/to/tos/1000 |       1234
 2 | Product 2    | 2345.67 | /path/to/logo/1001 | /path/to/tos/1001 |       5678

You can play with the full example at dbfiddle.
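For the case in the question (fetching one product along with its operator's data), the same JOIN can be filtered by product id. If you are worried about the cost of the JOIN, EXPLAIN lets you see how MySQL plans to execute it. A minimal sketch, assuming the tables and sample data above; the actual plan depends on your MySQL version and data:

```sql
-- Fetch one product together with its operator's data in a single query.
SELECT
    product.id, product.name, product.price, operator.logo_path
FROM
    product
    JOIN operator ON operator.id = product.operator_id
WHERE
    product.id = 1 ;

-- Ask MySQL how it intends to execute the same query.
EXPLAIN
SELECT
    product.id, product.name, product.price, operator.logo_path
FROM
    product
    JOIN operator ON operator.id = product.operator_id
WHERE
    product.id = 1 ;
```

Because both tables are looked up by their primary keys here, you would expect the plan to show key lookups rather than full table scans.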

As a rule, you want to use relationships and JOINs rather than repeat data. Formally, this is called normalizing your database.

Of course, all rules tend to have exceptions. Denormalized data is reserved for a few specific cases: normally, read-only data where speed is the utmost concern. This is typical of data warehouses and analytics (OLAP).
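To see why duplication is costly for data that does change, consider what happens when an operator's logo moves. The sketch below assumes a hypothetical operator_logo_path column copied into product, which is not part of the schema above and is added only for illustration:

```sql
-- Hypothetical denormalized column, for illustration only.
ALTER TABLE product ADD COLUMN operator_logo_path varchar(255) ;

-- Denormalized: a logo change must touch every product row of that operator,
-- and any row that is missed is left with stale data.
UPDATE product
SET operator_logo_path = '/path/to/logo/new'
WHERE operator_id = 1000 ;

-- Normalized: the same change is a single-row update.
UPDATE operator
SET logo_path = '/path/to/logo/new'
WHERE id = 1000 ;
```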

If in doubt: normalize. Always.

Related Solutions

Product Attribute List Design Pattern in MySQL

I personally would use a model similar to the following:

The product table would be pretty basic, holding your main product details:

create table product
(
  part_number int primary key,
  name varchar(10),
  price int  -- integer column, so 95.99 below is stored as 96
);
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99);

Second, the attribute table stores each of the different attributes.

create table attribute
(
  attributeid int primary key,
  attribute_name varchar(10),
  attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');

Finally, create the product_attribute table as the JOIN table between each product and its associated attributes.

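create table product_attribute
(
  part_number int,
  attributeid int,
  foreign key (part_number) references product (part_number),
  foreign key (attributeid) references attribute (attributeid)
);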
insert into product_attribute values
(1,  1),
(1,  3),
(2,  6),
(2,  2),
(2,  6);

Depending on how you want to use the data you are looking at two joins:

select *
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid;

See SQL Fiddle with Demo. This returns data in the format:

PART_NUMBER | NAME       | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
___________________________________________________________________________
1           | product1   | 50    | 1           | color          | red
1           | product1   | 50    | 3           | material       | chrome
2           | product2   | 96    | 6           | size           | x-large
2           | product2   | 96    | 2           | color          | blue
2           | product2   | 96    | 6           | size           | x-large

But if you want to return the data in a PIVOT format, where you have one row with all of the attributes as columns, you can use conditional expressions (IF() here; CASE works as well) with an aggregate:

SELECT p.part_number,
  p.name,
  p.price,
  MAX(IF(a.ATTRIBUTE_NAME = 'color', a.ATTRIBUTE_VALUE, null)) as color,
  MAX(IF(a.ATTRIBUTE_NAME = 'material', a.ATTRIBUTE_VALUE, null)) as material,
  MAX(IF(a.ATTRIBUTE_NAME = 'size', a.ATTRIBUTE_VALUE, null)) as size
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;

See SQL Fiddle with Demo. Data is returned in the format:

PART_NUMBER | NAME       | PRICE | COLOR | MATERIAL | SIZE
_________________________________________________________________
1           | product1   | 50    | red   | chrome   | null
2           | product2   | 96    | blue  | null     | x-large
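The same pivot can also be written with standard CASE expressions instead of MySQL's IF() function, which makes it easier to port to other databases. A sketch, assuming the product, product_attribute, and attribute tables defined above:

```sql
-- CASE without an ELSE branch yields NULL, just like IF(..., ..., NULL).
SELECT p.part_number,
  p.name,
  p.price,
  MAX(CASE WHEN a.attribute_name = 'color' THEN a.attribute_value END) AS color,
  MAX(CASE WHEN a.attribute_name = 'material' THEN a.attribute_value END) AS material,
  MAX(CASE WHEN a.attribute_name = 'size' THEN a.attribute_value END) AS size
FROM product p
LEFT JOIN product_attribute t
  ON p.part_number = t.part_number
LEFT JOIN attribute a
  ON t.attributeid = a.attributeid
GROUP BY p.part_number, p.name, p.price;
```

This should return the same rows as the IF() version.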

As you can see, the data might be in a better format for you. However, if you have an unknown number of attributes, hard-coding the attribute names quickly becomes untenable, so in MySQL you can use prepared statements to create dynamic pivots. Your code would be as follows (See SQL Fiddle With Demo):

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(a.attribute_name = ''',
      attribute_name,
      ''', a.attribute_value, NULL)) AS ',
      attribute_name
    )
  ) INTO @sql
FROM attribute;

SET @sql = CONCAT('SELECT p.part_number
                    , p.name
                    , p.price
                    , ', @sql, ' 
                   from product p
                   left join product_attribute t
                     on p.part_number = t.part_number
                   left join attribute a
                     on t.attributeid = a.attributeid
                   GROUP BY p.part_number
                    , p.name
                    , p.price');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

This generates the same result as the second version, with no need to hard-code anything. While there are many ways to model this, I think this database design is the most flexible.
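One caveat with the dynamic version: GROUP_CONCAT output is limited by the group_concat_max_len system variable (1024 bytes by default), so with many attributes the generated column list may be cut off and the prepared statement will fail to parse. A hedged sketch of a workaround; the value 100000 is an arbitrary illustrative choice:

```sql
-- Run this before the GROUP_CONCAT query that fills @sql.
-- 100000 is an arbitrary illustrative limit, not a recommendation.
SET SESSION group_concat_max_len = 100000;

-- After @sql has been built, and before PREPARE, inspect the
-- generated statement to confirm it was not truncated.
SELECT @sql;
```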

MySQL – Storing metadata of various data types in a MySQL database

What I have done in the past is to use something like this:

1) Each document has a logical type associated with it.

2) Set up tables for metadata for each logical type. Each row can store all metadata associated with the document.
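The two steps above could be sketched as follows. All table and column names here (document, invoice_metadata, photo_metadata, and so on) are hypothetical and chosen only to illustrate the idea of one metadata table per logical type:

```sql
-- Hypothetical schema: every name below is illustrative.
CREATE TABLE document
(
    id integer NOT NULL PRIMARY KEY,
    logical_type varchar(20) NOT NULL,   -- e.g. 'invoice' or 'photo'
    file_path varchar(255)
) ;

-- One metadata table per logical type.
-- Each row holds all the metadata for one document of that type.
CREATE TABLE invoice_metadata
(
    document_id integer NOT NULL PRIMARY KEY,
    invoice_number varchar(30),
    issued_on date,
    total decimal(12,2),
    CONSTRAINT fk_invoice_document
        FOREIGN KEY (document_id) REFERENCES document(id)
) ;

CREATE TABLE photo_metadata
(
    document_id integer NOT NULL PRIMARY KEY,
    width_px integer,
    height_px integer,
    taken_at datetime,
    CONSTRAINT fk_photo_document
        FOREIGN KEY (document_id) REFERENCES document(id)
) ;
```

Each metadata column keeps a proper data type this way, at the cost of a new table whenever a new logical type appears.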

The other approach is key-value modelling, which may or may not work well depending on what you are doing with it. In this case you have a single metadata table which stores all metadata, one value per record, for all documents. This works best if you find a reasonable way to aggregate the data in result sets, and if you aren't doing complex searches across multiple metadata fields.
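A minimal sketch of the key-value approach, again with hypothetical names (document_metadata, meta_key, meta_value) and hypothetical keys ('author', 'title'). The query shows one way to aggregate the values back into a single row per document:

```sql
-- Hypothetical key-value metadata table: one value per record.
CREATE TABLE document_metadata
(
    document_id integer NOT NULL,
    meta_key varchar(50) NOT NULL,
    meta_value varchar(255),       -- every value is stored as text
    PRIMARY KEY (document_id, meta_key)
) ;

-- Aggregate the key-value rows into one row per document.
SELECT
    document_id,
    MAX(IF(meta_key = 'author', meta_value, NULL)) AS author,
    MAX(IF(meta_key = 'title', meta_value, NULL)) AS title
FROM
    document_metadata
GROUP BY
    document_id ;
```

Since every value is stored as text, type checking and searches across several metadata fields become harder, which is the trade-off described above.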