Mysql – Table design – one table or hundreds of small ones

database-designMySQL

I have the following problem:

I have measuring devices, each one measure always the same quantities from the list of five different ones. Each device can measure different quantities from this list. I update the data only sometimes (once a month), but read them online based on time frames (no caching on app side).
Every device has aproximatelly 20K entries up to date.
New device will be added only in special cases, so devices can be seen as fixed.

My question is:

Should I have one big table or multiple (hunderds) of small ones.

Advantages / Disadvantages I see:

Single table

-) waste of space, since some columns are not filled, because device not measure certain quantity

+) safer select – device ID is not part of table name and can be passed safely with PDO or other library to prevent injections

+) easier to select multiple devices at once (however, this is not an usuall user case)

Multiple tables

+) "space" friendly

-) more complicated select – table name must be checked manually (from schema table) before select to prevent injections

Best Answer

Another solution is one table per quantity.

+-----------+------------+-----------+
| device ID | date       | quantity1 |
+-----------+------------+-----------+
|         A | 2020/06/03 |        10 |
+-----------+------------+-----------+

+-----------+------------+-----------+
| device ID | date       | quantity2 |
+-----------+------------+-----------+
|         B | 2020/06/04 |        50 |
+-----------+------------+-----------+

+-----------+------------+-----------+
| device ID | date       | quantity3 |
+-----------+------------+-----------+
|         A | 2020/06/03 |       100 |
+-----------+------------+-----------+
|         B | 2020/06/04 |       100 |
+-----------+------------+-----------+

The wide table with NULLs can be re-created by joining these tables. Joseph's tall table can be created by unioning them.

In terms of storage, NULLs are quite efficiently represented. They need not take up the full byte count of the column's declared data type. Don't be afraid of storing NULLs.

If space overhead is a concern then likely the one-column-per-quantity will have the best key-to-data ratio, followed by table-per-quantity then the single table.

If you're using compressing or columnar storage this can be more disk-efficient, too, but depends on the storage engine. Adjust the table design and data sorting to best suit the engine.

My suggestion is that you think about the use you will make of the data and consider how queries will be written for each design. For a given device & date do you need several quantities or one only? Each design can support all queries, just that some designs make the queries simpler to write and faster to run.

Whatever the table design additional consideration must be given for reporting on devices that recorded no quantity on a given day or days where a device is absent.

Related Solutions

Mysql – One Big Database vs. Several Smaller Ones

I ran a system with the best part of a thousand databases, spread across multiple servers. They were all an identical structure and were synchronised with a template database which was on each of the machines.

This allowed me the ability to migrate databases from one db to another if one was getting excessively over-loaded, and as the client mix changed, I could create new databases on different servers to load balance across the servers. This was the biggest advantage I got from the system, in that I had multiple large lumps of tin performing multiple complicated queries simultaneously on the separate servers.

The great thing about this, is that you can add servers to the configuration at your own speed, as each server starts to get over-loaded, add another into the mix, migrate some dbs across to the new server and end up with a nicely load balanced set of servers. A really nice and simple way to add scale to the system as and when it is required!

The reason I went with this approach rather than the single huge database approach, was the sheer size of the potential database that would have been created... each of the 1000 databases had 200 tables, and many of the individual tables within each of the databases comprised many hundreds of millions of rows of data!

A single database configuration would have required certain tables (approx 8 of them) to have multi-billions of rows of data, and the total db size would have been over 10Tb. We were able to have multiple servers with 5Tb of RAID 10 storage, with many databases on each.

That's what I would do! Hope it helps your decision making... :)

Mysql – Product Attribute List Design Pattern

I personally would use a model similar to the following:

The product table would be pretty basic, your main product details:

create table product
(
  part_number int, (PK)
  name varchar(10),
  price int
);
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99);

Second the attribute table to store the each of the different attributes.

create table attribute
(
  attributeid int, (PK)
  attribute_name varchar(10),
  attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');

Finally create the product_attribute table as the JOIN table between each product and its attributes associated with it.

create table product_attribute
(
  part_number int, (FK)
  attributeid int  (FK) 
);
insert into product_attribute values
(1,  1),
(1,  3),
(2,  6),
(2,  2),
(2,  6);

Depending on how you want to use the data you are looking at two joins:

select *
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid;

See SQL Fiddle with Demo. This returns data in the format:

PART_NUMBER | NAME       | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
___________________________________________________________________________
1           | product1   | 50    | 1           | color          | red
1           | product1   | 50    | 3           | material       | chrome
2           | product2   | 96    | 6           | size           | x-large
2           | product2   | 96    | 2           | color          | blue
2           | product2   | 96    | 6           | size           | x-large

But if you want to return the data in a PIVOT format where you have one row with all of the attributes as columns, you can use CASE statements with an aggregate:

SELECT p.part_number,
  p.name,
  p.price,
  MAX(IF(a.ATTRIBUTE_NAME = 'color', a.ATTRIBUTE_VALUE, null)) as color,
  MAX(IF(a.ATTRIBUTE_NAME = 'material', a.ATTRIBUTE_VALUE, null)) as material,
  MAX(IF(a.ATTRIBUTE_NAME = 'size', a.ATTRIBUTE_VALUE, null)) as size
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;

See SQL Fiddle with Demo. Data is returned in the format:

PART_NUMBER | NAME       | PRICE | COLOR | MATERIAL | SIZE
_________________________________________________________________
1           | product1   | 50    | red   | chrome   | null
2           | product2   | 96    | blue  | null     | x-large

As you case see the data might be in a better format for you, but if you have an unknown number of attributes, it will easily become untenable due to hard-coding attribute names, so in MySQL you can use prepared statements to create dynamic pivots. Your code would be as follows (See SQL Fiddle With Demo):

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(a.attribute_name = ''',
      attribute_name,
      ''', a.attribute_value, NULL)) AS ',
      attribute_name
    )
  ) INTO @sql
FROM attribute;

SET @sql = CONCAT('SELECT p.part_number
                    , p.name
                    , ', @sql, ' 
                   from product p
                   left join product_attribute t
                     on p.part_number = t.part_number
                   left join attribute a
                     on t.attributeid = a.attributeid
                   GROUP BY p.part_number
                    , p.name');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

This generates the same result as the second version with no need to hard-code anything. While there are many ways to model this I think this database design is the most flexible.

Best Answer

Related Solutions

Mysql – One Big Database vs. Several Smaller Ones

Mysql – Product Attribute List Design Pattern

Related Question