I ran a system with the best part of a thousand databases, spread across multiple servers. They were all an identical structure and were synchronised with a template database which was on each of the machines.
This allowed me the ability to migrate databases from one db to another if one was getting excessively over-loaded, and as the client mix changed, I could create new databases on different servers to load balance across the servers. This was the biggest advantage I got from the system, in that I had multiple large lumps of tin performing multiple complicated queries simultaneously on the separate servers.
The great thing about this, is that you can add servers to the configuration at your own speed, as each server starts to get over-loaded, add another into the mix, migrate some dbs across to the new server and end up with a nicely load balanced set of servers. A really nice and simple way to add scale to the system as and when it is required!
The reason I went with this approach rather than the single huge database approach, was the sheer size of the potential database that would have been created... each of the 1000 databases had 200 tables, and many of the individual tables within each of the databases comprised many hundreds of millions of rows of data!
A single database configuration would have required certain tables (approx 8 of them) to have multi-billions of rows of data, and the total db size would have been over 10Tb. We were able to have multiple servers with 5Tb of RAID 10 storage, with many databases on each.
That's what I would do! Hope it helps your decision making... :)
I personally would use a model similar to the following:
The product table would be pretty basic, your main product details:
create table product
(
part_number int, (PK)
name varchar(10),
price int
);
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99);
Second the attribute table to store the each of the different attributes.
create table attribute
(
attributeid int, (PK)
attribute_name varchar(10),
attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');
Finally create the product_attribute table as the JOIN table between each product and its attributes associated with it.
create table product_attribute
(
part_number int, (FK)
attributeid int (FK)
);
insert into product_attribute values
(1, 1),
(1, 3),
(2, 6),
(2, 2),
(2, 6);
Depending on how you want to use the data you are looking at two joins:
select *
from product p
left join product_attribute t
on p.part_number = t.part_number
left join attribute a
on t.attributeid = a.attributeid;
See SQL Fiddle with Demo. This returns data in the format:
PART_NUMBER | NAME | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
___________________________________________________________________________
1 | product1 | 50 | 1 | color | red
1 | product1 | 50 | 3 | material | chrome
2 | product2 | 96 | 6 | size | x-large
2 | product2 | 96 | 2 | color | blue
2 | product2 | 96 | 6 | size | x-large
But if you want to return the data in a PIVOT
format where you have one row with all of the attributes as columns, you can use CASE
statements with an aggregate:
SELECT p.part_number,
p.name,
p.price,
MAX(IF(a.ATTRIBUTE_NAME = 'color', a.ATTRIBUTE_VALUE, null)) as color,
MAX(IF(a.ATTRIBUTE_NAME = 'material', a.ATTRIBUTE_VALUE, null)) as material,
MAX(IF(a.ATTRIBUTE_NAME = 'size', a.ATTRIBUTE_VALUE, null)) as size
from product p
left join product_attribute t
on p.part_number = t.part_number
left join attribute a
on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;
See SQL Fiddle with Demo. Data is returned in the format:
PART_NUMBER | NAME | PRICE | COLOR | MATERIAL | SIZE
_________________________________________________________________
1 | product1 | 50 | red | chrome | null
2 | product2 | 96 | blue | null | x-large
As you case see the data might be in a better format for you, but if you have an unknown number of attributes, it will easily become untenable due to hard-coding attribute names, so in MySQL you can use prepared statements to create dynamic pivots. Your code would be as follows (See SQL Fiddle With Demo):
SET @sql = NULL;
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'MAX(IF(a.attribute_name = ''',
attribute_name,
''', a.attribute_value, NULL)) AS ',
attribute_name
)
) INTO @sql
FROM attribute;
SET @sql = CONCAT('SELECT p.part_number
, p.name
, ', @sql, '
from product p
left join product_attribute t
on p.part_number = t.part_number
left join attribute a
on t.attributeid = a.attributeid
GROUP BY p.part_number
, p.name');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
This generates the same result as the second version with no need to hard-code anything. While there are many ways to model this I think this database design is the most flexible.
Best Answer
Another solution is one table per quantity.
The wide table with NULLs can be re-created by joining these tables. Joseph's tall table can be created by unioning them.
In terms of storage, NULLs are quite efficiently represented. They need not take up the full byte count of the column's declared data type. Don't be afraid of storing NULLs.
If space overhead is a concern then likely the one-column-per-quantity will have the best key-to-data ratio, followed by table-per-quantity then the single table.
If you're using compressing or columnar storage this can be more disk-efficient, too, but depends on the storage engine. Adjust the table design and data sorting to best suit the engine.
My suggestion is that you think about the use you will make of the data and consider how queries will be written for each design. For a given device & date do you need several quantities or one only? Each design can support all queries, just that some designs make the queries simpler to write and faster to run.
Whatever the table design additional consideration must be given for reporting on devices that recorded no quantity on a given day or days where a device is absent.