I personally would use a model similar to the following:
The product table would be pretty basic, your main product details:
create table product
(
  part_number int primary key,
  name varchar(10),
  price int
);
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99); -- MySQL rounds 95.99 to 96 in the int column
Second, the attribute table stores each of the distinct attribute name/value pairs:
create table attribute
(
  attributeid int primary key,
  attribute_name varchar(10),
  attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');
Finally, create the product_attribute table as the join table linking each product to its attributes:
create table product_attribute
(
  part_number int,
  attributeid int,
  foreign key (part_number) references product (part_number),
  foreign key (attributeid) references attribute (attributeid)
);
insert into product_attribute values
(1, 1),
(1, 3),
(2, 6),
(2, 2);
Depending on how you want to use the data, you are looking at two joins:
select *
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid;
See SQL Fiddle with Demo. This returns data in the format:
PART_NUMBER | NAME     | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
-------------------------------------------------------------------------------
1           | product1 | 50    | 1           | color          | red
1           | product1 | 50    | 3           | material       | chrome
2           | product2 | 96    | 6           | size           | x-large
2           | product2 | 96    | 2           | color          | blue
But if you want to return the data in a pivot format, with one row per product and the attributes as columns, you can use conditional aggregation (MySQL's IF(), or equivalently CASE expressions):
select p.part_number,
       p.name,
       p.price,
       max(if(a.attribute_name = 'color', a.attribute_value, null)) as color,
       max(if(a.attribute_name = 'material', a.attribute_value, null)) as material,
       max(if(a.attribute_name = 'size', a.attribute_value, null)) as size
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;
See SQL Fiddle with Demo. Data is returned in the format:
PART_NUMBER | NAME     | PRICE | COLOR | MATERIAL | SIZE
--------------------------------------------------------
1           | product1 | 50    | red   | chrome   | null
2           | product2 | 96    | blue  | null     | x-large
As you can see, this may be a better format for your data. But if you have an unknown number of attributes, hard-coding the attribute names quickly becomes untenable, so in MySQL you can use prepared statements to build the pivot dynamically. The code would be as follows (See SQL Fiddle With Demo):
SET @sql = NULL;
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'MAX(IF(a.attribute_name = ''',
attribute_name,
''', a.attribute_value, NULL)) AS ',
attribute_name
)
) INTO @sql
FROM attribute;
SET @sql = CONCAT('SELECT p.part_number
, p.name
, ', @sql, '
from product p
left join product_attribute t
on p.part_number = t.part_number
left join attribute a
on t.attributeid = a.attributeid
GROUP BY p.part_number
, p.name');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
This generates the same result as the second version with no need to hard-code anything. While there are many ways to model this I think this database design is the most flexible.
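One caveat with the dynamic version: the result of GROUP_CONCAT is truncated at group_concat_max_len, which defaults to only 1024 bytes, so with many attributes the generated column list can be silently cut off. Raising the limit for the session before building @sql is a common safeguard:

```sql
-- raise the limit before building @sql; the 1024-byte default is easy to exceed
SET SESSION group_concat_max_len = 1000000;
```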
The RENAME TABLE syntax allows multiple tables to be renamed atomically. For example:
CREATE TABLE myschema.mytable_new LIKE myschema.mytable;
RENAME TABLE myschema.mytable TO myschema.mytable_old, myschema.mytable_new to myschema.mytable;
DROP TABLE myschema.mytable_old;
In the above, it is impossible for your code to access the table mid-swap, and thus it cannot happen that the code "does not find the table". Queries simply block until the RENAME operation completes, and the table is always there.
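For contrast, performing the swap as two separate renames would not be safe, because there is a moment between the statements when the table does not exist; the single multi-rename above avoids that window:

```sql
-- NOT equivalent: between these two statements, queries against
-- myschema.mytable fail with "table doesn't exist"
RENAME TABLE myschema.mytable TO myschema.mytable_old;
RENAME TABLE myschema.mytable_new TO myschema.mytable;
```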
Also worth noting: unless you are using Percona Server with the "fast DROP TABLE" patch, dropping a table may hold locks on your database (namely on the table cache) for the duration of the operation. The operation may take a while, since it involves evicting the old table's pages from InnoDB's cache and deleting the table file(s) from the filesystem.

Waiting for some time (e.g. a couple of hours) after the RENAME before issuing the DROP can help: the old table's pages are more likely to have been flushed out by then, reducing the time the DROP takes.
Best Answer
It's 4 statements:

- The LOAD is very fast for getting the TSV/CSV into a table. staging is purely for this periodic 'update' process.
- The SELECT finds the 'new' rows and feeds them to the INSERT, which adds the new rows.
- The UPDATE updates all the rows for which you have new data, and also updates imported.
- If uid is AUTO_INCREMENT, note that I deliberately avoided INSERT ... ON DUPLICATE KEY UPDATE ..., because the latter would 'burn' ids, thereby threatening to eventually overflow the range of uid.

For more discussion, see my blog on ingestion. That section of the blog details how to augment a Normalization table with one INSERT and one UPDATE; your case is very similar. The rest of the blog focuses on 'continual' ingestion, unlike your 'periodic' ingestion.

If your incremental load is bigger than a million rows, there are other aspects we need to discuss.
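The statements themselves are not shown in this excerpt. Under the description above they might look roughly like the following sketch, where the file path and the table/column names (staging, main, natural_key, payload, imported) are all hypothetical placeholders for your actual schema:

```sql
-- 1. bulk-load the TSV/CSV feed into the staging table (the fast path)
LOAD DATA INFILE '/tmp/feed.tsv' INTO TABLE staging;

-- 2+3. SELECT the rows not already present and feed them to the INSERT,
--      avoiding INSERT ... ON DUPLICATE KEY UPDATE so no ids are burned
INSERT INTO main (natural_key, payload)
SELECT s.natural_key, s.payload
FROM staging s
LEFT JOIN main m ON m.natural_key = s.natural_key
WHERE m.natural_key IS NULL;

-- 4. UPDATE existing rows with the fresh data and mark them imported
UPDATE main m
JOIN staging s ON m.natural_key = s.natural_key
SET m.payload  = s.payload,
    m.imported = NOW();
```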