MySQL – Design for archiving a sorted list (with the ability to do a few INSERTs)

archive, database-design, mysql

I have a somewhat static sorted list of words which I want to save in a database table:

-- simplified!
-- Note: the rank/order of each item is NOT calculable from other values!
CREATE table words (
  id INTEGER UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  word NVARCHAR(50),
  -- foreign key
  book_id INTEGER UNSIGNED
);

The order of the words in a 'book' has to be retained.

But what is the best way to store the order of all datasets?

I've already read "How to design a database for storing a sorted list?" on this site. But I will rarely, if ever, have to INSERT a record between two existing ones, and I don't expect the table to grow beyond 100k rows.

So I thought I could use the primary key column id for storing the index in the list. But what if I have to insert one record between two ids?

The other possibility is to add another column. Is it better to store the numerical position of the record there, or the IDs of its neighbours?

I'm using MySQL.

Example:

I have this list from an external resource:

house
dog
browser
database

Now I have to enter these values into the database in the same order:

INSERT INTO words (word) VALUES ('house'), ('dog'), ('browser'), ('database');

The order is now described via the id column (which is the primary key at the same time).

But suddenly, I have to insert another word between 'house' and 'dog'.
I can't simply change the PK id because that would break other table relations.

Best Answer

By all means use an unsigned int surrogate key as your primary, clustered index. However, instead of using sequential values, build some padding into the sequence. This means that you'll have to assign the id manually instead of using auto_increment.

If you use unsigned int in MySQL, the max value is 4,294,967,295. If you expect to have at most 100,000 rows that means you could space each word out by more than 42,000.

When you need to insert a word between two existing words, just plug it into the space halfway between them. Let's say you use 40,000 as your initial padding value. If you have "house" at 800,000 and "dog" at 840,000, you can insert "nouveau" at 820,000.
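The padding idea can be sketched as follows. This is only an illustration under stated assumptions: the table is the question's simplified schema without AUTO_INCREMENT, the gap of 40,000 is the example value from this answer, and the starting id of 40,000 is an arbitrary choice.

```sql
-- Sketch: ids are assigned by hand, 40,000 apart (example gap, not a requirement).
CREATE TABLE words (
  id      INTEGER UNSIGNED PRIMARY KEY,  -- no AUTO_INCREMENT: values are set manually
  word    NVARCHAR(50),
  book_id INTEGER UNSIGNED
);

INSERT INTO words (id, word) VALUES
  (40000,  'house'),
  (80000,  'dog'),
  (120000, 'browser'),
  (160000, 'database');

-- Insert 'nouveau' between 'house' and 'dog' at the midpoint of their ids.
-- Existing ids are untouched, so rows in other tables that reference them stay valid.
INSERT INTO words (id, word)
SELECT (a.id + b.id) DIV 2, 'nouveau'
FROM words AS a
CROSS JOIN words AS b
WHERE a.word = 'house'
  AND b.word = 'dog';

-- Reading the list back in order is a plain sort on the primary key.
SELECT id, word FROM words ORDER BY id;
```

With these values the new row gets id 60,000. Each further insert into the same gap halves the remaining space, so a gap of 40,000 allows roughly 15 successive inserts at the same spot before the neighbouring ids become adjacent.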

Related Solutions

Product Attribute List Design Pattern in MySQL

I personally would use a model similar to the following:

The product table would be pretty basic, your main product details:

create table product
(
  part_number int primary key,
  name varchar(10),
  price int
);
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99);  -- rounded to 96 on insert because price is an int

Second, the attribute table stores each of the different attributes.

create table attribute
(
  attributeid int primary key,
  attribute_name varchar(10),
  attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');

Finally, create the product_attribute table as the join table between each product and its attributes.

create table product_attribute
(
  part_number int,  -- FK to product.part_number
  attributeid int   -- FK to attribute.attributeid
);
insert into product_attribute values
(1,  1),
(1,  3),
(2,  6),
(2,  2),
(2,  6);

Depending on how you want to use the data, you are looking at two joins:

select *
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid;

See SQL Fiddle with Demo. This returns data in the format:

PART_NUMBER | NAME       | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
___________________________________________________________________________
1           | product1   | 50    | 1           | color          | red
1           | product1   | 50    | 3           | material       | chrome
2           | product2   | 96    | 6           | size           | x-large
2           | product2   | 96    | 2           | color          | blue
2           | product2   | 96    | 6           | size           | x-large

But if you want to return the data in a PIVOT format, where you have one row per product with all of the attributes as columns, you can use conditional expressions (IF() here; CASE works as well) inside an aggregate:

SELECT p.part_number,
  p.name,
  p.price,
  MAX(IF(a.ATTRIBUTE_NAME = 'color', a.ATTRIBUTE_VALUE, null)) as color,
  MAX(IF(a.ATTRIBUTE_NAME = 'material', a.ATTRIBUTE_VALUE, null)) as material,
  MAX(IF(a.ATTRIBUTE_NAME = 'size', a.ATTRIBUTE_VALUE, null)) as size
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;

See SQL Fiddle with Demo. Data is returned in the format:

PART_NUMBER | NAME       | PRICE | COLOR | MATERIAL | SIZE
_________________________________________________________________
1           | product1   | 50    | red   | chrome   | null
2           | product2   | 96    | blue  | null     | x-large
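For reference, the same hard-coded pivot can also be written with standard CASE expressions instead of MySQL's IF(). This is a sketch against the three tables defined above, and it should return the same rows as the IF() version:

```sql
-- Same pivot as above, using CASE instead of IF().
-- A CASE without ELSE yields NULL, which MAX() ignores.
SELECT p.part_number,
  p.name,
  p.price,
  MAX(CASE WHEN a.attribute_name = 'color'    THEN a.attribute_value END) AS color,
  MAX(CASE WHEN a.attribute_name = 'material' THEN a.attribute_value END) AS material,
  MAX(CASE WHEN a.attribute_name = 'size'     THEN a.attribute_value END) AS size
FROM product p
LEFT JOIN product_attribute t
  ON p.part_number = t.part_number
LEFT JOIN attribute a
  ON t.attributeid = a.attributeid
GROUP BY p.part_number, p.name, p.price;
```

The CASE form is portable to other database systems, whereas IF() as a function is specific to MySQL.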

As you can see, this format may suit you better. However, with an unknown number of attributes the query quickly becomes unmaintainable, because every attribute name is hard-coded. In MySQL you can use prepared statements to build the pivot dynamically. Your code would be as follows (See SQL Fiddle With Demo):

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(a.attribute_name = ''',
      attribute_name,
      ''', a.attribute_value, NULL)) AS ',
      attribute_name
    )
  ) INTO @sql
FROM attribute;

SET @sql = CONCAT('SELECT p.part_number
                    , p.name
                    , p.price
                    , ', @sql, ' 
                   from product p
                   left join product_attribute t
                     on p.part_number = t.part_number
                   left join attribute a
                     on t.attributeid = a.attributeid
                   GROUP BY p.part_number
                    , p.name
                    , p.price');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

This generates the same result as the second version with no need to hard-code anything. While there are many ways to model this, I think this database design is the most flexible.
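Two practical caveats about the dynamic version can be sketched as follows. GROUP_CONCAT output is cut off at group_concat_max_len (1024 by default), which would silently truncate the generated column list when there are many attributes, and attribute names containing spaces or quotes would break the generated SQL. The variable name @cols and the chosen limit are assumptions for illustration only.

```sql
-- Raise the GROUP_CONCAT limit for this session (value chosen arbitrarily).
SET SESSION group_concat_max_len = 1000000;

-- Build the column list with QUOTE() for the string literal and
-- backticks around the alias, so unusual attribute names stay valid SQL.
SET @cols = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(a.attribute_name = ',
      QUOTE(attribute_name),
      ', a.attribute_value, NULL)) AS `',
      REPLACE(attribute_name, '`', '``'),
      '`'
    )
  ) INTO @cols
FROM attribute;

-- Inspect the generated fragment before passing it to PREPARE.
SELECT @cols;
```

The resulting @cols can be spliced into the outer SELECT in the same way as @sql above.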

Database Design – Structuring Excel-like Data

Here is one approach:

CREATE TABLE columns (
    id       INTEGER, 
    ordinal  INTEGER);

CREATE TABLE rows (
    id       INTEGER, 
    ordinal  INTEGER);

CREATE TABLE cells (
    rowid     INTEGER, 
    columnid  INTEGER, 
    value     TEXT);

With this design you still have to increment or decrement the ordinals that follow the inserted or deleted position, but fortunately you can do that with a single statement: UPDATE rows SET ordinal = ordinal + 1 WHERE ordinal > 42. Although it updates many rows, the statement should execute in less than a second.

Benefits:

Updating a rows table with N rows is lighter than updating a cells table with N*M rows.
Explicitly storing the ordinal means fast random access to an ordered subset of the data.
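A minimal sketch of the ordinal shift for one insert and one delete, written for MySQL. The table name sheet_rows is hypothetical: it stands in for the rows table above, because ROWS is a reserved word in MySQL 8.0 and would need backticks.

```sql
-- Hypothetical stand-in for the "rows" table above.
CREATE TABLE sheet_rows (
  id      INTEGER PRIMARY KEY,
  ordinal INTEGER NOT NULL
);

INSERT INTO sheet_rows (id, ordinal) VALUES (1, 1), (2, 2), (3, 3), (4, 4);

-- Insert a new row at position 3: shift everything at or after 3, then insert.
START TRANSACTION;
UPDATE sheet_rows SET ordinal = ordinal + 1 WHERE ordinal >= 3;
INSERT INTO sheet_rows (id, ordinal) VALUES (5, 3);
COMMIT;

-- Delete the row at position 2 and close the gap it leaves.
START TRANSACTION;
DELETE FROM sheet_rows WHERE ordinal = 2;
UPDATE sheet_rows SET ordinal = ordinal - 1 WHERE ordinal > 2;
COMMIT;

-- Final order by ordinal: ids 1, 5, 3, 4 at positions 1 to 4.
SELECT id, ordinal FROM sheet_rows ORDER BY ordinal;
```

Wrapping each shift and its insert or delete in one transaction keeps other sessions from seeing a half-updated ordering. The sketch deliberately leaves ordinal without a UNIQUE constraint, since a bulk shift can collide with such a constraint while the update is in progress.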
