Mysql – Design for archiving a sorted list (with ability for a few INSERT)

archivedatabase-designMySQL

I have a somewhat static sorted list of words which I want to save in a database table:

-- simplified!
-- Note: the rank/order of each item is NOT calculable from other values!
CREATE table words (
  id INTEGER UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  word NVARCHAR(50),
  -- foreign key
  book_id INTEGER UNSIGNED
);

The order of words in a 'book' have to be retained.

But what is the best way to store the order of all datasets?

I've already read How to design a database for storing a sorted list? on this site. But I don't have many INSERTs (if I will ever have one) between two records and I don't expect my database to grow up to more than 100k.

So I thought I could use the primary key column id for storing the index in the list. But what if I have to insert one record between two ids?

The other possibilty is to add another column. Is it better to store the numerical position of the dataset or the neighbour IDs in there?

I'm using MySQL.


Example:

I have this list from an external resource:

  1. house
  2. dog
  3. browser
  4. database

Now I have to enter these values in same order into the database!

INSERT INTO words (word) VALUES ('house'), ('dog'), ('browser'), ('database')

The order is now described via the id column (which is the primary key at the same time).

But suddenly, I have to insert another word between 'house' and 'dog'.
I can't simply change the PK id because that would break other table relations.

Best Answer

By all means use an unsigned int surrogate key as your primary, clustered index. However, instead of using sequential values, build some padding into the sequence. This means that you'll have to assign the id manually instead of using auto_increment.

If you use unsigned int in MySQL, the max value is 4,294,967,295. If you expect to have at most 100,000 rows that means you could space each word out by more than 42,000.

When you need to insert a word between two existing words, just plug it into the space half way between. Let's say you use 40,000 as your intitial padding value. If you have "house" at 800,000 and "dog" at 840,000 you can insert "nouveau" at 820,000.