Database Design – Structuring Excel-like Data

database-design

The task

I need to store some values in sequence in the database. The data looks just like a sequence of numbers. Here is an example of the data:

            Val_1   Val_2   Val_3   ...     Val_145
Record 1        6       8       4                 5
Record 2        2       5       6                 3
etc.

The user has to be able to perform a few actions:

Insert columns: e.g. the user needs to enter additional data between the Val_1 and Val_2 columns and inserts 5 columns there. The column headers stay sequential (Val_2 always follows Val_1, Val_3 always follows Val_2), so only the values get shifted to the right, just like in Excel when you insert columns.
Delete columns: same as with inserting. The column headers always stay the same; the values are deleted and the values to the right are shifted left, again just like in Excel.
View arbitrary chunks of values, e.g. the values from index 600 to index 1000.

The three solutions

I came up with three solutions that can store those values, but every solution has some flaws. Here they are:

Store the index of value in the Value

Tables:

table Record
    id int
table Value
    record_id int
    index int
    value int

Pros: easy to find the needed values by using the index field (where index > 662 and index < 987).
Cons: inserting and deleting values is horrible. If a record has 1000 values, inserting one value at index 500 requires 1 insert and about 500 updates (to shift every value from index 500 onward). Deleting values has the same problem.
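
To make the cost concrete, here is a minimal sketch of this design using Python's built-in sqlite3 module. The helper names are hypothetical, and the column is called idx here because index is a reserved word in SQL. The table deliberately has no UNIQUE constraint on (record_id, idx), which is an assumption of this sketch; with such a constraint the shifting UPDATE may need extra care in some databases.

```python
import sqlite3

def insert_value_at(conn, record_id, index, value):
    """Insert `value` at `index`, shifting every later value one slot right."""
    with conn:  # one transaction: shift first, then insert
        conn.execute(
            "UPDATE value SET idx = idx + 1 WHERE record_id = ? AND idx >= ?",
            (record_id, index),
        )
        conn.execute(
            "INSERT INTO value (record_id, idx, value) VALUES (?, ?, ?)",
            (record_id, index, value),
        )

def delete_value_at(conn, record_id, index):
    """Delete the value at `index`, shifting every later value one slot left."""
    with conn:
        conn.execute(
            "DELETE FROM value WHERE record_id = ? AND idx = ?",
            (record_id, index),
        )
        conn.execute(
            "UPDATE value SET idx = idx - 1 WHERE record_id = ? AND idx > ?",
            (record_id, index),
        )

def read_range(conn, record_id, first, last):
    """Return the values at positions first..last (inclusive), in order."""
    rows = conn.execute(
        "SELECT value FROM value WHERE record_id = ? AND idx BETWEEN ? AND ? "
        "ORDER BY idx",
        (record_id, first, last),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE value (record_id INTEGER, idx INTEGER, value INTEGER)")
conn.executemany("INSERT INTO value VALUES (1, ?, ?)", [(1, 6), (2, 8), (3, 4)])

insert_value_at(conn, 1, 2, 99)
print(read_range(conn, 1, 1, 4))  # [6, 99, 8, 4]
```

The shift is a single statement from the application's point of view, but the database still rewrites every row after the insertion point.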

Store references to previous value and the next value

Tables:

table Record
    id int
table Value
    record_id int
    value int
    next_value_id int
    prev_value_id int

Pros: easy to insert and delete values – inserting a value requires 1 insert and 2 updates (updating refs on adjacent values). We can even remove the prev_value_id to make it even better.
Cons: finding values by index is horrible. Reading the range of values from 500 to 1000 requires walking through all the values starting at value 1.
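
A rough sketch of the trade-off, again with sqlite3. It keeps only next_value_id (the question notes prev_value_id can be dropped) and assumes the caller already knows the id of the first value; the head_id parameter and both helper names are hypothetical.

```python
import sqlite3

def insert_after(conn, prev_id, value):
    """Insert a value right after `prev_id`: 1 select, 1 insert, 1 update."""
    with conn:
        record_id, next_id = conn.execute(
            "SELECT record_id, next_value_id FROM value WHERE id = ?", (prev_id,)
        ).fetchone()
        cur = conn.execute(
            "INSERT INTO value (record_id, value, next_value_id) VALUES (?, ?, ?)",
            (record_id, value, next_id),
        )
        conn.execute(
            "UPDATE value SET next_value_id = ? WHERE id = ?",
            (cur.lastrowid, prev_id),
        )
        return cur.lastrowid

def read_range(conn, head_id, first, last):
    """Return values at 1-based positions first..last by walking the links."""
    out, pos, current = [], 1, head_id
    while current is not None and pos <= last:
        value, next_id = conn.execute(
            "SELECT value, next_value_id FROM value WHERE id = ?", (current,)
        ).fetchone()
        if pos >= first:
            out.append(value)
        current, pos = next_id, pos + 1
    return out

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE value (id INTEGER PRIMARY KEY, record_id INTEGER, "
    "value INTEGER, next_value_id INTEGER)"
)
# Chain for record 1: id 1 (6) -> id 2 (8) -> id 3 (4)
conn.executemany(
    "INSERT INTO value VALUES (?, 1, ?, ?)", [(1, 6, 2), (2, 8, 3), (3, 4, None)]
)

insert_after(conn, 1, 99)           # cheap: no other rows are touched
print(read_range(conn, 1, 2, 3))    # [99, 8], but only after walking from the head
```

The insert touches a constant number of rows, while the range read issues one lookup per position up to the end of the range.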

Store values as e.g. string in the record table

Tables:

table Record
    id int
    values_str text

Pros: this is just like parsing a text file. Most work is done in code. Feels like a simple and straightforward way to go.
Cons: does not feel right. This may look and feel good, but something about this design smells bad. I have a bad feeling about this.
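
For comparison, a minimal sketch of what "most work is done in code" looks like. It assumes a comma-separated encoding, which the question does not specify, and the helper names are hypothetical.

```python
import sqlite3

def load_values(conn, record_id):
    """Parse the stored string into a list of ints."""
    (text,) = conn.execute(
        "SELECT values_str FROM record WHERE id = ?", (record_id,)
    ).fetchone()
    return [int(v) for v in text.split(",")] if text else []

def save_values(conn, record_id, values):
    """Serialize the whole list back into the single text column."""
    with conn:
        conn.execute(
            "UPDATE record SET values_str = ? WHERE id = ?",
            (",".join(str(v) for v in values), record_id),
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, values_str TEXT)")
conn.execute("INSERT INTO record VALUES (1, '6,8,4,5')")

values = load_values(conn, 1)
values[1:1] = [0] * 5            # insert 5 empty columns between Val_1 and Val_2
save_values(conn, 1, values)
print(load_values(conn, 1))      # [6, 0, 0, 0, 0, 0, 8, 4, 5]
```

Inserting and slicing are trivial in application code, but the database can no longer query or constrain individual values, and every change rewrites the whole string.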

The question

By now you have probably guessed the question: which way should I go? Is there a more sophisticated way of storing such data? Do any of the solutions I've thought of make sense?

Best Answer

Here is one approach:

CREATE TABLE columns (
    id       INTEGER, 
    ordinal  INTEGER);

CREATE TABLE rows (
    id       INTEGER, 
    ordinal  INTEGER);

CREATE TABLE cells (
    rowid     INTEGER, 
    columnid  INTEGER, 
    value     TEXT);

This way, you still need to add or subtract 1 from the ordinals after the inserted/deleted position, but you can do it all with a single statement, UPDATE rows SET ordinal = ordinal + 1 WHERE ordinal > 42. The same statement against the columns table handles inserted or deleted columns. Although it updates many rows, the update statement should execute in less than a second.

Benefits:

Updating a rows table with N rows is lighter than updating a cells table with N*M rows.
Explicitly storing the ordinal means fast random access to an ordered subset of the data.
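
A hedged sketch of how this schema could be exercised for the column case from the question, using sqlite3. The tables are renamed sheet_columns, sheet_rows and sheet_cells, and the cell keys row_id and column_id, purely to avoid clashing with SQLite keywords and its built-in rowid; those names and the helper functions are assumptions of this sketch, not part of the answer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sheet_columns (id INTEGER PRIMARY KEY, ordinal INTEGER);
CREATE TABLE sheet_rows    (id INTEGER PRIMARY KEY, ordinal INTEGER);
CREATE TABLE sheet_cells   (row_id INTEGER, column_id INTEGER, value TEXT);
"""

def insert_column(conn, ordinal):
    """Open a new, empty column at `ordinal`; no cell rows are touched."""
    with conn:
        conn.execute(
            "UPDATE sheet_columns SET ordinal = ordinal + 1 WHERE ordinal >= ?",
            (ordinal,),
        )
        cur = conn.execute(
            "INSERT INTO sheet_columns (ordinal) VALUES (?)", (ordinal,)
        )
        return cur.lastrowid

def read_chunk(conn, row_id, first, last):
    """Return one row's values for column ordinals first..last (None = empty)."""
    rows = conn.execute(
        "SELECT c.value FROM sheet_columns col "
        "LEFT JOIN sheet_cells c ON c.column_id = col.id AND c.row_id = ? "
        "WHERE col.ordinal BETWEEN ? AND ? ORDER BY col.ordinal",
        (row_id, first, last),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO sheet_columns VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])
conn.execute("INSERT INTO sheet_rows VALUES (1, 1)")
conn.executemany(
    "INSERT INTO sheet_cells VALUES (1, ?, ?)", [(1, "6"), (2, "8"), (3, "4")]
)

insert_column(conn, 2)               # new empty column between Val_1 and Val_2
print(read_chunk(conn, 1, 1, 4))     # ['6', None, '8', '4']
```

Only the small sheet_columns table is renumbered; the cells keep pointing at stable column ids, which is why the shift stays cheap as the number of records grows.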

Related Solutions

PostgreSQL – Best Practice for Storing Record Metadata

The columns you are talking about occupy 20 bytes (if aligned without padding):

creation time, update time and creation source

timestamp .. 8 bytes
timestamp .. 8 bytes
integer .. 4 bytes

The tuple header and item identifier for a separate row in a separate table alone would occupy 23 + 1 + 4 = 28 bytes plus the 20 bytes of actual data, plus 4 bytes of padding at the end. Makes 52 bytes per row. See:

Configuring PostgreSQL for read performance

Concerning storage you have nothing to gain. Concerning performance you hardly lose anything with just 16 - 24 bytes more per row.
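
The 52-byte figure above can be re-derived with a few lines of arithmetic. This sketch assumes 8-byte maximum alignment, as on typical 64-bit PostgreSQL builds; the constant and function names are illustrative only.

```python
MAXALIGN = 8        # assumption: typical 64-bit build

def align(n, boundary=MAXALIGN):
    """Round n up to the next multiple of boundary."""
    return -(-n // boundary) * boundary

TUPLE_HEADER = 23               # bytes, padded to 24 before the data starts
ITEM_IDENTIFIER = 4             # per-row pointer in the page header
data = 8 + 8 + 4                # two timestamps and one integer

overhead = align(TUPLE_HEADER) + ITEM_IDENTIFIER    # 23 + 1 + 4 = 28
row_size = overhead + align(data)                   # 28 + 20 + 4 = 52

print(overhead, row_size)       # 28 52
```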

The columns also directly belong to the row, so it makes sense to keep them together. I make it a habit to add exactly such columns (plus separate source for the last update) to all relevant tables.

It's also easier to write a TRIGGER ON INSERT OR UPDATE to keep them current.
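
As an illustration of the idea only, here is a small sketch using SQLite through Python's sqlite3 module rather than PostgreSQL, so the trigger syntax differs from what you would write in PostgreSQL. The item table and its column names are hypothetical. Column defaults cover the insert case, and an AFTER UPDATE trigger refreshes the update time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (
    id         INTEGER PRIMARY KEY,
    payload    TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    source_id  INTEGER
);

-- Fires only when payload changes, so the trigger's own UPDATE cannot re-fire it.
CREATE TRIGGER item_touch AFTER UPDATE OF payload ON item
BEGIN
    UPDATE item SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
""")

# Seed a row with a deliberately old update time so the change is visible.
conn.execute(
    "INSERT INTO item (id, payload, updated_at, source_id) "
    "VALUES (1, 'a', '2000-01-01 00:00:00', 7)"
)
conn.execute("UPDATE item SET payload = 'b' WHERE id = 1")

updated_at = conn.execute("SELECT updated_at FROM item WHERE id = 1").fetchone()[0]
print(updated_at != "2000-01-01 00:00:00")  # True: the trigger refreshed it
```

With the metadata columns on the row itself, one small trigger per table is enough; a separate metadata table would need the trigger to locate and update a second row.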

Long story short: a strong vote for your option 1.

Where I would go for option 3:
If the metadata is updated often while the core row is not, it might pay to keep a separate 1:1 table to make UPDATEs cheaper and reduce bloat on the main table, or even go for option 2.

Where I would go for option 2:
If the set of metadata columns is highly repetitive, you could have a FK column in the main table(s) pointing to the set of metadata. This does not save much for three small columns like in your example.

Database Design – Poor Man’s Referential Integrity Schema Design Pattern

Your design looks a bit like the "supertype/subtype" pattern. Search for that and for "table inheritance". It needs quite a lot of work to be able to enforce integrity constraints though.

You are missing a generic Fruit table (that's the "supertype") and a FruitType table to store the allowed fruit types:

FruitType 
    fruit_type PK

Fruit
    fruit_type PK, FK -> FruitType (fruit_type)
    fruit_id   PK

Then the 3 (or 4 or more) tables would be (the "subtype" tables):

Apple
    fruit_type PK
    fruit_id PK
    (fruit_type, fruit_id) FK -> Fruit (fruit_type, fruit_id)
    CHECK (fruit_type = 'Apple')

Banana 
    fruit_type PK
    fruit_id PK
    (fruit_type, fruit_id) FK -> Fruit (fruit_type, fruit_id)
    CHECK (fruit_type = 'Banana')

Orange
    fruit_type PK
    fruit_id PK
    (fruit_type, fruit_id) FK -> Fruit (fruit_type, fruit_id)
    CHECK (fruit_type = 'Orange')

And any other table can reference the Fruit table:

FruitPack 
    fruitpack_id PK 
    destination

FruitPackFruits 
    fruitpack_id FK -> FruitPack (fruitpack_id)
    fruit_id     
    fruit_type
    (fruit_type, fruit_id) FK -> Fruit (fruit_type, fruit_id)

It doesn't look very nice and one column in every "fruit" table seems redundant as it has one and only one allowed value. And every time you need to add a new fruit (say Cherry), you have to add a row in the table FruitType and a new table (Cherry), similar to the other ones. So, it works better if your design is more or less stable. If you find that you may need to add a new "fruit" every few days or if you have a thousand (or more!) different fruits, it's not the best way.

On the other hand, it enforces integrity and you can't insert cherries into the Bananas or oranges into the Apples.
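
A minimal sketch of that enforcement, run against SQLite through Python's sqlite3 module with foreign keys switched on. Only the Apple subtype is shown, the column types are assumptions, and is_rejected is a hypothetical helper.

```python
import sqlite3

SCHEMA = """
CREATE TABLE FruitType (fruit_type TEXT PRIMARY KEY);

CREATE TABLE Fruit (
    fruit_type TEXT NOT NULL REFERENCES FruitType (fruit_type),
    fruit_id   INTEGER NOT NULL,
    PRIMARY KEY (fruit_type, fruit_id)
);

CREATE TABLE Apple (
    fruit_type TEXT NOT NULL CHECK (fruit_type = 'Apple'),
    fruit_id   INTEGER NOT NULL,
    PRIMARY KEY (fruit_type, fruit_id),
    FOREIGN KEY (fruit_type, fruit_id) REFERENCES Fruit (fruit_type, fruit_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO FruitType VALUES (?)", [("Apple",), ("Banana",)])
conn.executemany("INSERT INTO Fruit VALUES (?, ?)", [("Apple", 1), ("Banana", 2)])
conn.commit()

def is_rejected(sql, params):
    """Return True if the database refuses the statement."""
    try:
        with conn:
            conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False

INSERT_APPLE = "INSERT INTO Apple VALUES (?, ?)"
print(is_rejected(INSERT_APPLE, ("Banana", 2)))  # True: CHECK blocks a banana
print(is_rejected(INSERT_APPLE, ("Apple", 99)))  # True: no such row in Fruit
print(is_rejected(INSERT_APPLE, ("Apple", 1)))   # False: a registered apple
```

The CHECK pins each subtype table to its own type, and the composite foreign key guarantees the (fruit_type, fruit_id) pair was registered in Fruit under that same type.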
