How to enforce the structural constraints of rectangularly arrayed data

constraintdatabase-designschema

(I'm particularly interested in answers that view the question below as a special case of the question: how should a RDBMS enforce structural constraints that are more specific than "one-to-many" and "many-to-many"?)

Much experimental data in biomedical research is collected in "plates" of rectangularly arrayed "wells". These well array plates are commercially available a few standardized sizes: 2 × 3, 4 × 6, 8 × 12, 16 × 24, and 32 × 48.

Consider the following two alternatives for storing measurements from wells of 2 × 3 plates in an RDB:

-- alternative 1
CREATE TABLE measurement_foo (
    plate_id FOREIGN KEY REFERENCES plate(plate_id),
    plate_row CHAR(1),
    plate_column INTEGER,
    value REAL
);

-- alternative 2
CREATE TABLE measurement_foo (
    plate_id FOREIGN KEY REFERENCES plate(plate_id),
    a1 REAL,
    a2 REAL,
    a3 REAL,
    b1 REAL,
    b2 REAL,
    b3 REAL
);

My instinct is to go with alternative 1: it generalizes to plates of any size, and it can be modified in a straightforward way to record multiple different measurements per well, as in

CREATE TABLE measurement (
    plate_id FOREIGN KEY REFERENCES plate(plate_id),
    plate_row CHAR(1),
    plate_column INTEGER,
    foo FLOAT,
    bar FLOAT,
    baz FLOAT
);

In contrast, to adapt alternative 2 to the 16 × 24-well format (for example), would entail defining a table with 1 + 384 columns: plate_id, a01, …, a24, b01, … , b24, …, p01, …, p24. Moreover, a different such table would need to be defined for each type of measurement.

My main reason for even considering alternative 2 is that it alone enforces the constraint that each plate contains exactly 6 wells; alternative 1 does not.

(Also, one record of the table in alternative 2 would hold as much information as up to 6 records of the one in alternative 1, which makes me wonder whether the table in alternative 1 is fully normalized.)

For alternative 1, I can imagine ways to enforce the weaker constraint that a plate contains at most 6 wells, namely, by imposing a uniqueness constraint on (plate_id, plate_row, plate_column), and restricting columns plate_row and plate_column to the domains ('a', 'b') and (1, 2, 3), respectively. But it is still possible for wells to be missing from the data.

How should the database enforce the structural relationship between plates and wells?

Best Answer

All current RDBMS' tables can have CONSTRAINTS on columns. These constraints are checked every time data is inserted into the table. It can also check data against other tables.

We know that each Plate Type has certain number of Rows and Columns. We can enumerate all Rows and Columns for each Plate Type. So, when data is inserted, the DB can check if a certain row/column combination exists for a given Plate Type.

Lets create a set of tables:

create table Plate_Types (
Plate_Type_id int,
Plate_Size int,
Plate_row int,
Plate_col int)

This table holds description of every Plate size like this:

Id     Size   Row     Col
1       6       1       1    -- 2x3
1       6       1       2
...
1       6       2       3
5       1536    1       1    -- 32x48
...
5       1536    32      32

Then, in your main table from Alternative 1 we introduce a Foreign Key - a "link" to another table to check if row and column are valid for this Plate size.

create table MyTable (
well_id int,
plate_id int, 
plate_size int,
row_id int,
col_id int,
value real);

ALTER TABLE MyTable  
ADD CONSTRAINT FK_Plate_SizeCheck 
   FOREIGN KEY (Plate_size_id, Row_id, Column_id)
    REFERENCES Plate_Types (Plate_type_id, Plate_row, Plate_col);

This Constraint here does the following: for every inserted row DB goes to table Plate_Types and looks for combination of Plate_size_id, Plate_row and Plate_col. In other words, it checks if this Plate Size can have row I and column J. If there is no match, then the DB fires an error.

Please note that this is one of several possible solutions for data integrity enforcement for your example. Medical data often comes in huge volumes and performance of this particular design is different question.

PS. This is a shortened explanation for non-developers. Code as well as table design is for concept illustration only.