How to enforce the structural constraints of rectangularly arrayed data

constraintdatabase-designschema

(I'm particularly interested in answers that view the question below as a special case of the question: how should a RDBMS enforce structural constraints that are more specific than "one-to-many" and "many-to-many"?)

Much experimental data in biomedical research is collected in "plates" of rectangularly arrayed "wells". These well array plates are commercially available a few standardized sizes: 2 × 3, 4 × 6, 8 × 12, 16 × 24, and 32 × 48.

Consider the following two alternatives for storing measurements from wells of 2 × 3 plates in an RDB:

-- alternative 1
CREATE TABLE measurement_foo (
    plate_id FOREIGN KEY REFERENCES plate(plate_id),
    plate_row CHAR(1),
    plate_column INTEGER,
    value REAL
);

-- alternative 2
CREATE TABLE measurement_foo (
    plate_id FOREIGN KEY REFERENCES plate(plate_id),
    a1 REAL,
    a2 REAL,
    a3 REAL,
    b1 REAL,
    b2 REAL,
    b3 REAL
);

My instinct is to go with alternative 1: it generalizes to plates of any size, and it can be modified in a straightforward way to record multiple different measurements per well, as in

CREATE TABLE measurement (
    plate_id FOREIGN KEY REFERENCES plate(plate_id),
    plate_row CHAR(1),
    plate_column INTEGER,
    foo FLOAT,
    bar FLOAT,
    baz FLOAT
);

In contrast, to adapt alternative 2 to the 16 × 24-well format (for example), would entail defining a table with 1 + 384 columns: plate_id, a01, …, a24, b01, … , b24, …, p01, …, p24. Moreover, a different such table would need to be defined for each type of measurement.

My main reason for even considering alternative 2 is that it alone enforces the constraint that each plate contains exactly 6 wells; alternative 1 does not.

(Also, one record of the table in alternative 2 would hold as much information as up to 6 records of the one in alternative 1, which makes me wonder whether the table in alternative 1 is fully normalized.)

For alternative 1, I can imagine ways to enforce the weaker constraint that a plate contains at most 6 wells, namely, by imposing a uniqueness constraint on (plate_id, plate_row, plate_column), and restricting columns plate_row and plate_column to the domains ('a', 'b') and (1, 2, 3), respectively. But it is still possible for wells to be missing from the data.

How should the database enforce the structural relationship between plates and wells?

Best Answer

All current RDBMS' tables can have CONSTRAINTS on columns. These constraints are checked every time data is inserted into the table. It can also check data against other tables.

We know that each Plate Type has certain number of Rows and Columns. We can enumerate all Rows and Columns for each Plate Type. So, when data is inserted, the DB can check if a certain row/column combination exists for a given Plate Type.

Lets create a set of tables:

create table Plate_Types (
Plate_Type_id int,
Plate_Size int,
Plate_row int,
Plate_col int)

This table holds description of every Plate size like this:

Id     Size   Row     Col
1       6       1       1    -- 2x3
1       6       1       2
...
1       6       2       3
5       1536    1       1    -- 32x48
...
5       1536    32      32

Then, in your main table from Alternative 1 we introduce a Foreign Key - a "link" to another table to check if row and column are valid for this Plate size.

create table MyTable (
well_id int,
plate_id int, 
plate_size int,
row_id int,
col_id int,
value real);

ALTER TABLE MyTable  
ADD CONSTRAINT FK_Plate_SizeCheck 
   FOREIGN KEY (Plate_size_id, Row_id, Column_id)
    REFERENCES Plate_Types (Plate_type_id, Plate_row, Plate_col);

This Constraint here does the following: for every inserted row DB goes to table Plate_Types and looks for combination of Plate_size_id, Plate_row and Plate_col. In other words, it checks if this Plate Size can have row I and column J. If there is no match, then the DB fires an error.

Please note that this is one of several possible solutions for data integrity enforcement for your example. Medical data often comes in huge volumes and performance of this particular design is different question.

PS. This is a shortened explanation for non-developers. Code as well as table design is for concept illustration only.

Related Solutions

Enforce data integrity with a recursive table

Do not know if this will be useful to you because it requires quite a few changes, but the problem is interesting, so I'll try.

These would be the major changes

Using the tree closure instead of the adjacency list for the reference hierarchy. The closure table contains paths form each parent to all of it descendants, so all possible parent-child combinations are exposed.

Note that with the tree closure, each ancestor node points to itself as a descendant, meaning that in CaseProperty recursion stops on ID = ParentID instead on ParentID is NULL

It is not clear to me is a parent allowed to be any ancestor or just the one first step up. The closure table exposes ancestor and all descendants, so Level Difference is added to the TreeClosure, which is sub-typed as AllowedCombos for LevelDifference in (0,1).

Propagating AK {PropertyID, PropertyTypeID} instead of just PropertyID
Using composite key in CaseProperty

enter image description here

Here are main constraints from the model to clarify relationships (you may need to modify syntax)

ALTER TABLE Property ADD
  CONSTRAINT PK_PR  PRIMARY KEY (PropertyID)

, CONSTRAINT AK1_PR UNIQUE (PropertyID ,PropertyTypeID)

, CONSTRAINT FK1_PR FOREIGN KEY (PropertyTypeID)
         REFERENCES PropertyType(PropertyTypeID)
;



ALTER TABLE TreeClosure ADD
  CONSTRAINT PK_TC PRIMARY KEY (AncestorID ,DescendantID ,AncestorTypeID ,DescendantTypeID)

, CONSTRAINT FK1_TC FOREIGN KEY (AncestorID ,AncestorTypeID) 
             REFERENCES Property(PropertyID ,PropertyTypeID)

, CONSTRAINT FK2_TC FOREIGN KEY (DescendantID ,DescendantTypeID)
             REFERENCES Property(PropertyID   ,PropertyTypeID)
;



ALTER TABLE CaseProperty ADD
  CONSTRAINT PK_CP PRIMARY KEY (CaseID, PropertyID, PropertyTypeID)

, CONSTRAINT FK1_CP FOREIGN KEY (CaseID)
                 REFERENCES Case(CaseID)

, CONSTRAINT FK2_CP FOREIGN KEY (PropertyID ,PropertyTypeID)
             REFERENCES Property(PropertyID ,PropertyTypeID)

, CONSTRAINT FK4_CP FOREIGN KEY (ParentCaseID ,ParentPropertyID ,ParentPropertyTypeID)
         REFERENCES CaseProperty(CaseID       ,PropertyID       ,PropertyTypeID)

, CONSTRAINT FK5_CP FOREIGN KEY (ParentPropertyID ,PropertyID   , ParentPropertyTypeID ,PropertyTypeID) 
        REFERENCES AllowedCombos(AncestorID       ,DescendantID , AncestorTypeID       ,DescendantTypeID)

;

Postgresql – Need for indexes on foreign keys

Your design is good. If you are having a performance problem (which you can't know at design time), you should create an index on column table1.ref_field, in the same order (ASC) as the table2.id column. This will improve performance on joins between those to tables/columns. There is overhead to maintaining any index, so you want to weigh that cost against the benefit of improved performance.

PostgreSQL doesn't automatically create such indexes on foreign key columns that reference other columns, unfortunately, so you have to do it yourself.

Here is a StackOverflow question on the same topic:

Postgres and Indexes on Foreign Keys and Primary Keys

Here is a query to help determine where you might benefit from adding such an index:

Postgresql: Indexes on Foreign Keys

Best Answer

Related Solutions

Enforce data integrity with a recursive table

Postgresql – Need for indexes on foreign keys

Related Question