Enforce uniqueness across optional attributes for an entity

database-designforeign keynullunique-constraint

I have an entity that has multiple attributes, and these attributes are themselves entities.

No two entities may have the same attributes, so there needs to be a uniqueness constraint across them. But at the same time, not all attributes are required.

The following DML illustrates this where attr_a and attr_b are not optional (do not allow NULL) whilst attr_c and attr_d are optional (do allow NULL):

CREATE TABLE attr_a (
  attr_a_id INTEGER(10) UNSIGNED PRIMARY KEY AUTO_INCREMENT
) ENGINE InnoDB;
CREATE TABLE attr_b (
  attr_b_id INTEGER(10) UNSIGNED PRIMARY KEY AUTO_INCREMENT
) ENGINE InnoDB;
CREATE TABLE attr_c (
  attr_c_id INTEGER(10) UNSIGNED PRIMARY KEY AUTO_INCREMENT
) ENGINE InnoDB;
CREATE TABLE attr_d (
  attr_d_id INTEGER(10) UNSIGNED PRIMARY KEY AUTO_INCREMENT
) ENGINE InnoDB;

CREATE TABLE entity (
  entity_id INT(10) UNSIGNED PRIMARY KEY AUTO_INCREMENT NOT NULL,
  attr_a_id INT(10) UNSIGNED NOT NULL,
  attr_b_id INT(10) UNSIGNED NOT NULL,
  attr_c_id INT(10) UNSIGNED NULL,
  attr_d_id INT(10) UNSIGNED NULL,
  UNIQUE KEY uq_attr (attr_a_id, attr_b_id, attr_c_id, attr_d_id),
  KEY `fk_attr_a_id` (attr_a_id),
  KEY `fk_attr_b_id` (attr_b_id),
  KEY `fk_attr_c_id` (attr_c_id),
  KEY `fk_attr_d_id` (attr_d_id),
  CONSTRAINT `fk_attr_a_id` FOREIGN KEY (attr_a_id) REFERENCES `attr_a` (attr_a_id),
  CONSTRAINT `fk_attr_b_id` FOREIGN KEY (attr_b_id) REFERENCES `attr_b` (attr_b_id),
  CONSTRAINT `fk_attr_c_id` FOREIGN KEY (attr_c_id) REFERENCES `attr_c` (attr_c_id),
  CONSTRAINT `fk_attr_d_id` FOREIGN KEY (attr_d_id) REFERENCES `attr_d` (attr_d_id)
) ENGINE InnoDB;

So the intention is that these two rows from the entity table should not be allowed:

1,10,20,30,NULL
2,10,20,30,NULL

The following pairs of rows should be allowed, as though nulls were actually values (I understand the BDB engine would allow this, but I will be using InnoDB.)

1,10,20,30,NULL
2,10,20,NULL,NULL

1,10,20,30,40
2,10,20,NULL,NULL

1,10,20,30,NULL
2,10,20,NULL,40

The tables described above is obviously flawed, because of how NULLs are handled with a uniqueness constraint — it would allow those two rows to exist.

In the case of an entity, attr_c and attr_d are optional rather than unknown. So I think allowing NULLs is not technically correct because the value is known to be undefined, rather than unknown.

I think the only practical solution is to reserve a item in each of the attribute tables for "undefined" (possibly with ID 0.)

I know this solution would work, but I want to avoid the mistake of adopting a known anti-pattern.

Best Answer

There isn't an easy way (except in SQL Server, see below) to enforce these constraints. And I say "these" and not "this" because they are indeed more than one.

You want to enforce uniqueness on (a,b,c,d) when all the attributes are not null. And uniqueness on (a,b,c) when d is null. And uniqueness on (a,b,d) when c is null. And uniqueness on (a,b) when both c and d are null.

You say that NULL value on c or d means that the value is known to be undefined, yet you allow both (10, 20, 30, 40) and (10, 20, NULL, NULL). Is the value defined or known to be undefined in this case?

Anyway, to actually enforce these rules, there are various options:

If you want to keep the data in a single table and you implement in a DBMS that have filtered/partial indexes (PostgreSQL, SQL Server), you can create 4 UNIQUE partial indexes:

UNIQUE (a, b, c, d) WHERE (c IS NOT NULL AND d IS NOT NULL)
UNIQUE (a, b, c)    WHERE (c IS NOT NULL AND d IS NULL)
UNIQUE (a, b, d)    WHERE (c IS NULL     AND d IS NOT NULL)
UNIQUE (a, b)       WHERE (c IS NULL     AND d IS NULL)

Some other DBMS (like Oracle and DB2) do not have partial indexes but they can be emulated, with different techniques (see the Use the Index Luke! website: DB2, "Emulating partial indexes is possible)

Specifically in SQL Server - and only in this DBMS, because it treats NULL values in regard to UNIQUE constraints differently than all the rest and not according to the standard - we don't really need all the 4 constraints and we can actually get your desired behaviour with what you already have, a simple UNIQUE constraint on (a,b,c,d).

It will do exactly what your rules want to enforce. This does not comply with the SQL standard and there is a small chance that the behaviour will change in future releases (unlikely I'd add, as DBMS try very hard for backwards compatibility) .

Another (similar to the partial unique index) option (kudos to @Erwin, see his answer) is to use a functional index. This could be used in PostgreSQL, Oracle and DB2 that have such indexes.

A variation is to use computed columns (for attributes c and d) and add a unique index based on the four columns (a, b, coalesced_c, coalesced_d). This feature is available in SQL Server but also in MariaDB 5.3+ (which is a MySQL variant) and in the latest MySQL version (5.7), too.

In older MySQL versions (which seems like the DBMS you use) and other ones (like SQLite) that do not have partial or functional indexes, if you want to keep only one table, I think the only option is to do what you considered:

Use a value (like 0 or -1 that can't appear in your data) as a substitute for "known to be undefined".

This would of course work in any DBMS.

And last, what you could also use in any DBMS (assuming it supports usual FOREIGN KEY and UNIQUE constraints) is to properly normalize the entity table into four ones, one for each case:

CREATE TABLE entity_ab (
  attr_a_id INT NOT NULL,
  attr_b_id INT NOT NULL,
  CONSTRAINT uq_attr_ab UNIQUE (attr_a_id, attr_b_id),
  CONSTRAINT fk_ab_attr_a_id FOREIGN KEY (attr_a_id) REFERENCES attr_a (attr_a_id),
  CONSTRAINT fk_ab_attr_b_id FOREIGN KEY (attr_b_id) REFERENCES attr_b (attr_b_id)
) ;

CREATE TABLE entity_abc (
  attr_a_id INT NOT NULL,
  attr_b_id INT NOT NULL,
  attr_c_id INT NOT NULL,
  CONSTRAINT uq_attr_abc UNIQUE (attr_a_id, attr_b_id, attr_c_id),
  CONSTRAINT fk_abc_attr_a_id FOREIGN KEY (attr_a_id) REFERENCES attr_a (attr_a_id),
  CONSTRAINT fk_abc_attr_b_id FOREIGN KEY (attr_b_id) REFERENCES attr_b (attr_b_id),
  CONSTRAINT fk_abc_attr_c_id FOREIGN KEY (attr_c_id) REFERENCES attr_c (attr_c_id)
) ;

CREATE TABLE entity_abd (
  attr_a_id INT NOT NULL,
  attr_b_id INT NOT NULL,
  attr_d_id INT NOT NULL,
  CONSTRAINT uq_attr_abd UNIQUE (attr_a_id, attr_b_id, attr_d_id),
  CONSTRAINT fk_abd_attr_a_id FOREIGN KEY (attr_a_id) REFERENCES attr_a (attr_a_id),
  CONSTRAINT fk_abd_attr_b_id FOREIGN KEY (attr_b_id) REFERENCES attr_b (attr_b_id),
  CONSTRAINT fk_abd_attr_d_id FOREIGN KEY (attr_d_id) REFERENCES attr_d (attr_d_id)
) ;

CREATE TABLE entity_abcd (
  attr_a_id INT NOT NULL,
  attr_b_id INT NOT NULL,
  attr_c_id INT NOT NULL,
  attr_d_id INT NOT NULL,
  CONSTRAINT uq_attr_abcd UNIQUE (attr_a_id, attr_b_id, attr_c_id, attr_d_id),
  CONSTRAINT fk_abcd_attr_a_id FOREIGN KEY (attr_a_id) REFERENCES attr_a (attr_a_id),
  CONSTRAINT fk_abcd_attr_b_id FOREIGN KEY (attr_b_id) REFERENCES attr_b (attr_b_id),
  CONSTRAINT fk_abcd_attr_c_id FOREIGN KEY (attr_c_id) REFERENCES attr_c (attr_c_id),
  CONSTRAINT fk_abcd_attr_d_id FOREIGN KEY (attr_d_id) REFERENCES attr_d (attr_d_id)
) ;

then the entity table would become a VIEW:

CREATE VIEW entity AS
        SELECT attr_a_id, attr_b_id, attr_c_id, attr_d_id
        FROM entity_abcd
    UNION ALL 
        SELECT attr_a_id, attr_b_id, attr_c_id, NULL
        FROM entity_abc
    UNION ALL 
        SELECT attr_a_id, attr_b_id, NULL, attr_d_id
        FROM entity_abd
    UNION ALL 
        SELECT attr_a_id, attr_b_id, NULL, NULL
        FROM entity_ab ;

Best Answer

Related Solutions

Sql-server – With MS SQL Server are the generated constraint names predictable

Mysql – Should I create a multi-column UNIQUE index

Related Question