I have an entity that has multiple attributes, and these attributes are themselves entities.
No two entities may have the same attributes, so there needs to be a uniqueness constraint across them. But at the same time, not all attributes are required.
The following DML illustrates this where attr_a
and attr_b
are not optional (do not allow NULL) whilst attr_c
and attr_d
are optional (do allow NULL):
CREATE TABLE attr_a (
attr_a_id INTEGER(10) UNSIGNED PRIMARY KEY AUTO_INCREMENT
) ENGINE InnoDB;
CREATE TABLE attr_b (
attr_b_id INTEGER(10) UNSIGNED PRIMARY KEY AUTO_INCREMENT
) ENGINE InnoDB;
CREATE TABLE attr_c (
attr_c_id INTEGER(10) UNSIGNED PRIMARY KEY AUTO_INCREMENT
) ENGINE InnoDB;
CREATE TABLE attr_d (
attr_d_id INTEGER(10) UNSIGNED PRIMARY KEY AUTO_INCREMENT
) ENGINE InnoDB;
CREATE TABLE entity (
entity_id INT(10) UNSIGNED PRIMARY KEY AUTO_INCREMENT NOT NULL,
attr_a_id INT(10) UNSIGNED NOT NULL,
attr_b_id INT(10) UNSIGNED NOT NULL,
attr_c_id INT(10) UNSIGNED NULL,
attr_d_id INT(10) UNSIGNED NULL,
UNIQUE KEY uq_attr (attr_a_id, attr_b_id, attr_c_id, attr_d_id),
KEY `fk_attr_a_id` (attr_a_id),
KEY `fk_attr_b_id` (attr_b_id),
KEY `fk_attr_c_id` (attr_c_id),
KEY `fk_attr_d_id` (attr_d_id),
CONSTRAINT `fk_attr_a_id` FOREIGN KEY (attr_a_id) REFERENCES `attr_a` (attr_a_id),
CONSTRAINT `fk_attr_b_id` FOREIGN KEY (attr_b_id) REFERENCES `attr_b` (attr_b_id),
CONSTRAINT `fk_attr_c_id` FOREIGN KEY (attr_c_id) REFERENCES `attr_c` (attr_c_id),
CONSTRAINT `fk_attr_d_id` FOREIGN KEY (attr_d_id) REFERENCES `attr_d` (attr_d_id)
) ENGINE InnoDB;
So the intention is that these two rows from the entity
table should not be allowed:
1,10,20,30,NULL
2,10,20,30,NULL
The following pairs of rows should be allowed, as though nulls were actually values (I understand the BDB engine would allow this, but I will be using InnoDB.)
1,10,20,30,NULL
2,10,20,NULL,NULL
1,10,20,30,40
2,10,20,NULL,NULL
1,10,20,30,NULL
2,10,20,NULL,40
The tables described above is obviously flawed, because of how NULLs are handled with a uniqueness constraint — it would allow those two rows to exist.
In the case of an entity
, attr_c
and attr_d
are optional rather than unknown. So I think allowing NULLs is not technically correct because the value is known to be undefined, rather than unknown.
I think the only practical solution is to reserve a item in each of the attribute tables for "undefined" (possibly with ID 0.)
I know this solution would work, but I want to avoid the mistake of adopting a known anti-pattern.
Best Answer
There isn't an easy way (except in SQL Server, see below) to enforce these constraints. And I say "these" and not "this" because they are indeed more than one.
You want to enforce uniqueness on
(a,b,c,d)
when all the attributes are not null. And uniqueness on(a,b,c)
whend
is null. And uniqueness on(a,b,d)
whenc
is null. And uniqueness on(a,b)
when bothc
andd
are null.You say that
NULL
value onc
ord
means that the value is known to be undefined, yet you allow both(10, 20, 30, 40)
and(10, 20, NULL, NULL)
. Is the value defined or known to be undefined in this case?Anyway, to actually enforce these rules, there are various options:
If you want to keep the data in a single table and you implement in a DBMS that have filtered/partial indexes (PostgreSQL, SQL Server), you can create 4
UNIQUE
partial indexes:Some other DBMS (like Oracle and DB2) do not have partial indexes but they can be emulated, with different techniques (see the Use the Index Luke! website: DB2, "Emulating partial indexes is possible)
Specifically in SQL Server - and only in this DBMS, because it treats
NULL
values in regard toUNIQUE
constraints differently than all the rest and not according to the standard - we don't really need all the 4 constraints and we can actually get your desired behaviour with what you already have, a simpleUNIQUE
constraint on(a,b,c,d)
.It will do exactly what your rules want to enforce. This does not comply with the SQL standard and there is a small chance that the behaviour will change in future releases (unlikely I'd add, as DBMS try very hard for backwards compatibility) .
Another (similar to the partial unique index) option (kudos to @Erwin, see his answer) is to use a functional index. This could be used in PostgreSQL, Oracle and DB2 that have such indexes.
A variation is to use computed columns (for attributes
c
andd
) and add a unique index based on the four columns(a, b, coalesced_c, coalesced_d)
. This feature is available in SQL Server but also in MariaDB 5.3+ (which is a MySQL variant) and in the latest MySQL version (5.7), too.In older MySQL versions (which seems like the DBMS you use) and other ones (like SQLite) that do not have partial or functional indexes, if you want to keep only one table, I think the only option is to do what you considered:
Use a value (like
0
or-1
that can't appear in your data) as a substitute for "known to be undefined".This would of course work in any DBMS.
And last, what you could also use in any DBMS (assuming it supports usual
FOREIGN KEY
andUNIQUE
constraints) is to properly normalize theentity
table into four ones, one for each case:then the
entity
table would become aVIEW
: