Another way (without Nulls and without cycles in the FOREIGN KEY
relationships) is to have a third table to store the "favourite children". In most DBMS, you'll need an additional UNIQUE
constraint on TableB
.
@Aaron was faster to identify that the naming convention above is rather cumbersome and can lead to errors. It's usually better (and will keep you sane) if you don't have Id
columns all over your tables and if the columns (that are joined) have same names in the many tables that appear. So, here's a renaming:
Parent
ParentID INT NOT NULL PRIMARY KEY
Child
ChildID INT NOT NULL PRIMARY KEY
ParentID INT NOT NULL FOREIGN KEY REFERENCES Parent (ParentID)
UNIQUE (ParentID, ChildID)
FavoriteChild
ParentID INT NOT NULL PRIMARY KEY
ChildID INT NOT NULL
FOREIGN KEY (ParentID, ChildID)
REFERENCES Child (ParentID, ChildID)
In SQL-Server (that you are using), you also have the option of the IsFavorite
bit column you mention. The unique favourite child per parent can be accomplished via a filtered Unique Index:
Parent
ParentID INT NOT NULL PRIMARY KEY
Child
ChildID INT NOT NULL PRIMARY KEY
ParentID INT NOT NULL FOREIGN KEY REFERENCES Parent (ParentID)
IsFavorite BIT NOT NULL
CREATE UNIQUE INDEX is_FavoriteChild
ON Child (ParentID)
WHERE IsFavorite = 1 ;
And the main reason that your option 1 is not recommended, at least not in SQL-Server, is that the pattern of circular paths in the foreign key references has some problems.
Read a quite old article: SQL By Design: The Circular Reference
When inserting or deleting rows from the two table, you'll run into the "chicken-and-egg" problem. Which table should I insert first - without violating any constraint?
In order to solve that, you have to define at least one column nullable. (OK, technically you don't have to, you can have all columns as NOT NULL
but only in DBMS, like Postgres and Oracle, that have implemented deferrable constraints. See @Erwin's answer in a similar question: Complex foreign key constraint in SQLAlchemy on how this can be done in Postgres). Still, this setup feels like skating on thin ice.
Check also an almost identical question at SO (but for MySQL) In SQL, is it OK for two tables to refer to each other? where my answer is pretty much the same. MySQL has no partial indexes though, so the only viable options are the nullable FK and the extra table solution.
Option 1
*This doesn't seem like a great idea to me, because it complicates the SQL to select all properties applied to a feature…
It does not necessarily complicate query SQL (see conclusion below).
…and doesn't readily scale to more conditions…
It scales readily to more conditions, as long as there are still a fixed number of conditions, and there aren't dozens or hundreds.
However, it does enforce the requirement of a certain number of conditions per (X,Y) pair. In fact, it is the only option here that does so.*
It does, and although you say in a comment that this is "the least important of my requirements", you haven't said it doesn't matter at all.
Option 2
One downside to this is that it doesn't specify the number of conditions for each pair. Another is that when I am only considering the initial relationship…I then have to add a DISTINCT clause to avoid duplicate entries…
I think you can dismiss this option because of the complications you mention. The objx_objy
table is likely to be the driving table for some of your queries (eg "select all properties applied to a feature", which I am taking to mean all properties applied to an objx
or objy
). You can use a view to pre-apply the DISTINCT
so it is not a matter of complicating queries, but that's going to scale very badly performance-wise for very little gain.
Option 3
Does it make sense though to create a new ID that identifies nothing other than existing IDs?
No, it doesn't — Option 4 is better in every regard.
Option 4
…it basically duplicates an entire table multiple times (or feels that way, anyway) so also doesn't seem ideal.
This option is just fine — it is the obvious way of setting up the relations if the number of properties is variable or subject to change
Conclusion
My preference would be option 1 if the number of properties per objx_objy
is likely to be stable, and if you can't imagine ever adding more than a handful extra. It is also the only option that enforces the 'number of properties = 3' constraint — enforcing a similar constraint on option 4 would likely involve adding c1_p_id
… columns to the xy table anyway*.
If you really don't care much about that condition, and you also have reason to doubt that the number of properties condition is going to be stable then choose option 4.
If you aren't sure which, choose option 1 — it is simpler and that is definitely better if you have the option, as others have said. If you are put off option 1 "…because it complicates the SQL to select all properties applied to a feature…" I suggest the creating a view to provide the same data as the extra table in option 4:
option 1 tables:
create table prop(id integer primary key);
create table objx(id integer primary key);
create table objy(id integer primary key);
create table objx_objy(
x_id integer references objx
, y_id integer references objy
, c1_p_id integer not null references prop
, c2_p_id integer not null references prop
, c3_p_id integer not null references prop
, primary key (x_id, y_id)
);
insert into prop(id) select generate_series(90,99);
insert into objx(id) select generate_series(10,12);
insert into objy(id) select generate_series(20,22);
insert into objx_objy(x_id,y_id,c1_p_id,c2_p_id,c3_p_id)
select objx.id, objy.id, 90, 91, 90+floor(random()*10)
from objx cross join objy;
view to 'emulate' option 4:
create view objx_objy_prop as
select x_id
, y_id
, unnest(array[1,2,3]) c_id
, unnest(array[c1_p_id,c2_p_id,c3_p_id]) p_id
from objx_objy;
"select all properties applied to a feature":
select distinct p_id from objx_objy_prop where x_id=10 order by p_id;
/*
|p_id|
|---:|
| 90|
| 91|
| 97|
| 98|
*/
dbfiddle here
Best Answer
I would not model a one->many relationship as a many<->many one, with a junction table as you describe, because you are allowing more in your data model than the system it is modelling. You will need to add extra constraints, either in the DB if possible or other application layers, to enforce the more limited relationship and/or add code to detect and deal with breaches of this constraint that happen due to bugs.
Is there a particular reason you want to disallow NULLs for a value that seems to naturally call for them?
If so then you could create "partial junction table" or "extension table" (I'm making that terminology up, I don't know off the top of my head if this is a pattern that has a commonly accepted name):
this way the FK ensures that you have a one->many relationship as each ID from TableB can only appear once while rows from table A can be referred to by many entities in TableB. The empty condition then becomes that the row does not exist. This avoids the NULL needing to be "stored" but you now have an extra table to join in every time you need to use this field.
If you want to have a surrogate key as your PK, as in the example in the question, then the definition becomes:
though that would be wasteful IMO unless you think that you might in later revisions change the relationship to be many<->many where it is valid for the same two entities can relate to each other more than once.