What’s the best practice for representing set operations in a relational database

database-design

I have a simple database representing collections of entities and sets of those collections. Sets contain multiple collections and collections contain multiple entities. Currently, when I query all the entities belonging to a given set I'm returning a union of all the entities in each collection belonging to that set. How could I redesign the schema and queries to allow for more flexible set logic? For instance given 4 collections, when I select all the entities in set 1, I want to get (collection 1 INTERSECT collection 2 EXCEPT collection 3) and when I query set 2 I want to get (collection 3 UNION (collection 2 INTERSECT collection 4)). In other words, I need a way to represent dynamic nested set operators in a relational database.

I can think of a few potential ways to accomplish this, suffice to say they are all non-trivial. What's the best practice for situations like this? Can anyone point towards any papers or articles on how to accomplish this? Thanks in advance.

enter image description here

Best Answer

it sounds to me like you are trying to store in the database what are essentially conditions on which set operations are performed. This would seem to look more and more like an extended catalog for an in-db orm. The problem isn't just storing the set operations but also when to use them. I would do it like this (essentially storing this as a sort of graph):

CREATE TABLE set_catalog_class (
    id int not null unique,
    class_name text primary key
);

CREATE TABLE set_catalog_operation (
    id int not null unique,
    op_name text primary key
);

CREATE TABLE set_catalog_class_op (
    class_id int not null references set_catalog_class(id),
    op_id int not null references set_catalog_operation(id),
    relation_name text not null,
    order_num int not null,
    primary key (class_id, order_num)
);

This essentially gives you a very simple graph between nodes (operations) and ties them to relations which could then be views or the like.