How to represent directed graph with multiple parents

database-designhierarchy

http://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html provides an algorithm for inserting and deleting from a Closure Table.

I'd like to model a similar data structure, except that nodes may have multiple parents.

Given:

Graph #1

If we remove [B, C] I expect to end up with:

Graph #2

and if we remove node B I expect to end up with:

Graph #3

However, if you use the author's algorithm for removing links or nodes you will notice that it tags [D, C, 1] for deletion, which is undesirable.

What I've tried so far

I have tried adapting the original data structure by adding a references column that indicates how many ways there are to travel between two nodes. In the above example, you can travel from A to C either through B or through D. The idea would have been that when B gets removed, the path from A to C gets kept and the reference count decreases from 2 to 1. It was nice in theory, but I couldn't figure out how to get the implementation working and now I wonder if it's at all possible (the data structure might not contain enough information to figure out which rows to remove).

What I'm asking

How would you adapt Closure Tables to support multiple parents? What alternative data structures would you recommend? https://stackoverflow.com/q/4048151/14731 contains an exaustive list of such data structures, but it's not clear which ones support (or are best for) multiple parents.

Best Answer

Usually create a nodes table and a relationships table. Directed graphs are't really hierarchical and they can have loops which makes querying harder. But if you think of a DAG as being a generalised tree (I.e. a tree that allows multiple parents but is still strictly hierarchical) and a directed graph as being a generalised DAG (i.e. like a DAG but not strictly hierarchical) things get easier.

So for a very simple, PostgreSQL solution we might do something like:

CREATE TABLE node (
    id serial primary key,
    payload jsonb not null
);

CREATE TABLE relationship (
    id serial primary key,
    relationship_type text not null,
    from_node int references node(id) not null,
    to_node int references node(id) not null,
    payload jsonb not null
);

Then you can query something like this:

with recursive dg as (
    select n.id as node_id, null::Int as parent, array[n.id] as path
      from node n
    union all
    select to_node, from_node, path || to_node
      FROM relationship
      JOIN dg on dg.node_id = from_node AND NOT from_node = ANY(path)
)
select * from dg;