How to model a many-to-many relationship on two different columns

database-design

I am trying to understand how to model a certain type of relationship.

I have a dataset that I would like to put into a database, however, I have this relationship among two columns in my data.

Here is a sample of what I am looking at:

Company_ID1  Company_ID2       
      9581091  53506312
      9581091  961273620
     79735371  53506312
     79735371  79735371
     79735371  135962137
     46667523  9122532   
     55189732  9122532   
     71453880  9122532   
     77817617  9122532   
     77817617  79834910  
     79871820  9122532   
     79871820  79834910  
     98158277  9122532   
     98158277  458182615 
    134303192  9122532   
    187502299  458182615

As you can see company_id2=53506312 is associated with two different company_id1's. This is just one example, but there is a many-to-many relationship between the two company_Id's. By inspection, this example is the same company.

How would I model this situation so that the database finds this kind of association and is able to group them all together?

The motivation behind this is finding sales and contract by a company. So grouping by either company_id will result in the wrong answer.

How I have solved this with text files, is I have a graph where all the nodes are the unique company_id's and the edges is the tuple (company_id1, company_id2). Then I find all disconnected subgraphs in the graph and each ones of those is a company.

So I have something like this:

Company_ID1  Company_ID2      company_number
          9581091  53506312         1
          9581091  961273620        1
         79735371  53506312         1
         79735371  79735371         1
         79735371  135962137        1
         46667523  9122532          2
        55189732   9122532          2
        71453880   9122532          2
        77817617   9122532          2
        77817617   79834910         2
        79871820   9122532          2
        79871820   79834910         2
        98158277   9122532          2
        98158277   458182615        2
        134303192  9122532          2
        187502299  458182615        2

So then I groupby company and I can do summary reports.

But I would like to now move to a database and have the database do this calculation for me. So, two questions. What kind of relationship is this called, and how do I model it in my database?

One last think, these two columns company_id1 and company_id2 aren't the primary keys.

Best Answer

You have asked: What kind of relationship is this called, and how do I model it in my database?

Your scenario seems to contain a recursive many-to-many relationship: EACH company1 may have (optional) ONE OR MORE associations with another company2. Each company2 may have one or more associations with a company1. ("company1" and "company2" are both: companies. I have only given them these names so that the relationship's "directions" become obvious) The traditional way of "resolving" a many-to-many relationship is: creating an intersection/junction.

ERD (entity relationship diagram) ERD (entity relationship diagram)

Intersection in relational model: Relational model

Doing a "fast forward" to the actual implementation, maybe the following will be useful for you:

1 - for recording all NODES in your existing model: have one table for recording unique company ids, use PK constraint.

2 - for all EDGES in your existing model: have another table for recording VALID combinations. Use 2 foreign key constraints (only existing company ids are allowed), and a PK constraint that guarantees the uniqueness of the recorded combinations. (You can also use a CHECK constraint, in order to avoid associations of a company with itself. However, your test data values , line 4: 79735371 79735371, suggest that a node can be associated with itself.)

Example implementation (Oracle 12c):

-- ~ nodes in graph, uniqueness through pk constraint ~
create table companies(
  id number primary key
);

-- insert unique values ~ nodes ~
begin
  insert into companies (id) values (9581091);
  insert into companies (id) values (79735371);
  insert into companies (id) values (53506312);
  insert into companies (id) values (961273620);
  insert into companies (id) values (135962137);
end;

--- ~ edges in graph ~
create table associations(
  co1 number references companies(id)
, co2 number references companies(id)
);

alter table associations
add constraint apk
primary key (co1, co2);

testing

-- use your test data 
begin
  insert into associations values (9581091, 53506312);
  insert into associations values (9581091, 961273620);
  insert into associations values (79735371, 53506312);
  insert into associations values (79735371, 79735371);
  insert into associations values (79735371, 135962137);
end;

-- Optional: testing the intersection/junction by inserting all possible combinations:

insert into associations
select 
  c1.id
, c2.id
from companies c1, companies c2;