How to model a many-to-many relationship on two different columns

database-design

I am trying to understand how to model a certain type of relationship.

I have a dataset that I would like to put into a database, however, I have this relationship among two columns in my data.

Here is a sample of what I am looking at:

Company_ID1  Company_ID2       
      9581091  53506312
      9581091  961273620
     79735371  53506312
     79735371  79735371
     79735371  135962137
     46667523  9122532   
     55189732  9122532   
     71453880  9122532   
     77817617  9122532   
     77817617  79834910  
     79871820  9122532   
     79871820  79834910  
     98158277  9122532   
     98158277  458182615 
    134303192  9122532   
    187502299  458182615

As you can see company_id2=53506312 is associated with two different company_id1's. This is just one example, but there is a many-to-many relationship between the two company_Id's. By inspection, this example is the same company.

How would I model this situation so that the database finds this kind of association and is able to group them all together?

The motivation behind this is finding sales and contract by a company. So grouping by either company_id will result in the wrong answer.

How I have solved this with text files, is I have a graph where all the nodes are the unique company_id's and the edges is the tuple (company_id1, company_id2). Then I find all disconnected subgraphs in the graph and each ones of those is a company.

So I have something like this:

Company_ID1  Company_ID2      company_number
          9581091  53506312         1
          9581091  961273620        1
         79735371  53506312         1
         79735371  79735371         1
         79735371  135962137        1
         46667523  9122532          2
        55189732   9122532          2
        71453880   9122532          2
        77817617   9122532          2
        77817617   79834910         2
        79871820   9122532          2
        79871820   79834910         2
        98158277   9122532          2
        98158277   458182615        2
        134303192  9122532          2
        187502299  458182615        2

So then I groupby company and I can do summary reports.

But I would like to now move to a database and have the database do this calculation for me. So, two questions. What kind of relationship is this called, and how do I model it in my database?

One last think, these two columns company_id1 and company_id2 aren't the primary keys.

Best Answer

You have asked: What kind of relationship is this called, and how do I model it in my database?

Your scenario seems to contain a recursive many-to-many relationship: EACH company1 may have (optional) ONE OR MORE associations with another company2. Each company2 may have one or more associations with a company1. ("company1" and "company2" are both: companies. I have only given them these names so that the relationship's "directions" become obvious) The traditional way of "resolving" a many-to-many relationship is: creating an intersection/junction.

ERD (entity relationship diagram)

Intersection in relational model:

Doing a "fast forward" to the actual implementation, maybe the following will be useful for you:

1 - for recording all NODES in your existing model: have one table for recording unique company ids, use PK constraint.

2 - for all EDGES in your existing model: have another table for recording VALID combinations. Use 2 foreign key constraints (only existing company ids are allowed), and a PK constraint that guarantees the uniqueness of the recorded combinations. (You can also use a CHECK constraint, in order to avoid associations of a company with itself. However, your test data values , line 4: 79735371 79735371, suggest that a node can be associated with itself.)

Example implementation (Oracle 12c):

-- ~ nodes in graph, uniqueness through pk constraint ~
create table companies(
  id number primary key
);

-- insert unique values ~ nodes ~
begin
  insert into companies (id) values (9581091);
  insert into companies (id) values (79735371);
  insert into companies (id) values (53506312);
  insert into companies (id) values (961273620);
  insert into companies (id) values (135962137);
end;

--- ~ edges in graph ~
create table associations(
  co1 number references companies(id)
, co2 number references companies(id)
);

alter table associations
add constraint apk
primary key (co1, co2);

testing

-- use your test data 
begin
  insert into associations values (9581091, 53506312);
  insert into associations values (9581091, 961273620);
  insert into associations values (79735371, 53506312);
  insert into associations values (79735371, 79735371);
  insert into associations values (79735371, 135962137);
end;

-- Optional: testing the intersection/junction by inserting all possible combinations:

insert into associations
select 
  c1.id
, c2.id
from companies c1, companies c2;

Related Solutions

Mysql – How to model (a many to many) to many relationship

The tables student and major are good.

However following improvements can/should be made.

Merge the FKs in MajorStudent to a PK, getting rid of the id
Save a StudentFK and MajorFK rather than a MajorStudentFK in Exam, well one can discuss about this though but it'll be faster and in my consideration NOT redundant
Grade is not normalized. All attributes are supposed to be atomic and grade is not. Either make 4 columns (bad) or use the following

Exam(id[PK], studentId[FK], MajorId[FK]) Grades(id[PK], examId[FK], value)

I don't understand why one exam comes with four grades though.

Data inconsistency prohibition if a table refers to another via two many-to-many relationships

You are right, the design allows inconsistencies, exactly what you notice. A ProjectRealm may be referring through Project to a company and through CompanyRealm to another company.

This is not uncommon, it appears when there is a triangular or a "diamond" shape in the relationships:

Realm          Company
   \           /     \
    \         /       \
     \       /         \
    CompanyRealm     Project
             \         /
              \       /  
               \     /    
             ProjectRealm

The common solution, if you want to enforce this via DRI (Declarative Referential Integrity), ie. through usual UNIQUE, FOREIGN KEY and CHECK constraints - and not though triggers - is to use composite keys for the ProjectRealm REFERENCES CompanyRealm and for the ProjectRealm REFERENCES Project foreign keys.

You will need to replace the company_realm_id with 2 columns (company_id, realm_id) and add a composite UNIQUE constraint on each of the two referenced tables (or modify their PRIMARY KEY):

Company [id]

Realm [id]

CompanyRealm [id, realm_id=>Realm.id, company_id=>Company.id,
              UNIQUE(company_id, realm_id)]

Project [id, company_id=>Company.id,
         UNIQUE(company_id, id)]

ProjectRealm [id, project_id, company_id, realm_id,
              (company_id, project_id) => Project (company_id, id),
              (company_id, realm_id) => CompanyRealm (company_id, realm_id)]

See also several other questions, where the same (or quite similar) problem appears. The diamond pattern is obvious in all:

Best data modelling approach to handle redundant foreign keys in relational model
and the nice answer and logical models and diagrams by MDCCL.
Foreign Key Constraint over two tables
How to represent it with UNIQUE indexes and foreign keys?
Many to Many and Weak Entities
Constrains relationship

Best Answer

Related Solutions

Mysql – How to model (a many to many) to many relationship

Data inconsistency prohibition if a table refers to another via two many-to-many relationships

Related Question