Database Design – Best Way to Link Different Entities to a Single Table

database-designsubtypes

I'm running for office and have created a web app for tracking my door knocks to voters at their homes. The database contains a table called voters that contains all the necessary information about voters in my community. I'm using mysql.

I'd like to add a new feature to track donors to my campaign. Not all of these donors live in the community and do not vote in the district where I'm running for office. I do not need to track the same kind of info for these individuals as I do for the voters so I'm going to place these individuals into a table called nonvoters.

Now, the individuals in my voters table can also make donations, and I want to track those as well.

Finally, I'd also like to track donations from organizations, so I also need an organizations table for storing information about the organizations.

To track the donations from both voters, nonvoters and organizations, I want to set up a new table called donations. This table would contain the appropriate details about each donation.

But I'm uncertain as to what the best structure is for linking the donations table to the voters, nonvoters, and organizations table should be. If I create a column called donor_id in the table to key it to the donors information, there's no way to know which table that ID refers to. So do I set up three columns, nonvoter_id, voter_id and org_id and insert the ID into the applicable column depending on whether the donor is a voter? This seems weird.

Or maybe I create a new column in each of the three tables called donor_id that I can use to link my data to the donations table. If I went this route, it seems like I'd have to do a some behind the scenes work to ensure the donor_id was unique and was keyed to the data inside the donations table.

Or maybe there are other approaches I'm not familiar with. Any guidance is appreciated.

Best Answer

Good question, you have provided a quite engaging context and analysis.

According to my understanding of your scenario description, some important aspects can be handled by means of (Super)Type-Subtype relationships. In this regard, this post provides relevant information.

Assumed business rules

As per your current specifications, the following assertions appear valid:

  • A Party[1] is either a Person or an Organization.
  • A Party grants zero-one-or-many Donations.
  • A Person can be a Voter and a NonVoter.

Preliminary logical data model

Therefore, I have derived the IDEF1X[2] logical data model presented in Figure 1.

Fig. 1. Voting and Donations Preliminary Data Model

Party and Donation

As you can see, the incorporation of the Party entity as an exclusive supertype provides the possibility for a Person or an Organization to be able to grant Donations, since the PartyId attribute migrates[3] from Party to Donation, but I have assigned a role name[4] (i.e., DonorId) to such attribute in order to make it more meaningful in the context of the Donation entity.

Person and Organization

In the same way, PartyId migrates to Person as PersonId, and to Organization as OrganizationId, and these two entities, in turn, can hold their specific attributes.

Voter and NonVoter

I consider that, in your business context, a given Person can be a Voter and a NonVoter over time, depending on the particular District in which said Person is located, thus I have temporarily depicted this circumstances in a nonexclusive supertype cluster, but I consider that we need to clarify some points in this respect.

Aspects that require clarification

The goal of this data modeling excersise is to capture the things of interest with respect to your business context requirements, and one can serve that purpose by determining the entities involved along with their respective attributes (including, with especial attention, the ones that uniquely identify each entity ocurrence) and the way in which such entities are interconected. In this manner, the semantic value that said aspects hold for you, and your data users, is paramount. The better these factors are comprehended, the better they can be depicted in a data model.

Having said that, I have created three more models (also preliminary) in order to have a vehicule to exemplify some ideas about your scenario. They are contained in this PDF, which you can download from Dropbox.

Then, here are the concrete aspects that need clarification:

  1. How do you plan to inscribe people in your system? (a) Do you have access to the electoral register (or electoral roll)? (b) Will you enter this data once you have door knocked their house?
  2. Which attributes do you want to store about a given DoorKnock?
  3. It seems reasonable to state that a Person may receive the status of Voter or NonVoter in the context of —as mentioned above— (a) the District in which such Person resides and (b) a determined Election instance. What do you think about it?
    • Maybe, a person must live within a certain period in a given district in order to be able to vote in a determined election.
  4. What are the attributes of interest in relation to a Person?
  5. Which are the particular attributes that you want to retain regarding a Voter?
  6. What are the specific attributes that you wish to store about a NonVoter?
    • Perhaps, they are personal attributes that belong in the Person entity.
  7. It also appears fair to say that a Donation can be granted only in the context of an Election occurrence, is this correct?
    • This possibility is expressed in all the three models.
  8. With respect to Donations granted by Organizations, (a) are organizations that reside outside your election district allowed to make donations, or (b) is this only permitted for organizations located in your specific district?
    • If this is allowed only to local Organizations, it would be necessary to move the District PK from Person to Party. I am thinking that, actually, that could be the right place for this attribute, but let me know your ideas about it. I have illustrated this option in Model C.
  9. How much Donations are allowed by each Party? (a) One, (b) a fixed number, or (c) an undefined quantity?
    • Option (c) is also currently presented in the three models.
  10. Are there restrictions related to (a) a MinimumAmount and (b) a MaximumAmount per individual Donation?
  11. Do you have restrictions associated to (a) a MinimumAmount and (b) a MaximumAmount per the total set of Donations granted by a certain Party?
  12. How do you identify an Election instance?
    • I think that a given election can be uniquely identified by means of the date in which it is celebrated (as depicted in Model A) and, perhaps, by a combination of the date and the district where it is held (as depicted in Model B).
  13. What attribute (or group or attributes) uniquely identify a District occurrence?
    • As you know, I have used DistrictNumber for this purpose in all the models, but it may well be a different attribute.

Response to your comments

I have never heard of type-subtype relationships in a database. This is very interesting.

Yes —as far as I know— Type-Subtype relationships were introduced, as such, in the relational paradigm back in 1979[5] and, unfortunately, they are not widely used in common implementations. But now that we know them, we can take advantage of this interesting and powerful structure in your scenario.

I am not all that practiced with databases so not sure how to begin implementing the columns needed. All of my INSERTS are straightforward single table inserts and it looks like this approach will require some fancier queries. I'm definitely willing to learn, though.

Do not worry, it is completely understandable that you are familiar with other computer science branches but not that much with relational databases, and I consider that this is a good oportunity for you to know more about this fascinating discipline, we all learn new things every day.

Yes, INSERTing a row (or set of rows) in a database table implies different considerations than, say, creating a record or file in a programming language.

We need first to focus in obtaining a stable logical data model, then we will easily derive a physical model (DDL statements or “CREATE Table code”). Once we have a clear target (all the aforementioned) defined, I will be glad to offer some ideas and guidance in relation to your INSERTs implementation and other pertaining elements.


Notes

1. Party is a term used in legal contexts to refer to a single person or group of persons that compose an individual entity.

2. Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was defined as a standard in december 1993 by the United States National Institute of Standards and Technology (NIST).

3. IDEF1X defines key migration as “The modeling process of placing the primary key of a parent or generic entity (or supertype) in its child or category entity (or subtype) as a foreign key”.

4. The use of role names is recommended since 1970 by Dr. E. F. Codd in his seminal paper entitled “A Relational Model of Data for Large Shared Data Banks”. For its part, IDEF1X —keeping fidelity regards relational practices— also advocates role naming.

5. By Dr. Codd (as one would expect) in his work entitled “Extending the Database Relational Model to Capture More Meaning”.