Database Design – Deciding When Items Are Related vs Attributes

database-designerd

I've been trying to figure this out for a while. Let's say you have a person, for example. An attribute of a person is his or her social security number, right? But a person also has a social security number. So, in an ER diagram, you could draw a square box for person and a square box for ssid, and you could connect them by a diamond, has. Alternatively, you could draw a circle, ssid, and you could connect it to a square person box.

This isn't just true of ssid, though. It's possible to draw either-or with concrete things, like a car, a pet, or a phone number, or with conceptual things, like a phone number, a friendship, or a mood. So, where do I draw the lines? Is the object of design just to get everything down on paper and looking nice, or should I be using a guide?

Best Answer

There are two ways to look at this, the informal way and the formal, theoretical way.

The Informal Way:

If something is of so much interest to your system that you want to record attributes about it, then it probably belongs in its own table.

If the only thing you care about Social Security Number is what digits it contains (i.e. what the number is) then this number is best modelled as an attribute of something else (e.g. PERSON).

If, on the other hand, your system cares about other facts that pertain to SSID, like maybe when it was issued, which office issued it and whatever else you might know and care about an SSID, then you probably want to split everything you know about the SSID into its own table and then relate that table to PERSON.

The Formal Way:

If the rules of normalization demand that you split your multiple facts (attributes) about an SSID into a separate relation (table) - of which the social security number is the primary key, then SSID is a box and not a circle.

Specifically, if you have attributes which depend solely on the SSID then you would want to remove these from a PERSON table because they would have only a transitive dependency on the ID of PERSON and therefore 3NF demands that these attributes be removed to a separate table.