Database Design – Splitting Information About People, Accounts, and Transactions

database-design

I am having trouble deciding which design would be considered a better practice.

Say, for example, I have two tables:

TABLE Person
pid int primary key
name varchar
age int
bank_account varchar
TABLE Transactions
tid int primary key
date datetime
amount float
from int references person
to int references person

This is just an example and not using any particular syntax.

Now, my Person table will have multiple entries that have all the same information aside from the bank_account, since a person can have multiple different accounts they make transactions from or receive to. Would it be better to do it like this, or would it be better to create a separate account table that holds all the different account numbers with a reference to the person that owns it, and then changing the references in the Transactions table to reference the new account table?

I know that duplicating information in person is in accordance with 1NF, but it doesn't seem like good design to me. Any insight would be appreciated.

Best Answer

It seems a cliché, but it all depends on the specific informational requirements of relevance, because a database has to be designed with the exact business needs in mind.

As for the circumstances at issue:

  1. In case that in the applicable business environment each Transaction is performed from Person to Person and the concept of Account does not exist, then leaving the tables as you decribe in the no-particular-sintax example seems to be mirroring the characteristics of the business context accurately.

  2. In case that, on the contrary, in said business scenario every Transaction is carried out from Account to Account, then, yes, creating an Account(s) table which would be referred to from two FOREIGN KEY constraints in the Transaction(s) table would be the best practice, because by doing so you would be reflecting the characteristics of the business domain with precision. The Transaction(s) rows would then be associated with the respective Person(s) rows via the corresponding Account(s) counterparts.

You state in the question itself that

[...] a person can have multiple different accounts they make transactions from or receive to.

...therefore, proceeding as mentioned in case no. 2 would be more convenient in your very specific situation.

On the other hand, when designing a relational database, I highly advise that you start (i) analyzing the situation from a conceptual perspective —identifying entity types, properties, associations, drawing diagrams, etc.—, then (ii) continue establishing the logical aspects —laying out the corresponding tables, types, columns, constraints to ensure integrity, domains if possible, testing with the normal forms, etc.—, and then (iii) putting up the physical settings of convenience —optimizing database response by configuring indexes, partitions, hardware upgrades, network configurations, etc.—. In this manner you would be approaching each scenario with a clear separation of concerns.

Regarding first normal form (1NF) —which is brought up in the question—, it basically has to do with the definition of atomic domains (i.e., each column must contain exactly one value of the applicable domain/type per row), which, as well is entirely dependent on the informational requirements of significance. Further normalization (2NF and above), in turn, implies getting rid of dangerous redundancies (special types of undesirable data duplication) and is executed in terms of functional dependencies (interrelationships between sets of attributes/columns).

By the way, you may find of interest this Q & A, which involves a similar scenario and contains several notions about database design that may be applied from a general point of view