I am creating a simple budget application, and the primary table will be a transaction table that contains things like the amount of the transaction, date, account, the category and subcategory (the type of transaction it was).
A spending category might be "Entertainment" or "Food" or "Household" and each would be further subdivided into subcategories. The "Food" category might have "Work Lunch", "Grocery Store", "Fast Food", etc. subcategories, for example.
One way to do this is to have a table Categories
:
id category subcategory ---------------------------------- 1 Food Work Lunch 2 Food Fast Food 3 Food Grocery Store 4 Household Mortgage 5 Household Repairs
and so on. I then reference this id
field in my table Transaction
, using type
as a foreign key:
id txdate amount type account ... ------------------------------------------- 1 6/25/15 15.25 2 cash ... (a fast food expense)
The problem with this is that I'd like to be able to allow for null
subcategories, and that doesn't seem to fit that well with this model.
Another way would be to just put the category
and subcategory
into the main Transaction
table:
id txdate amount category subcategory account ... -------------------------------------------------------------- 1 6/25/15 15.25 Food Fast Food cash ...
I have considered using CHECK
CONSTRAINT
s but this gets too bulky and messy too quickly, and does not allow me an easy way to access the allowable fields from my front-end code. I will be validating data in the front end but I would like a robust way to ensure data validation on the actual database update as well, so I thought it would be convenient to store the values in a table (or more than one table), and then retrieve these allowable values in the code.
Any solution will work, but this is such a basic thing that I'm sure some best practices exist for this, and I'd like to query you all to find out what those would be.
Thank you for your time and expertise!
Best Answer
There are a few different strategies you could take, where on one end you pursue aggressive normalization to, on the other end, full denormalization. The full denormalization would be equivalent to your second example where all relevant info simply ends up in the transaction table without references to other tables.
Full Normalization
So, to completely normalize, you would still want a
Categories
table, but you want to even eliminate the storage of redundant information in this table, so you would need aCategoryList
table and aSubCategoryList
table asCategoryList
and
SubCategoryList
You could then construct your category table from these two tables as
Categories
Treatment of
NULL
subcategories can easily be handled by either 1) simply placing a NULL entry for thesubcategory_id
column in the appropriate row of theCategories
table, or 2) adding a subcategory entryid, subcategory
where the subcategory field isNULL
.Last but not least you would add a foreigh key reference from your
Transactions
table to the appropriateid
in theCategories
table.Does it really need to be so normalized?
Well, in my opinion, no it doesn't. I've heard a quote, though I can't remember who spoke it, but it basically goes "Normalize until it hurts, denormailize until it works." Especially in the case where you don't have a lot of categories, the fully normalized design may be a little bit of overkill.
What might simply make more sense would be to keep the above mentioned
CategoryList
andSubCategoryList
tables to enumerate your types, but skip making the separateCategories
table, and then simply have yourTransactions
table referencing theCategoryList
andSubCategoryList
tables asThis way, you save on storage, and you can easily update/modify any category or subcategory entry in the list without needing to modify your entire
Transactions
table. Further, you can simply permit thesubcategory
column of theTransactions
table to permitNULL
entries, if need be.Hope this helps!