SQL Server – Table Structure for Validating Data Using Category and Subcategory Fields

check-constraintsdata validationforeign keysql server

I am creating a simple budget application, and the primary table will be a transaction table that contains things like the amount of the transaction, date, account, the category and subcategory (the type of transaction it was).

A spending category might be "Entertainment" or "Food" or "Household" and each would be further subdivided into subcategories. The "Food" category might have "Work Lunch", "Grocery Store", "Fast Food", etc. subcategories, for example.

One way to do this is to have a table Categories:

    id    category     subcategory
----------------------------------
     1        Food      Work Lunch
     2        Food       Fast Food
     3        Food   Grocery Store
     4   Household        Mortgage
     5   Household         Repairs

and so on. I then reference this id field in my table Transaction, using type as a foreign key:

id    txdate  amount   type   account   ...
-------------------------------------------
 1   6/25/15  15.25       2      cash   ... (a fast food expense)

The problem with this is that I'd like to be able to allow for null subcategories, and that doesn't seem to fit that well with this model.

Another way would be to just put the category and subcategory into the main Transaction table:

id    txdate  amount   category   subcategory   account   ...
--------------------------------------------------------------
 1   6/25/15   15.25       Food     Fast Food      cash                     ...

I have considered using CHECK CONSTRAINTs but this gets too bulky and messy too quickly, and does not allow me an easy way to access the allowable fields from my front-end code. I will be validating data in the front end but I would like a robust way to ensure data validation on the actual database update as well, so I thought it would be convenient to store the values in a table (or more than one table), and then retrieve these allowable values in the code.

Any solution will work, but this is such a basic thing that I'm sure some best practices exist for this, and I'd like to query you all to find out what those would be.

Thank you for your time and expertise!

Best Answer

There are a few different strategies you could take, where on one end you pursue aggressive normalization to, on the other end, full denormalization. The full denormalization would be equivalent to your second example where all relevant info simply ends up in the transaction table without references to other tables.

Full Normalization

So, to completely normalize, you would still want a Categories table, but you want to even eliminate the storage of redundant information in this table, so you would need a CategoryList table and a SubCategoryList table as

CategoryList

id    category     
----------------
 1     Food      
 2     Household
 etc...

and

SubCategoryList

id    subcategory     
----------------
 1     Work Lunch    
 2     Fast Food
 3     Grocery Store
 4     Mortgage
 5     Repairs
 etc...

You could then construct your category table from these two tables as

Categories

id    category_id     subcategory_id
----------------------------------
 1        1               1
 2        1               2
 3        1               3
 4        2               4
 5        2               5

Treatment of NULL subcategories can easily be handled by either 1) simply placing a NULL entry for the subcategory_id column in the appropriate row of the Categories table, or 2) adding a subcategory entry id, subcategory where the subcategory field is NULL.

Last but not least you would add a foreigh key reference from your Transactions table to the appropriate id in the Categories table.

Does it really need to be so normalized?

Well, in my opinion, no it doesn't. I've heard a quote, though I can't remember who spoke it, but it basically goes "Normalize until it hurts, denormailize until it works." Especially in the case where you don't have a lot of categories, the fully normalized design may be a little bit of overkill.

What might simply make more sense would be to keep the above mentioned CategoryList and SubCategoryList tables to enumerate your types, but skip making the separate Categories table, and then simply have your Transactions table referencing the CategoryList and SubCategoryList tables as

id    txdate  amount   category   subcategory   account   ...
--------------------------------------------------------------
 1   6/25/15   15.25      1             2        cash                     ...

This way, you save on storage, and you can easily update/modify any category or subcategory entry in the list without needing to modify your entire Transactions table. Further, you can simply permit the subcategory column of the Transactions table to permit NULL entries, if need be.

Hope this helps!