I have several revenue-based datasets with the following structure:
Subject           2012  2013  2014  2015
online marketing    54    50    60    80
website ads        900   850   320   250
mobile ads          60    80   120   130
video ads           30    40    50    80
There are over 10 such tables (for example: website_1 / website_2 / website_3). I was planning to change the datasets to the structure below so I can join the tables and compare data by year.
Subject           revenue  year
online marketing       54  2012
online marketing       50  2013
online marketing       60  2014
online marketing       80  2015
website ads           900  2012
....
mobile ads             60  2012
....
video ads              30  2012
...
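The wide-to-long ("unpivot") step described above can be sketched in plain Python; the tuples below simply restate the sample table, and all names are illustrative:

```python
# Minimal sketch of unpivoting the wide rows into (subject, revenue, year)
# rows; the data is the sample table from the question.
years = [2012, 2013, 2014, 2015]
wide_rows = [
    ("online marketing", 54, 50, 60, 80),
    ("website ads", 900, 850, 320, 250),
    ("mobile ads", 60, 80, 120, 130),
    ("video ads", 30, 40, 50, 80),
]

# Each wide row becomes one long row per year column.
long_rows = [
    (subject, revenue, year)
    for subject, *revenues in wide_rows
    for year, revenue in zip(years, revenues)
]

print(long_rows[0])  # ('online marketing', 54, 2012)
```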
I don't think the above structure is efficient. What would be the best way to store such data in a relational database?
Best Answer
Why do you believe that the second design is not efficient? It is the design I would recommend: it allows the data set to scale to more subjects and more years without needing to adjust the schema.
To extend the normalization, place the subjects in their own table and reference it from your main table. This removes duplicate names from your main table and reduces the space required to store your data.
Depending on your requirements, you could also reduce your table count to 3 by adding a `website` column to your main table, then creating a new table to hold the website list and referencing it from your main table. You would then end up with a structure similar to this:
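A minimal sketch of that layout, using SQLite DDL via Python since the answer doesn't name a specific database; the table and column names are assumptions, not an exact schema:

```python
import sqlite3

# Three-table layout: website list, subject list, and a revenue fact table
# that references both and forbids duplicate (website, subject, year) rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE website (
    website_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE subject (
    subject_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE revenue (
    website_id INTEGER NOT NULL REFERENCES website (website_id),
    subject_id INTEGER NOT NULL REFERENCES subject (subject_id),
    year       INTEGER NOT NULL,
    revenue    INTEGER NOT NULL,
    UNIQUE (website_id, subject_id, year)  -- one row per website/subject/year
);
""")
```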
This creates a table for the `website`, a table for the `subject` and one for the `revenue`. The `revenue` table contains references to the former tables and also a `UNIQUE` constraint across the combination of `website`, `subject` and `year` to prevent duplicate entries.

Naturally, normalization makes query definitions longer (you now need to join to the other tables), but this is really trivial, and the ability to scale out easily to multiple websites, subjects and years is, in my opinion, worth the extra effort of writing a longer query.
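The kind of join this implies can be sketched as follows, again in SQLite via Python; the schema and sample figures are illustrative, following the website/subject/revenue layout the answer describes:

```python
import sqlite3

# Set up the normalized tables with a couple of sample rows, then compare
# one subject's revenue across websites for a single year via joins.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE website (website_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE revenue (
    website_id INTEGER REFERENCES website (website_id),
    subject_id INTEGER REFERENCES subject (subject_id),
    year INTEGER, revenue INTEGER,
    UNIQUE (website_id, subject_id, year)
);
INSERT INTO website (name) VALUES ('website_1'), ('website_2');
INSERT INTO subject (name) VALUES ('online marketing');
INSERT INTO revenue VALUES (1, 1, 2012, 54), (2, 1, 2012, 70);
""")

rows = con.execute("""
    SELECT w.name, r.revenue
      FROM revenue r
      JOIN website w ON w.website_id = r.website_id
      JOIN subject s ON s.subject_id = r.subject_id
     WHERE s.name = 'online marketing' AND r.year = 2012
     ORDER BY w.name
""").fetchall()
print(rows)  # [('website_1', 54), ('website_2', 70)]
```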
Over time, should the data set grow to a point where querying it becomes very slow, you can either archive off data that you no longer need or warehouse it in a denormalized fashion.