Store a revenue-based dataset in a relational database

normalization relational-theory

I have several revenue-based datasets with the structure below:

Subject                 2012   2013    2014    2015
online marketing         54      50     60      80
website ads             900     850    320     250
mobile ads               60      80    120     130       
video ads                30      40     50      80

There are over 10 tables (for example: website_1 / website_2 / website_3). I was planning to change the dataset to the structure below so I can join tables and compare data by year.

Subject                 revenue   year 
online marketing        54        2012 
online marketing        50        2013
online marketing        60        2014
online marketing        80        2015 
website ads             900       2012
....             
mobile ads              60        2012 
....             
video ads 
...      

I don't think the above structure is efficient.
What would be the best way to store such data in a relational database?
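The wide-to-long reshape I have in mind could be done with a UNION ALL unpivot; this is only a sketch, with the target table `revenue_long` and the per-year column names assumed from the examples above:

```sql
-- Hypothetical unpivot of one wide table (website_1) into the long
-- layout; the quoted year columns are assumed from the example data.
insert into revenue_long (subject, revenue, year)
select subject, "2012", 2012 from website_1
union all
select subject, "2013", 2013 from website_1
union all
select subject, "2014", 2014 from website_1
union all
select subject, "2015", 2015 from website_1;
```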

Best Answer

Why do you believe that the second design is not efficient? It is the design I would recommend: it allows the data set to scale to more subjects and more years without needing to adjust the schema.

To extend the normalization, place the subjects within their own table and reference that from your main table. This will reduce duplicate names in your main table and also reduce the space required to store your data.

If your requirements allow it, you can also reduce the table count to three: add a website column to your main table, create a new table to hold the list of websites, and reference that table from your main table as well.

You would then end up with a structure similar to this:

create table "websites"
(
  id int not null primary key,
  name varchar(64) not null unique
);
create table "subjects"
(
  id int not null primary key,
  name varchar(100) not null unique
);
create table "revenue"
(
  id int not null primary key,
  website int not null,
  subject int not null,
  year smallint not null,
  value int not null,
  constraint uk_revenue unique (website, subject, year),
  constraint fk_revenue_website foreign key (website) references websites (id),
  constraint fk_revenue_subject foreign key (subject) references subjects (id)
);

This creates a table for websites, a table for subjects and one for revenue. The revenue table references the two lookup tables and also carries a unique constraint across the combination of website, subject and year to prevent duplicate entries.
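To make the layout concrete, here is some hypothetical sample data using figures from the question's first dataset; the ids and the website name are illustrative only:

```sql
-- Illustrative data only; ids and names are assumed, not from the question.
insert into websites (id, name) values (1, 'website_1');

insert into subjects (id, name) values
  (1, 'online marketing'),
  (2, 'website ads');

insert into revenue (id, website, subject, year, value) values
  (1, 1, 1, 2012, 54),
  (2, 1, 1, 2013, 50),
  (3, 1, 2, 2012, 900);

-- A second row for (website 1, subject 1, year 2012) would be
-- rejected by the uk_revenue unique constraint.
```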

Naturally, normalization makes queries longer to write (you now need to join to the other tables), but this is trivial really, and the ability to scale out to more websites, subjects and years is, in my opinion, worth the extra effort of writing a longer query.
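For example, a year-on-year comparison against this schema might look something like the following; the two years chosen here are just an assumption for illustration:

```sql
-- Revenue per website and subject, with two years side by side.
select w.name as website,
       s.name as subject,
       sum(case when r.year = 2014 then r.value else 0 end) as revenue_2014,
       sum(case when r.year = 2015 then r.value else 0 end) as revenue_2015
from revenue r
join websites w on w.id = r.website
join subjects s on s.id = r.subject
where r.year in (2014, 2015)
group by w.name, s.name
order by w.name, s.name;
```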

Over time, should the data set grow to the point where querying it becomes very slow, you can either archive off data that you don't need anymore or warehouse it in a denormalized fashion.
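A denormalized warehouse copy could be as simple as folding the lookup names back in, so that reporting queries need no joins; this is a sketch, and the table name `revenue_report` is my own:

```sql
-- Hypothetical denormalized reporting table built from the normalized schema.
create table revenue_report as
select w.name as website,
       s.name as subject,
       r.year,
       r.value as revenue
from revenue r
join websites w on w.id = r.website
join subjects s on s.id = r.subject;
```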