Store a revenue-based dataset in a relational database

normalization relational-theory

I have several revenue-based datasets with the structure below:

Subject                 2012   2013    2014    2015
online marketing         54      50     60      80
website ads             900     850    320     250
mobile ads               60      80    120     130       
video ads                30      40     50      80

There are over 10 tables (for example: website_1 / website_2 / website_3). I was planning to change the dataset to the structure below so I can join tables and compare data by year.

Subject                 revenue   year 
online marketing        54        2012 
online marketing        50        2013
online marketing        60        2014
online marketing        80        2015 
website ads             900       2012
....             
mobile ads              60        2012 
....             
video ads 
...      

I don't think the above structure is efficient.
What would be the best way to store such data in a relational database?
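The wide-to-long reshape I have in mind could be done with a UNION ALL unpivot; this is only a sketch, with the target table `revenue_long` and the per-year column names assumed from the examples above:

```sql
-- Hypothetical unpivot of one wide table (website_1) into the long
-- layout; the quoted year columns are assumed from the example data.
insert into revenue_long (subject, revenue, year)
select subject, "2012", 2012 from website_1
union all
select subject, "2013", 2013 from website_1
union all
select subject, "2014", 2014 from website_1
union all
select subject, "2015", 2015 from website_1;
```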

Best Answer

Why do you believe that the second design is not efficient? It is the design I would recommend: it allows the data set to scale to more subjects and more years without needing to adjust the schema.

To extend the normalization, place the subjects within their own table and reference that from your main table. This will reduce duplicate names in your main table and also reduce the space required to store your data.

If your requirements allow it, you can also reduce the table count to three: add a website column to your main table, create a new table to hold the list of websites, and reference that table from your main table as well.

You would then end up with a structure similar to this:

create table "websites"
(
  id int not null primary key,
  name varchar(64) not null unique
);
create table "subjects"
(
  id int not null primary key,
  name varchar(100) not null unique
);
create table "revenue"
(
  id int not null primary key,
  website int not null,
  subject int not null,
  year smallint not null,
  value int not null,
  constraint uk_revenue unique (website, subject, year),
  constraint fk_revenue_website foreign key (website) references websites (id),
  constraint fk_revenue_subject foreign key (subject) references subjects (id)
);

This creates a table for websites, a table for subjects and one for revenue. The revenue table references the two lookup tables and also carries a unique constraint across the combination of website, subject and year to prevent duplicate entries.
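To make the layout concrete, here is some hypothetical sample data using figures from the question's first dataset; the ids and the website name are illustrative only:

```sql
-- Illustrative data only; ids and names are assumed, not from the question.
insert into websites (id, name) values (1, 'website_1');

insert into subjects (id, name) values
  (1, 'online marketing'),
  (2, 'website ads');

insert into revenue (id, website, subject, year, value) values
  (1, 1, 1, 2012, 54),
  (2, 1, 1, 2013, 50),
  (3, 1, 2, 2012, 900);

-- A second row for (website 1, subject 1, year 2012) would be
-- rejected by the uk_revenue unique constraint.
```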

Naturally, normalization makes queries longer to write (you now need to join to the other tables), but this is trivial really, and the ability to scale out to more websites, subjects and years is, in my opinion, worth the extra effort of writing a longer query.
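For example, a year-on-year comparison against this schema might look something like the following; the two years chosen here are just an assumption for illustration:

```sql
-- Revenue per website and subject, with two years side by side.
select w.name as website,
       s.name as subject,
       sum(case when r.year = 2014 then r.value else 0 end) as revenue_2014,
       sum(case when r.year = 2015 then r.value else 0 end) as revenue_2015
from revenue r
join websites w on w.id = r.website
join subjects s on s.id = r.subject
where r.year in (2014, 2015)
group by w.name, s.name
order by w.name, s.name;
```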

Over time, should the data set grow to the point where querying it becomes very slow, you can either archive off data that you don't need anymore or warehouse it in a denormalized fashion.
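A denormalized warehouse copy could be as simple as folding the lookup names back in, so that reporting queries need no joins; this is a sketch, and the table name `revenue_report` is my own:

```sql
-- Hypothetical denormalized reporting table built from the normalized schema.
create table revenue_report as
select w.name as website,
       s.name as subject,
       r.year,
       r.value as revenue
from revenue r
join websites w on w.id = r.website
join subjects s on s.id = r.subject;
```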