Help with database design

database-designperformance

I'm quite the beginner when it comes to databases, so please bear that in mind.

I need to store user statistics in several categories for an application.

In the near future I will most likely need to add another application, with different categories, but for the same user.

Both applications will likely change which data is saved over time. perhaps via adding or removing individual fields or even entire categories.

Taking into account performance, maintenance and simplicity of future queries, what's the best approach to database design in this case?

Should I have one table for each application?

Or should I divide everything into smaller tables, maybe even stored in a different database?

Best Answer

You could always do something like this:

Create a users table to store all of your user info:

create table users
(
  id int, 
  user_name varchar(50)
);

insert into users values
(1, 'John'),
(2, 'Jill'),
(3, 'Bob'),
(4, 'Sam');

You state that you might need to apply this type of tracking for multiple applications. So you can create an application table to store the app name and any details.

create table applications 
(
  id int, 
  app_name varchar(50)
);

insert into applications values
(1, 'App 1'),
(2, 'App 2'),
(3, 'App 3');

Next create a categories table to store the info on each of the categories that you want to track by the users. This table will include a column that will contain the application id. This table will contain a foreign key relationship to the applications table. This allows you to have some flexibility in having the same category in multiple apps.

create table categories 
(
  id int,  
  cat_name varchar(100),
  a_id int 
);

insert into categories values
(1, 'Cat 1', 1),
(2, 'Cat 2', 1),
(3, 'Cat 3', 2),
(4, 'Cat 4', 3);

Finally, create the users_categories table which will contain the stats that you want to track, including user, category and then the action_date (the date the event took place).

create table users_categories
(
  u_id int,
  c_id int,
  action_date datetime
);
insert into users_categories values
(1, 1, current_timestamp()),
(2, 3, '2012-01-15 10:15'),
(3, 4, '2012-08-24 12:48'),
(1, 2, '2012-06-08 08:50');

Then to query the data, you would use something similar to this:

select *
from users u
left join users_categories uc
  on u.id = uc.u_id
left join categories c
  on uc.c_id = c.id
left join applications a
  on c.a_id = a.id

And the results would contain (See SQL Fiddle with Demo):

ID | USER_NAME | U_ID   | C_ID   | ACTION_DATE           | CAT_NAME | A_ID    | APP_NAME
------------------------------------------------------------------------------------------
1  | John      | 1      | 1      | '2012-09-28 06:47:53' | Cat 1    | 1       | App 1
1  | John      | 1      | 2      | '2012-06-08 08:50:00' | Cat 2    | 1       | App 1
2  | Jill      | 2      | 3      | '2012-01-15 10:15:00' | Cat 3    | 2       | App 2
3  | Bob       | 3      | 4      | '2012-08-24 12:48:00' | Cat 4    | 3       | App 3
4  | Sam       | (null) | (null) | (null)                | (null)   | (null)  | (null)

A few other suggestions, you could technically remove the a_id from the categories table and create a join table between categories and applications, similar to this:

create table categories_applications
(
  c_id int,
  a_id int
);
insert into categories_applications values
(1, 1),
(2, 3),
(3, 1),
(1, 2);

This would lead to one additional join in your queries, but the results would be the same:

select *
from users u
left join users_categories uc
  on u.id = uc.u_id
left join categories c
  on uc.c_id = c.id
left join categories_applications ca
  on c.id = ca.c_id
left join applications a
  on ca.a_id = a.id
Related Question