You could always do something like this:
Create a users
table to store all of your user info:
create table users
(
id int,
user_name varchar(50)
);
insert into users values
(1, 'John'),
(2, 'Jill'),
(3, 'Bob'),
(4, 'Sam');
You state that you might need to apply this type of tracking for multiple applications. So you can create an application
table to store the app name and any details.
create table applications
(
id int,
app_name varchar(50)
);
insert into applications values
(1, 'App 1'),
(2, 'App 2'),
(3, 'App 3');
Next create a categories
table to store the info on each of the categories that you want to track by the users. This table will include a column that will contain the application id. This table will contain a foreign key relationship to the applications
table. This allows you to have some flexibility in having the same category in multiple apps.
create table categories
(
id int,
cat_name varchar(100),
a_id int
);
insert into categories values
(1, 'Cat 1', 1),
(2, 'Cat 2', 1),
(3, 'Cat 3', 2),
(4, 'Cat 4', 3);
Finally, create the users_categories
table which will contain the stats that you want to track, including user
, category
and then the action_date
(the date the event took place).
create table users_categories
(
u_id int,
c_id int,
action_date datetime
);
insert into users_categories values
(1, 1, current_timestamp()),
(2, 3, '2012-01-15 10:15'),
(3, 4, '2012-08-24 12:48'),
(1, 2, '2012-06-08 08:50');
Then to query the data, you would use something similar to this:
select *
from users u
left join users_categories uc
on u.id = uc.u_id
left join categories c
on uc.c_id = c.id
left join applications a
on c.a_id = a.id
And the results would contain (See SQL Fiddle with Demo):
ID | USER_NAME | U_ID | C_ID | ACTION_DATE | CAT_NAME | A_ID | APP_NAME
------------------------------------------------------------------------------------------
1 | John | 1 | 1 | '2012-09-28 06:47:53' | Cat 1 | 1 | App 1
1 | John | 1 | 2 | '2012-06-08 08:50:00' | Cat 2 | 1 | App 1
2 | Jill | 2 | 3 | '2012-01-15 10:15:00' | Cat 3 | 2 | App 2
3 | Bob | 3 | 4 | '2012-08-24 12:48:00' | Cat 4 | 3 | App 3
4 | Sam | (null) | (null) | (null) | (null) | (null) | (null)
A few other suggestions, you could technically remove the a_id
from the categories
table and create a join table between categories and applications, similar to this:
create table categories_applications
(
c_id int,
a_id int
);
insert into categories_applications values
(1, 1),
(2, 3),
(3, 1),
(1, 2);
This would lead to one additional join in your queries, but the results would be the same:
select *
from users u
left join users_categories uc
on u.id = uc.u_id
left join categories c
on uc.c_id = c.id
left join categories_applications ca
on c.id = ca.c_id
left join applications a
on ca.a_id = a.id
For a data warehousing/analytics type application (DW/OLAP), I would go with PostgreSQL. It has set operators, windowing functions (also known as analytic functions) and common table expressions. You will most likely have to implement some or all of these in your own code with MySQL with the attendant possibility of bugs. MySQL is more suited to read-heavy OLTP type applications, whereas PostgreSQL is better (IMHO) for OLAP work.
With respect to "multi-tenancy" - why not use the same database for all and simply identify different customers by a customer_id - or (depending on the number of customers) have separate tables for each customer and non-customer specific data in other tables? How many customers do you have and how much data do you expect to be storing/analysing in the next 10 years? Modern RDBMSs can store large amounts of data and retrieve quickly (with suitable disk configuration), but I would favour PostgreSQL for large databases.
Sharding might be interesting if you only occasionally query across customers and most of your analysis is done per customer - you could set up a table (or even schema) per customer (on different disks). Again, knowledge of data volumes would help here.
Furthermore, with respect to multi-tenancy, see my question to jynus about querying across servers - or even different schemas within the same server.
I take jynus' point about "holy wars" - this post is just my own opinion about your particular question - were I to be asked about a different scenario, I might well recommend MySQL.
[EDIT]
What, exactly, is "my dashboard performance"?
Best Answer
Actually no, I wouldn't do this and keep it in the same table. The flag should be enough to distinguish between these two. The only difference in your queries is a
active = true
. Personally, I wouldn't consider this a bad design choice (altough your idea is not wrong, I would not use it in this case with - I'm guessing - a small set of data). Since only on set of data belongs to a single user,JOIN
s would make not that much sense.Again, I wouldn't do that personally(!). I don't see a benefit of seperating user centric data from the user. If you fear losing the data on deletion, introduce a
deleted
flag. This way, you will never lose data and your application just respects thedeleted
flag in your queries (this would also apply if you split the data accross tables).Storing credit cards is always very sensitive, according to PCI (as far as i remember) you are not allowed to store the CVV code (hope someone can correct me on that). But since this seems to be a fictional application, that should not be a problem.