I'm having a question rounding my mind. It's better to create one table and store a lot of rows there or create various tables and have a few rows on them? What's the performance and the I/O saved at physical level? How many space will use many tables instead of one table? I'm thinking in how WordPress handling the multiblog feature because I'm starting a new project and I'm looking for the best approach.
Mysql – One table with many rows or many tables with a few rows
database-designdatabase-recommendationMySQLperformance
Related Solutions
Why must the employees working on a project strictly belong to the same division that owns the project? That sounds more like a business rule that you can leave up to your application to enforce rather than something that requires DRI.
But let's say you need to enforce this using DRI. You said you have a many-to-many table associating employees with projects. Assuming you are using SQL Server, you could expand this table as follows:
CREATE TABLE dbo.Employee_Project (
ProjectID INT
, ProjectDivisionID INT
, EmployeeID INT
, EmployeeDivisionID INT
, Hours INT
, CONSTRAINT FK_Project FOREIGN KEY (ProjectID, ProjectDivisionID)
REFRENCES dbo.Project(ProjectID, DivisionID)
, CONSTRAINT FK_Employee FOREIGN KEY (EmployeeID, EmployeeDivisionID)
REFERENCES dbo.Employee(EmployeeID, DivisionID)
, CONSTRAINT CK_EmployeeProjectSameDivision CHECK (ProjectDivisionID = EmployeeDivisionID)
);
This is what's going on here:
- We have two composite foreign keys here, one for Employee and one for Project, so we can have the DivisionID for both entities in the same table and guaranteed by DRI to be correct.
- These composite foreign keys require a
UNIQUE
index on the referenced tables. So in addition to your primary keys on EmployeeID and ProjectID in those tables, you'll need unique keys on (EmployeeID, DivisionID) and (ProjectID, DivisionID). - The CHECK constraint guarantees that an employee can only be assigned to a project owned by the same division.
If you want to see another example of this design pattern, here's the answer I gave to a design problem that had similar requirements.
You could always do something like this:
Create a users
table to store all of your user info:
create table users
(
id int,
user_name varchar(50)
);
insert into users values
(1, 'John'),
(2, 'Jill'),
(3, 'Bob'),
(4, 'Sam');
You state that you might need to apply this type of tracking for multiple applications. So you can create an application
table to store the app name and any details.
create table applications
(
id int,
app_name varchar(50)
);
insert into applications values
(1, 'App 1'),
(2, 'App 2'),
(3, 'App 3');
Next create a categories
table to store the info on each of the categories that you want to track by the users. This table will include a column that will contain the application id. This table will contain a foreign key relationship to the applications
table. This allows you to have some flexibility in having the same category in multiple apps.
create table categories
(
id int,
cat_name varchar(100),
a_id int
);
insert into categories values
(1, 'Cat 1', 1),
(2, 'Cat 2', 1),
(3, 'Cat 3', 2),
(4, 'Cat 4', 3);
Finally, create the users_categories
table which will contain the stats that you want to track, including user
, category
and then the action_date
(the date the event took place).
create table users_categories
(
u_id int,
c_id int,
action_date datetime
);
insert into users_categories values
(1, 1, current_timestamp()),
(2, 3, '2012-01-15 10:15'),
(3, 4, '2012-08-24 12:48'),
(1, 2, '2012-06-08 08:50');
Then to query the data, you would use something similar to this:
select *
from users u
left join users_categories uc
on u.id = uc.u_id
left join categories c
on uc.c_id = c.id
left join applications a
on c.a_id = a.id
And the results would contain (See SQL Fiddle with Demo):
ID | USER_NAME | U_ID | C_ID | ACTION_DATE | CAT_NAME | A_ID | APP_NAME
------------------------------------------------------------------------------------------
1 | John | 1 | 1 | '2012-09-28 06:47:53' | Cat 1 | 1 | App 1
1 | John | 1 | 2 | '2012-06-08 08:50:00' | Cat 2 | 1 | App 1
2 | Jill | 2 | 3 | '2012-01-15 10:15:00' | Cat 3 | 2 | App 2
3 | Bob | 3 | 4 | '2012-08-24 12:48:00' | Cat 4 | 3 | App 3
4 | Sam | (null) | (null) | (null) | (null) | (null) | (null)
A few other suggestions, you could technically remove the a_id
from the categories
table and create a join table between categories and applications, similar to this:
create table categories_applications
(
c_id int,
a_id int
);
insert into categories_applications values
(1, 1),
(2, 3),
(3, 1),
(1, 2);
This would lead to one additional join in your queries, but the results would be the same:
select *
from users u
left join users_categories uc
on u.id = uc.u_id
left join categories c
on uc.c_id = c.id
left join categories_applications ca
on c.id = ca.c_id
left join applications a
on ca.a_id = a.id
Best Answer
Assuming that you are talking about tables containing the same kinds of entities, you typically want to have one table.
You would not have any performance differences and a whole lot of management differences between the two approaches, with the single table being easier to manage. Typically large tables do not have performance overheads compared to smaller versions of the same when properly indexed, since the index seek time grows slowly compared to the row growth.
In a normalized design, you will have different tables for different entities or relations, but as far as partitioning (either supported with a DBMS feature or manually with separate tables), that usually is necessary when you have certain requirements, like granularity of backup or data loading/unloading on a partition basis.