Nested one to many relationships

database-design

I'm working on a "Bus stations" database. I want to design the following relationship:
"A bus line has many stops (each stop belongs to one line), and a stop has many arrival times (each arrival time belongs to one stop)".

So my design is like this:

Lines (1,1)<->(1,N) Stops (1,1)<->(1,N) Times

Lines(id, name, ...)
Stops(id, line_id, name, ...)
Times(id, stop_id, ...)

But when I was testing it, I realized that I needed a FOREIGN KEY line_id in the Times table (because I could not distinguish between times of different bus lines). This changes the diagram to:

Lines (1,1)<->(1,N) Stops (1,1)<->(1,N) Times
  |                                       |
  |                                       | 
   ---------------------------------------
  (1,1)                                  (1,N)

I'm a little bit confused because I think this design has redundancy: couldn't the relationship between Lines and Times be reached through the intermediate relationships?

Best Answer

You should not need the extra FK in Times, as that information can be derived from the existing relationships, but you will need to JOIN in the other tables to get at the extra property. Adding the extra key like that is sometimes a necessary optimisation, but it does break "normal form" as you are duplicating data. That means either your business layer becomes responsible for maintaining the referential integrity of that duplicate, or you need to do the same in the database using triggers and other "powerful but be very careful with them" features of your chosen database.

Your select * from times where stop_id = 1 and line_id = 1 should be something like:

SELECT times.id, times.stop_id, ...
FROM   times
JOIN   stops ON times.stop_id=stops.id
WHERE  times.stop_id=1 AND stops.line_id=1
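As a rough, runnable sketch of the same idea, the snippet below builds the three tables in an in-memory SQLite database and runs the JOIN. The `arrival` column, the sample rows, and the helper name `times_for_stop_on_line` are assumptions made for illustration; the question elides the real columns.

```python
import sqlite3


def times_for_stop_on_line(conn, stop_id, line_id):
    """Return (id, stop_id, arrival) rows; the line is looked up via stops."""
    return conn.execute(
        """
        SELECT times.id, times.stop_id, times.arrival
        FROM   times
        JOIN   stops ON times.stop_id = stops.id
        WHERE  times.stop_id = ? AND stops.line_id = ?
        ORDER  BY times.arrival
        """,
        (stop_id, line_id),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lines (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE stops (id      INTEGER PRIMARY KEY,
                        line_id INTEGER NOT NULL REFERENCES lines(id),
                        name    TEXT NOT NULL);
    -- No line_id column here: the line is reachable through stops.
    CREATE TABLE times (id      INTEGER PRIMARY KEY,
                        stop_id INTEGER NOT NULL REFERENCES stops(id),
                        arrival TEXT NOT NULL);  -- hypothetical column

    INSERT INTO lines VALUES (1, 'Line 4'), (2, 'Line 5');
    INSERT INTO stops VALUES (1, 1, 'Market'), (2, 2, 'Market');
    INSERT INTO times VALUES (1, 1, '08:00'), (2, 1, '08:30'), (3, 2, '08:15');
    """
)

print(times_for_stop_on_line(conn, 1, 1))
```

Because each stop row already carries its `line_id`, the times of different lines stay distinguishable without a second foreign key on `times`.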

To simplify queries you can create views that abstract out the underlying structure a little, meaning you can keep the data in its best form while dealing with it as if it did have the extra columns with duplicated information.
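A minimal sketch of such a view, again using in-memory SQLite: the view name `times_with_line`, the `arrival` column, and the sample data are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stops (id      INTEGER PRIMARY KEY,
                        line_id INTEGER NOT NULL,
                        name    TEXT NOT NULL);
    CREATE TABLE times (id      INTEGER PRIMARY KEY,
                        stop_id INTEGER NOT NULL REFERENCES stops(id),
                        arrival TEXT NOT NULL);  -- hypothetical column

    -- The view exposes line_id next to each time without storing it twice.
    CREATE VIEW times_with_line AS
        SELECT times.id, times.stop_id, stops.line_id, times.arrival
        FROM   times
        JOIN   stops ON times.stop_id = stops.id;

    INSERT INTO stops VALUES (1, 1, 'Market'), (2, 2, 'Market');
    INSERT INTO times VALUES (1, 1, '08:00'), (2, 1, '08:30'), (3, 2, '08:15');
    """
)

# Callers can filter on line_id as if it were a real column of times.
rows = conn.execute(
    "SELECT id, arrival FROM times_with_line "
    "WHERE stop_id = 1 AND line_id = 1 ORDER BY id"
).fetchall()
print(rows)
```

The base tables stay normalised; only the view definition knows about the join.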

I'm not sure I'd model a timetable store that way though. I would assume that stops are separate entities with a many-to-many link to lines (unless you are counting a stop served by the no. 4, 5, and 45x lines as three separate stops even though they are physically the same location). Also, I'd probably want to group the lines together so that all the times for the "no 4" service are identifiable as a related collection but you can distinguish between the arrival times of each (so you can ask "what is the arrival time at X for the service that leaves Y at HH:MM?" and so forth). Of course I could just be misreading your intended model! I'm thinking something like:

                                       TimedStop
                   TimedRoute          =============
Line               =============       ts_id (PK)         Stop
============       route_id (PK) ===>  route_id (FK)      =============
line_id (PK) ===>  line_id (FK)        stop_id (FK)  <=== stop_id (PK)
line_name                              arrival_time       stop_name
                                       depart_time        stop_location

Here TimedStop becomes a many-to-many relationship store for timed routes and stops. Of course there are probably several perfectly valid ways to model such a system depending on your other constraints, and the above may be rendered an incorrect model when you consider the fuller design spec (without more detail of what you are trying to model and what outputs are desired this is not possible to completely pin down, I made a fair few assumptions, which may or may not be correct, when throwing the above diagram together).
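One possible way to express the diagram as tables, sketched in SQLite, is shown below. Table and column names follow the diagram, but the snake_case spelling, storing times as 'HH:MM' text, the sample rows, and the helper `arrival_for_departure` are all assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE line (line_id INTEGER PRIMARY KEY, line_name TEXT NOT NULL);
CREATE TABLE stop (stop_id INTEGER PRIMARY KEY, stop_name TEXT NOT NULL,
                   stop_location TEXT);
-- One timed_route row per individual run of a line.
CREATE TABLE timed_route (route_id INTEGER PRIMARY KEY,
                          line_id  INTEGER NOT NULL REFERENCES line(line_id));
-- Many-to-many store between timed routes and stops.
CREATE TABLE timed_stop (ts_id        INTEGER PRIMARY KEY,
                         route_id     INTEGER NOT NULL REFERENCES timed_route(route_id),
                         stop_id      INTEGER NOT NULL REFERENCES stop(stop_id),
                         arrival_time TEXT,   -- 'HH:MM' text, an assumption
                         depart_time  TEXT);
"""


def arrival_for_departure(conn, from_stop, depart_time, to_stop):
    """Arrival time at to_stop for the run that leaves from_stop at depart_time."""
    row = conn.execute(
        """
        SELECT dest.arrival_time
        FROM   timed_stop AS origin
        JOIN   timed_stop AS dest ON dest.route_id = origin.route_id
        WHERE  origin.stop_id = ? AND origin.depart_time = ?
          AND  dest.stop_id = ? AND dest.arrival_time > origin.depart_time
        """,
        (from_stop, depart_time, to_stop),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript(
    """
    INSERT INTO line VALUES (1, 'no 4');
    INSERT INTO stop VALUES (1, 'Y', NULL), (2, 'X', NULL);
    INSERT INTO timed_route VALUES (1, 1), (2, 1);
    INSERT INTO timed_stop VALUES
        (1, 1, 1, '08:00', '08:02'), (2, 1, 2, '08:20', '08:21'),
        (3, 2, 1, '09:00', '09:02'), (4, 2, 2, '09:20', '09:21');
    """
)
print(arrival_for_departure(conn, 1, "08:02", 2))
```

Both runs belong to the same line, yet each keeps its own set of times because they hang off separate `timed_route` rows.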

As another aside on a point of style: select * is generally best avoided in permanent code where possible. If you are instead specific about what you want out, this becomes part of your API, and other code using this output has more guarantees of what columns will be returned, even if the underlying structures are updated. It can also allow the query planner/runner to apply extra optimisations (potentially avoiding extra heap lookups or being able to replace a table scan with an index scan, and so forth).
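The stability point can be seen in a small SQLite sketch. The `times` table, its `arrival` column, and the later-added `platform` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE times (id INTEGER PRIMARY KEY, "
    "stop_id INTEGER NOT NULL, arrival TEXT NOT NULL)"
)
conn.execute("INSERT INTO times VALUES (1, 1, '08:00')")


def star_row(conn):
    # Shape of the result depends on the current table definition.
    return conn.execute("SELECT * FROM times").fetchone()


def explicit_row(conn):
    # Shape of the result is fixed by the query itself.
    return conn.execute("SELECT id, stop_id, arrival FROM times").fetchone()


before_star, before_explicit = star_row(conn), explicit_row(conn)

# A later schema change adds a column (hypothetical example).
conn.execute("ALTER TABLE times ADD COLUMN platform TEXT")

after_star, after_explicit = star_row(conn), explicit_row(conn)
print(len(before_star), len(after_star), len(after_explicit))
```

Code that unpacks three values from the `select *` result would break after the schema change, while the explicit column list keeps returning the same shape.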

Related Solutions

MySQL Database Schema for Train Timetables

I would strongly consider explicitly storing all the days a particular schedule actually runs on. This would give a structure looking something like this:

[Schema diagram image not preserved]

By doing so, you'll make it much easier to answer questions such as "What are all the trains going to station X on 1 Oct?". It also makes "temporary gaps" when trains aren't running (e.g. Christmas day) possible to identify. A one-off train is now simply one with only one entry in SCHEDULE_DAYS.

As the schedule can be different on weekends to weekdays, I think it's better to have separate rows for each day. This allows linking different schedules for every day of the week, should you ever need to do this.
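Since the original diagram is not available, here is only a guess at what a per-day structure could look like, sketched in SQLite. Apart from the SCHEDULE_DAYS name, every table, column, date, and the helper `trains_at_station_on` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE schedule (schedule_id INTEGER PRIMARY KEY,
                           train_name  TEXT NOT NULL);
    CREATE TABLE schedule_stops (schedule_id  INTEGER NOT NULL REFERENCES schedule(schedule_id),
                                 station      TEXT NOT NULL,
                                 arrival_time TEXT NOT NULL,
                                 PRIMARY KEY (schedule_id, station));
    -- One row per calendar day on which a schedule actually runs.
    CREATE TABLE schedule_days (schedule_id INTEGER NOT NULL REFERENCES schedule(schedule_id),
                                run_date    TEXT NOT NULL,  -- ISO date text
                                PRIMARY KEY (schedule_id, run_date));

    INSERT INTO schedule VALUES (1, 'Weekday express'), (2, 'One-off special');
    INSERT INTO schedule_stops VALUES (1, 'X', '09:00'), (2, 'X', '14:00');
    INSERT INTO schedule_days VALUES
        (1, '2024-09-30'), (1, '2024-10-01'),  -- regular service
        (2, '2024-10-01');                     -- one-off: a single entry
    """
)


def trains_at_station_on(conn, station, run_date):
    """All trains calling at a station on a given date, ordered by arrival."""
    return conn.execute(
        """
        SELECT s.train_name, st.arrival_time
        FROM   schedule_days  AS d
        JOIN   schedule       AS s  ON s.schedule_id  = d.schedule_id
        JOIN   schedule_stops AS st ON st.schedule_id = d.schedule_id
        WHERE  st.station = ? AND d.run_date = ?
        ORDER  BY st.arrival_time
        """,
        (station, run_date),
    ).fetchall()


print(trains_at_station_on(conn, "X", "2024-10-01"))
```

A day with no service, such as 25 December in this sample data, simply has no rows in `schedule_days`, so the same query returns an empty list.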

Constraint many to many table between two child tables

Why must the employees working on a project strictly belong to the same division that owns the project? That sounds more like a business rule that you can leave up to your application to enforce rather than something that requires declarative referential integrity (DRI).

But let's say you need to enforce this using DRI. You said you have a many-to-many table associating employees with projects. Assuming you are using SQL Server, you could expand this table as follows:

CREATE TABLE dbo.Employee_Project (
     ProjectID            INT
   , ProjectDivisionID    INT

   , EmployeeID           INT
   , EmployeeDivisionID   INT

   , Hours                INT

   , CONSTRAINT FK_Project FOREIGN KEY (ProjectID, ProjectDivisionID) 
        REFERENCES dbo.Project(ProjectID, DivisionID)

   , CONSTRAINT FK_Employee FOREIGN KEY (EmployeeID, EmployeeDivisionID) 
        REFERENCES dbo.Employee(EmployeeID, DivisionID)

   , CONSTRAINT CK_EmployeeProjectSameDivision CHECK (ProjectDivisionID = EmployeeDivisionID)
);

This is what's going on here:

We have two composite foreign keys here, one for Employee and one for Project, so we can have the DivisionID for both entities in the same table and guaranteed by DRI to be correct.
These composite foreign keys require a UNIQUE index on the referenced tables. So in addition to your primary keys on EmployeeID and ProjectID in those tables, you'll need unique keys on (EmployeeID, DivisionID) and (ProjectID, DivisionID).
The CHECK constraint guarantees that an employee can only be assigned to a project owned by the same division.
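To see the three pieces working together, here is a small sketch that uses SQLite instead of SQL Server (a swapped-in engine, chosen only so the example runs anywhere). The parent table definitions, the primary key on the link table, the sample rows, and the helper `try_assign` are assumptions. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FKs are off by default
conn.executescript(
    """
    CREATE TABLE Project  (ProjectID  INTEGER PRIMARY KEY,
                           DivisionID INTEGER NOT NULL,
                           UNIQUE (ProjectID, DivisionID));   -- target of composite FK
    CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY,
                           DivisionID INTEGER NOT NULL,
                           UNIQUE (EmployeeID, DivisionID));  -- target of composite FK
    CREATE TABLE Employee_Project (
        ProjectID          INTEGER NOT NULL,
        ProjectDivisionID  INTEGER NOT NULL,
        EmployeeID         INTEGER NOT NULL,
        EmployeeDivisionID INTEGER NOT NULL,
        Hours              INTEGER,
        PRIMARY KEY (ProjectID, EmployeeID),  -- assumption, not in the original DDL
        FOREIGN KEY (ProjectID, ProjectDivisionID)
            REFERENCES Project (ProjectID, DivisionID),
        FOREIGN KEY (EmployeeID, EmployeeDivisionID)
            REFERENCES Employee (EmployeeID, DivisionID),
        CHECK (ProjectDivisionID = EmployeeDivisionID)
    );
    INSERT INTO Project  VALUES (10, 1);
    INSERT INTO Employee VALUES (100, 1), (200, 2);
    """
)


def try_assign(conn, project, project_div, employee, employee_div):
    """Attempt an assignment; return False if any constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Employee_Project VALUES (?, ?, ?, ?, 0)",
                (project, project_div, employee, employee_div),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_assign(conn, 10, 1, 100, 1))  # same division
print(try_assign(conn, 10, 1, 200, 2))  # different divisions: CHECK rejects it
print(try_assign(conn, 10, 1, 200, 1))  # wrong division claimed: FK rejects it
```

The last call shows why the composite foreign keys matter: without them, a caller could satisfy the CHECK simply by writing a division ID that does not match the employee's real one.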

If you want to see another example of this design pattern, here's the answer I gave to a design problem that had similar requirements.
