I'm working on a "Bus stations" database. I want to design the next relationship:
"A bus line has many stops (one stop belongs to one line) and one stop has many arrival times (one arrival time belongs to one stop)".
So my design is like this:
Lines (1,1)<->(1,N) Stops (1,1)<->(1,N) Times
Lines(id, name, ...)
Stops(id, line_id, name, ...)
Times(id, stop_id, ...)
But when I was testing it, I realized that I needed a FOREIGN KEY line_id
in Times
table (because I could not distinguish between times of different bus lines). This transforms the diagram:
Lines (1,1)<->(1,N) Stops (1,1)<->(1,N) Times
| |
| |
---------------------------------------
(1,1) (1,N)
I'm a little bit confused because I think this design has redundancy: the relationship between Lines
and Times
could not be reached by the intermediate relationships?
Best Answer
You should not need the extra FK in
Times
as that information can be derived from the existing relationships, but you will need toJOIN
in the other tables to get at the extra property. Adding the extra key like that is sometimes a necessary optimisation but it does break "normal form" as you are duplicating data (meaning that either your business layer becomes responsible for maintaining the referential integrity of that duplicate, or you need to do the same in the database using triggers and other "powerful but be very careful with them" features of your chose database).Your
select * from times where stop_id = 1 and line_id = 1
should be something like:To simplify queries you can create views that abstract out the underlying structure a little, meaning you can keep the data in best form while dealing with it as it it did have the extra columns with duplicated information.
I'm not sure I'd model a timetable store that way though. I would assume that stops are separate entities with a many-to-many link between lines (unless you are counting a stop at which the no. 4, 5, and 45x lines stop at as three separate stops even though they are physically the same location). Also I'd probably want to group the lines together so that all the times for the "no 4" service are identifiable as a related collection but you can distinguish between the arrival times of each (so you can ask "what is the arrival time at X for the service that leaves Y at HH:MM" and so forth. Of course I could just be misreading your intended model! I'm thinking something like:
Here TimedStop becomes a many-to-many relationship store for timed routes and stops. Of course there are probably several perfectly valid ways to model such a system depending on your other constraints, and the above may be rendered an incorrect model when you consider the fuller design spec (without more detail of what you are trying to model and what outputs are desired this is not possible to completely pin down, I made a fair few assumptions, which may or may not be correct, when throwing the above diagram together).
As another aside on a point of style:
select *
is generally best avoided in permanent code where possible. If you are instead specific about what you want out this because part of your API and other code using this output has more guarantees of what columns will be returned, even if the underlying structures are updated. It can also allow the query planner/runner to apply extra optimisations (potentially avoiding extra heap lookups or being able to replace a table scan with an index scan, and so forth).