I have done some research and found that I should store a route as a sequence of stops. Something like:
Start -> Stop A -> Stop B -> Stop C -> End
I have created three tables:
- Routes
- Stops
- RouteStops
…where RouteStops is a junction table.
I have something like:
Routes
+---------+
| routeId |
+---------+
| 1 |
+---------+
| 2 |
+---------+
Stations
+-----------+------+
| stationId | Name |
+-----------+------+
| 1 | A |
+-----------+------+
| 2 | B |
+-----------+------+
| 3 | C |
+-----------+------+
| 4 | D |
+-----------+------+
RouteStations
+-------------+---------------+
| routeId(fk) | stationId(fk) |
+-------------+---------------+
| 1 | A |
+-------------+---------------+
| 1 | C |
+-------------+---------------+
| 1 | D |
+-------------+---------------+
| 2 | A |
+-------------+---------------+
| 2 | D |
+-------------+---------------+
Route 1 goes through
Station A -> Station C -> Station D
Route 2 goes through
Station A -> Station D
Is this a good way to store routes?
According to Wikipedia:
[…] the database system does not guarantee any ordering of the rows unless an
ORDER BY
clause is specified […]
Can I rely on such a database schema or maybe this should be done differently?
This is actually my university project, so I'm just wondering if such schema can be considered as a correct one. For this case, I would probably store only several routes (approx 3-5) and stations (approx 10-15), each route will consist of about 5 stations. I would be also glad to hear how this should look like in case of real and big bus company.
Best Answer
For all business analysis leading to database architecture, I recommend writing rules:
The 1st and 2nd rules as you noticed implies a many to many relationship so you concluded rightfully to create routeStations.
The 3rd rule is the interesting one. It implies that an extra column is needed to fit the requirement. Where should it go? We can see that this property depends on Route AND Station. Therefore it should be located in routeStations.
I would add a column to table routeStations called "stationOrder".
Then querying becomes easy:
Notes:
To develop on note 3, I've built the use case:
This is Oracle 12c Enterprise.
Note that in the execution plan below that table routes isn't used at all. the Cost Base Optimizer (CBO) knows it can get the routeId directly from routeStations's primary key (step 5, INDEX RANGE SCAN on ROUTESTATIONS_PK, Predicate Information 5 - access("RS"."ROUTEID"=1))
Now the fun part, let's add a column name to the route table. Now there's a column we actually need in "routes". The CBO uses the index to find the rowID for route 1, then accesses the table (table access by index rowid) and grabs the column "routes.name".