Storing bus routes in a database

database-design

I have done some research and found that I should store a route as a sequence of stops. Something like:

Start -> Stop A -> Stop B -> Stop C -> End

I have created three tables:

  • Routes
  • Stops
  • RouteStops

…where RouteStops is a junction table.

I have something like:

Routes

+---------+
| routeId |
+---------+
|    1    |
+---------+
|    2    |
+---------+

Stations

+-----------+------+
| stationId | Name |
+-----------+------+
|     1     |   A  |
+-----------+------+
|     2     |   B  |
+-----------+------+
|     3     |   C  |
+-----------+------+
|     4     |   D  |
+-----------+------+

RouteStations

+-------------+---------------+
| routeId(fk) | stationId(fk) |
+-------------+---------------+
|     1       |       A       |
+-------------+---------------+
|     1       |       C       |
+-------------+---------------+
|     1       |       D       |
+-------------+---------------+
|     2       |       A       |
+-------------+---------------+
|     2       |       D       |
+-------------+---------------+

Route 1 goes through

Station A -> Station C -> Station D

Route 2 goes through

Station A -> Station D

Is this a good way to store routes?

According to Wikipedia:

[…] the database system does not guarantee any ordering of the rows unless an ORDER BY clause is specified […]

Can I rely on such a database schema or maybe this should be done differently?

This is actually my university project, so I'm just wondering if such schema can be considered as a correct one. For this case, I would probably store only several routes (approx 3-5) and stations (approx 10-15), each route will consist of about 5 stations. I would be also glad to hear how this should look like in case of real and big bus company.

Best Answer

For all business analysis leading to database architecture, I recommend writing rules:

  • A route has 2 or more stations
  • A Station can be used by many routes
  • Stations on a route come in a specific order

The 1st and 2nd rules as you noticed implies a many to many relationship so you concluded rightfully to create routeStations.

The 3rd rule is the interesting one. It implies that an extra column is needed to fit the requirement. Where should it go? We can see that this property depends on Route AND Station. Therefore it should be located in routeStations.

I would add a column to table routeStations called "stationOrder".

+-------------+---------------+---------------
| routeId(fk) | stationId(fk) | StationOrder |
+-------------+---------------+---------------
|     1       |       1       |       3      |
+-------------+---------------+---------------
|     1       |       3       |       1      |
+-------------+---------------+---------------
|     1       |       4       |       2      |
+-------------+---------------+---------------
|     2       |       1       |       1      |
+-------------+---------------+---------------
|     2       |       4       |       2      |
+-------------+---------------+---------------

Then querying becomes easy:

select rs.routeID,s.Name
from routeStations rs
join
Stations s
on rs.stationId=s.StationId
where rs.routeId=1
order by rs.StationOrder;

+-------------+---------------+
| routeId(fk) | stationId(fk) |
+-------------+---------------+
|     1       |       C       |
+-------------+---------------+
|     1       |       D       |
+-------------+---------------+
|     1       |       A       |
+-------------+---------------+

Notes:

  1. I fixed the StationId in RouteStations in my example. You are using the StationName as the Id.
  2. If you don't use a route name, then there's not even a need for routeId since you can get that from routeStations
  3. Even if you would link to the route table, your database optimizer would notice it doesn't need that extra link and simply remove the extra steps.

To develop on note 3, I've built the use case:

This is Oracle 12c Enterprise.

Note that in the execution plan below that table routes isn't used at all. the Cost Base Optimizer (CBO) knows it can get the routeId directly from routeStations's primary key (step 5, INDEX RANGE SCAN on ROUTESTATIONS_PK, Predicate Information 5 - access("RS"."ROUTEID"=1))

--Table ROUTES
create sequence routeId_Seq start with 1 increment by 1 maxvalue 9999999999999 cache 1000;

CREATE TABLE routes
(
  routeId  INTEGER NOT NULL
);


ALTER TABLE routes ADD (
  CONSTRAINT routes_PK
  PRIMARY KEY
  (routeId)
  ENABLE VALIDATE);

insert into routes values (routeId_Seq.nextval);
insert into routes values (routeId_Seq.nextval);
commit;

--TABLE STATIONS  
create sequence stationId_seq start with 1 increment by 1 maxvalue 9999999999999 cache 1000;

create table stations(
   stationID INTEGER NOT NULL,
   name varchar(50) NOT NULL
);

ALTER TABLE stations ADD (
  CONSTRAINT stations_PK
  PRIMARY KEY
  (stationId)
  ENABLE VALIDATE);

insert into stations values (stationId_seq.nextval,'A');
insert into stations values (stationId_seq.nextval,'B');
insert into stations values (stationId_seq.nextval,'C');
insert into stations values (stationId_seq.nextval,'D');
commit;
--

--Table ROUTESTATIONS 
CREATE TABLE routeStations
(
  routeId       INTEGER NOT NULL,
  stationId     INTEGER NOT NULL,
  stationOrder  INTEGER NOT NULL
);


ALTER TABLE routeStations ADD (
  CONSTRAINT routeStations_PK
  PRIMARY KEY
  (routeId, stationId)
  ENABLE VALIDATE);

ALTER TABLE routeStations ADD (
  FOREIGN KEY (routeId) 
  REFERENCES ROUTES (ROUTEID)
  ENABLE VALIDATE,
  FOREIGN KEY (stationId) 
  REFERENCES STATIONS (stationId)
  ENABLE VALIDATE);

insert into routeStations values (1,1,3);
insert into routeStations values (1,3,1);
insert into routeStations values (1,4,2);
insert into routeStations values (2,1,1);
insert into routeStations values (2,4,2);
commit;

explain plan for select rs.routeID,s.Name
from ndefontenay.routeStations rs
join
ndefontenay.routes r
on r.routeId=rs.routeId
join ndefontenay.stations s
on rs.stationId=s.stationId
where rs.routeId=1
order by rs.StationOrder;

set linesize 1000
set pages 500
select * from table (dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2617709240                                                                                                                                                                                                                                                                                 

---------------------------------------------------------------------------------------------------                                                                                                                                                                                                         
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                                                                                                                         
---------------------------------------------------------------------------------------------------                                                                                                                                                                                                         
|   0 | SELECT STATEMENT               |                  |     1 |    79 |     1 (100)| 00:00:01 |                                                                                                                                                                                                         
|   1 |  SORT ORDER BY                 |                  |     1 |    79 |     1 (100)| 00:00:01 |                                                                                                                                                                                                         
|   2 |   NESTED LOOPS                 |                  |       |       |            |          |                                                                                                                                                                                                         
|   3 |    NESTED LOOPS                |                  |     1 |    79 |     0   (0)| 00:00:01 |                                                                                                                                                                                                         
|   4 |     TABLE ACCESS BY INDEX ROWID| ROUTESTATIONS    |     1 |    39 |     0   (0)| 00:00:01 |                                                                                                                                                                                                         
|*  5 |      INDEX RANGE SCAN          | ROUTESTATIONS_PK |     1 |       |     0   (0)| 00:00:01 |                                                                                                                                                                                                         
|*  6 |     INDEX UNIQUE SCAN          | STATIONS_PK      |     1 |       |     0   (0)| 00:00:01 |                                                                                                                                                                                                         
|   7 |    TABLE ACCESS BY INDEX ROWID | STATIONS         |     1 |    40 |     0   (0)| 00:00:01 |                                                                                                                                                                                                         
---------------------------------------------------------------------------------------------------                                                                                                                                                                                                         

Predicate Information (identified by operation id):                                                                                                                                                                                                                                                         
---------------------------------------------------                                                                                                                                                                                                                                                         

   5 - access("RS"."ROUTEID"=1)                                                                                                                                                                                                                                                                             
   6 - access("RS"."STATIONID"="S"."STATIONID")

Now the fun part, let's add a column name to the route table. Now there's a column we actually need in "routes". The CBO uses the index to find the rowID for route 1, then accesses the table (table access by index rowid) and grabs the column "routes.name".

ALTER TABLE ROUTES
 ADD (name  VARCHAR2(50));

update routes set name='Old Town' where routeId=1;
update routes set name='North County' where routeId=2;
commit;

explain plan for select r.name as routeName,s.Name as stationName
from routeStations rs
join
routes r
on r.routeId=rs.routeId
join stations s
on rs.stationId=s.stationId
where rs.routeId=1
order by rs.StationOrder;

set linesize 500
set pages 500
select * from table (dbms_xplan.display);

PLAN_TABLE_OUTPUT                                                                                                                                                                                                                                                                                           
---------------------------------------------------------------------------------------------------
Plan hash value: 3368128430                                                                                                                                                                                                                                                                                 

----------------------------------------------------------------------------------------------------                                                                                                                                                                                                        
| Id  | Operation                       | Name             | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                                                                                                                        
----------------------------------------------------------------------------------------------------                                                                                                                                                                                                        
|   0 | SELECT STATEMENT                |                  |     1 |   119 |     1 (100)| 00:00:01 |                                                                                                                                                                                                        
|   1 |  SORT ORDER BY                  |                  |     1 |   119 |     1 (100)| 00:00:01 |                                                                                                                                                                                                        
|   2 |   NESTED LOOPS                  |                  |       |       |            |          |                                                                                                                                                                                                        
|   3 |    NESTED LOOPS                 |                  |     1 |   119 |     0   (0)| 00:00:01 |                                                                                                                                                                                                        
|   4 |     NESTED LOOPS                |                  |     1 |    79 |     0   (0)| 00:00:01 |                                                                                                                                                                                                        
|   5 |      TABLE ACCESS BY INDEX ROWID| ROUTES           |     1 |    40 |     0   (0)| 00:00:01 |                                                                                                                                                                                                        
|*  6 |       INDEX UNIQUE SCAN         | ROUTES_PK        |     1 |       |     0   (0)| 00:00:01 |                                                                                                                                                                                                        
|   7 |      TABLE ACCESS BY INDEX ROWID| ROUTESTATIONS    |     1 |    39 |     0   (0)| 00:00:01 |                                                                                                                                                                                                        
|*  8 |       INDEX RANGE SCAN          | ROUTESTATIONS_PK |     1 |       |     0   (0)| 00:00:01 |                                                                                                                                                                                                        
|*  9 |     INDEX UNIQUE SCAN           | STATIONS_PK      |     1 |       |     0   (0)| 00:00:01 |                                                                                                                                                                                                        
|  10 |    TABLE ACCESS BY INDEX ROWID  | STATIONS         |     1 |    40 |     0   (0)| 00:00:01 |                                                                                                                                                                                                        
----------------------------------------------------------------------------------------------------                                                                                                                                                                                                        

Predicate Information (identified by operation id):                                                                                                                                                                                                                                                         
---------------------------------------------------                                                                                                                                                                                                                                                         

   6 - access("R"."ROUTEID"=1)                                                                                                                                                                                                                                                                              
   8 - access("RS"."ROUTEID"=1)                                                                                                                                                                                                                                                                             
   9 - access("RS"."STATIONID"="S"."STATIONID")