Optimised Schema for City-to-City Distance Table

database-designforeign keyprimary-keyrdbmsschema

This is my first question, forgive me if this question is simple. I'm stuck here, trying to implement the relationships for the data represented below. I want to scale it vertically (new rows), instead of adding new columns as city names horizontally (new columns).

So far, I could only deduce two tables:

  1. City tables with primary key. (6 rows)
  2. Distance table with primary key. (13 rows)

How would I relate these two tables? Or should I proceed with one table as is in the representation below?

Table

Best Answer

Most database management systems have a hard limit on either the number of columns you're allowed to use, the number of bytes in a row, or both. So your single table won't work in the general case, because you'll eventually either end up with either too many columns or too many bytes in a row.

To find out what the limitations are for your platform, Google something like

  • SQL Server limitations
  • PostgreSQL limitations
  • Oracle limitations

All you need to store your data is a single table of distances. A table of cities can be used to validate the values in the table of distances, but it's not essential. (I'd use one, though.)

But this won't work in the general case either, because it's subject to the same limitations as a base table. To see how it works for relatively small data sets, look at this example. And look closely at the CHECK() constraints.

create table distances (
  city_from varchar(25) not null,
  city_to varchar(25) not null,
  check (city_from < city_to),
  distance integer not null check (distance >= 0),
  primary key (city_from, city_to)
);

insert into distances values
('Bangalore', 'Chennai', 100),
('Bangalore', 'Hyderabad', 200),
('Bangalore', 'Mumbai', 300),
('Bangalore', 'Delhi', 400),
('Bangalore', 'Kolkata', 500),
('Chennai', 'Hyderabad', 150),
('Chennai', 'Mumbai', 250),
('Chennai', 'Delhi', 450),
('Chennai', 'Kolkata', 550);

Now you can create a pivot table with a SELECT statement. (SQL Server. Other dbms have different ways of dealing with pivot tables.)

SELECT *
FROM (
  SELECT city_from, city_to, distance
  FROM distances
  union all 
  SELECT city_to, city_from, distance
  FROM distances
) AS t1
PIVOT
(
  max(distance)
  FOR [city_to]
  IN (
    [Bangalore], [Chennai], [Delhi], [Hyderabad], [Kolkata], [Mumbai]
  )
) AS t2
ORDER BY city_from;

Because of the limitations on the number of columns and the number of bytes per row, I think you're better off returning the base table (with or without the UNION ALL that you see in the query above) to the application, and letting the application format it for display. Application code doesn't generally have limits on the number of columns or the number of bytes.