SQL Pair Query

query

I have only one table which contains 3 columns for travel company. It shows buses that went from City A to City B etc. I would like to find how many times this route has been used. I can easily find one way from A to B but I want this program to sum automatically from B to A in the same row.

In this scenario, A to B is equivalent to B to A. The requirement is to obtain the COUNT of ((X to Y) + (Y to X)) for any two arbitrary (and distinct) points Xand Y).

Example Table

ID | FROM | TO 
1  |  A   | B
2  |  C   | D
3  |  B   | A
4  |  C   | A
5  |  D   | C

The answer should be

Route AB = 2
Route CD = 2
Route CA = 1

etc.

For anyone wanting to help, here is the data in script form for easy copy/paste:

CREATE TABLE Routes
( ID INT NOT NULL,
  ORIGIN VARCHAR(2) NOT NULL,
  DESTINATION VARCHAR(2) NOT NULL
);

 INSERT INTO Routes
   ( ID, ORIGIN, DESTINATION )
 VALUES 
   ( 1, 'A', 'B' ),
   ( 2, 'C', 'D' ),
   ( 3, 'B', 'A' ),
   ( 4, 'C', 'A' ),
   ( 5, 'D', 'C' ) ;

 SELECT ID, 
        ORIGIN,
        DESTINATION
 FROM Routes;

 DROP TABLE Routes;

Best Answer

3 solutions (2 of which are similar to @stickybit's, but easier on the eye) are below.

I often find it beneficial to look at answers/threads which have multiple solutions to the problem - some of which are obviously better than others but it can be a learning experience!

The simplest and by far the most elegant solution is (thanks to the hint from @ypercube(tm)) is:

SELECT 
  LEAST(origin, destination) AS point_1,
  GREATEST(origin, destination) AS point_2,
  COUNT(*) AS journey_count
FROM route
GROUP BY point_1, point_2
ORDER BY point_1, point_2;

Result (same for all solutions):

point_1, point_2, journey_count
      A        B              2
      A        C              1
      C        D              2

The fiddle for this is here. All the examples here use PostgreSQL 10, but any mainstream RDBMS should work(*) - maybe with some tweaks!

  • (*)
  • SQLite/SQL Server don't have the LEAST() or GREATEST() functions.
  • Be careful with cases of identifiers for some systems
  • The fiddles can be buggy for some servers!

The next fiddle here uses PostgreSQL 10 (for MySQL, version >= 8.0 is required for the CTE). Running this fiddle on MySQL will give extra data because of the CHECK CONSTRAINT I put in, see below. Incredibly, MySQL still doesn't have them! MariaDB does implement CHECKs.

SELECT point_1, point_2, count(*)
FROM
(
  SELECT 
    CASE 
      WHEN origin < destination THEN origin ELSE destination
    END AS point_1,
    CASE
      WHEN destination > origin THEN destination ELSE origin
    END as point_2
  FROM
    routes
) AS tab
GROUP BY point_1, point_2
ORDER BY point_1, point_2;

This subquery eliminates the need for the repeated CASEstatement in @stickybit's solution.

Or, a CTE (Common Table Function - also available [here]https://dbfiddle.uk/?rdbms=postgres_10&fiddle=734ef45d84f5fb9cbba84cd1714318df)) can be used to the same end. For longer, more complex queries, this might be the way to go - CTE's are a godsend!

WITH the_route AS
(
  SELECT 
    CASE 
      WHEN origin < destination THEN origin ELSE destination
    END AS point_1,
    CASE
      WHEN destination > origin THEN destination ELSE origin
    END as point_2
  FROM
    routes
)
SELECT point_1, point_2, COUNT(*) 
FROM 
  the_route
GROUP BY point_1, point_2
ORDER BY point_1, point_2;

As a final point (pardon the pun!), you might want to add a CHECK CONSTRAINT to your table definition by ensuring that origin and destination are never the same as follows:

CREATE TABLE Routes
( 
  route_id    INTEGER NOT NULL,
  origin      VARCHAR(2) NOT NULL,
  destination VARCHAR(2) NOT NULL, 
  -- CHECK (destination != origin) - can do it this way (remove -- comment)
  CONSTRAINT routes_orig_dest_distinct_ck CHECK (destination != origin)
  -- Better as it gives a meaningful name to the CONSTRAINT
  -- You can check this by swapping the CONSTRAINTs
);