SQL Pair Query

query

I have only one table which contains 3 columns for travel company. It shows buses that went from City A to City B etc. I would like to find how many times this route has been used. I can easily find one way from A to B but I want this program to sum automatically from B to A in the same row.

In this scenario, A to B is equivalent to B to A. The requirement is to obtain the COUNT of ((X to Y) + (Y to X)) for any two arbitrary (and distinct) points Xand Y).

Example Table

ID | FROM | TO 
1  |  A   | B
2  |  C   | D
3  |  B   | A
4  |  C   | A
5  |  D   | C

The answer should be

Route AB = 2
Route CD = 2
Route CA = 1

etc.

For anyone wanting to help, here is the data in script form for easy copy/paste:

CREATE TABLE Routes
( ID INT NOT NULL,
  ORIGIN VARCHAR(2) NOT NULL,
  DESTINATION VARCHAR(2) NOT NULL
);

 INSERT INTO Routes
   ( ID, ORIGIN, DESTINATION )
 VALUES 
   ( 1, 'A', 'B' ),
   ( 2, 'C', 'D' ),
   ( 3, 'B', 'A' ),
   ( 4, 'C', 'A' ),
   ( 5, 'D', 'C' ) ;

 SELECT ID, 
        ORIGIN,
        DESTINATION
 FROM Routes;

 DROP TABLE Routes;

Best Answer

3 solutions (2 of which are similar to @stickybit's, but easier on the eye) are below.

I often find it beneficial to look at answers/threads which have multiple solutions to the problem - some of which are obviously better than others but it can be a learning experience!

The simplest and by far the most elegant solution is (thanks to the hint from @ypercube(tm)) is:

SELECT 
  LEAST(origin, destination) AS point_1,
  GREATEST(origin, destination) AS point_2,
  COUNT(*) AS journey_count
FROM route
GROUP BY point_1, point_2
ORDER BY point_1, point_2;

Result (same for all solutions):

point_1, point_2, journey_count
      A        B              2
      A        C              1
      C        D              2

The fiddle for this is here. All the examples here use PostgreSQL 10, but any mainstream RDBMS should work(*) - maybe with some tweaks!

(*)
SQLite/SQL Server don't have the LEAST() or GREATEST() functions.
Be careful with cases of identifiers for some systems
The fiddles can be buggy for some servers!

The next fiddle here uses PostgreSQL 10 (for MySQL, version >= 8.0 is required for the CTE). Running this fiddle on MySQL will give extra data because of the CHECK CONSTRAINT I put in, see below. Incredibly, MySQL still doesn't have them! MariaDB does implement CHECKs.

SELECT point_1, point_2, count(*)
FROM
(
  SELECT 
    CASE 
      WHEN origin < destination THEN origin ELSE destination
    END AS point_1,
    CASE
      WHEN destination > origin THEN destination ELSE origin
    END as point_2
  FROM
    routes
) AS tab
GROUP BY point_1, point_2
ORDER BY point_1, point_2;

This subquery eliminates the need for the repeated CASEstatement in @stickybit's solution.

Or, a CTE (Common Table Function - also available [here]https://dbfiddle.uk/?rdbms=postgres_10&fiddle=734ef45d84f5fb9cbba84cd1714318df)) can be used to the same end. For longer, more complex queries, this might be the way to go - CTE's are a godsend!

WITH the_route AS
(
  SELECT 
    CASE 
      WHEN origin < destination THEN origin ELSE destination
    END AS point_1,
    CASE
      WHEN destination > origin THEN destination ELSE origin
    END as point_2
  FROM
    routes
)
SELECT point_1, point_2, COUNT(*) 
FROM 
  the_route
GROUP BY point_1, point_2
ORDER BY point_1, point_2;

As a final point (pardon the pun!), you might want to add a CHECK CONSTRAINT to your table definition by ensuring that origin and destination are never the same as follows:

CREATE TABLE Routes
( 
  route_id    INTEGER NOT NULL,
  origin      VARCHAR(2) NOT NULL,
  destination VARCHAR(2) NOT NULL, 
  -- CHECK (destination != origin) - can do it this way (remove -- comment)
  CONSTRAINT routes_orig_dest_distinct_ck CHECK (destination != origin)
  -- Better as it gives a meaningful name to the CONSTRAINT
  -- You can check this by swapping the CONSTRAINTs
);

Related Solutions

Sql-server – Date alignment and pair matching extraction best done with TSQL or C#

Here's something I was fiddling with based on some work I had laying around. It doesn't handle the events clustered at a single time well. It could theoretically be helpful anyhow...:)

;WITH ordered_rows AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY Identifier ORDER BY EventTime) AS Row,
        Identifier, Type, EventTime, DiscoveredDate, FileId FROM #EventDataTemp
)
,filtered_rows AS
(
    SELECT Row, Identifier, Type, EventTime, DiscoveredDate, FileId, 
        CAST(CASE Type WHEN 'Went Off' THEN 1 ELSE NULL END AS INT) 
            AS OffEventRow
    FROM ordered_rows
    WHERE Row = 1
    UNION ALL
    SELECT o.Row, o.Identifier, o.Type, o.EventTime, o.DiscoveredDate, o.FileId,
        CAST(CASE WHEN o.Type = 'Went Off' AND f.Type = 'Went Off' 
            OR o.Type = 'Came On' THEN f.OffEventRow ELSE o.Row END AS INT)
    FROM ordered_rows o INNER JOIN filtered_rows f 
        ON o.Row = f.Row + 1 AND o.Identifier = f.Identifier
)
,on_events AS
(
   SELECT Identifier, OffEventRow, MAX(Row) AS OnRow
   FROM filtered_rows
   WHERE Type = 'Came On' AND OffEventRow IS NOT NULL
   GROUP BY OffEventRow, Identifier
)
SELECT f.Identifier, f.Type, f.EventTime, f.DiscoveredDate, f.FileId 
FROM filtered_rows f LEFT JOIN on_events o 
    ON f.Identifier = o.Identifier 
    AND f.Row = o.OnRow
WHERE f.Type = 'Went Off' AND f.Row = f.OffEventRow
OR f.Type = 'Came On' AND o.OnRow IS NOT NULL
ORDER BY f.Identifier, f.EventTime

SQL query question

What's the db server you're working with? In SQL Server there's the MONTH function.

Your query could be something like:

Select *
from User u
where month(u.DOB) = 8

Best Answer

Related Solutions

Sql-server – Date alignment and pair matching extraction best done with TSQL or C#

SQL query question

Related Question