SQL Query – Compute Longest Travel Time for Each ID Number

MySQLmysql-5.5sqlite

I am a novice in SQL and I have a problem as follows.

There are two tables

TRNSCTN_TAB:

CREATE TABLE TRNSCTN_TAB(
    `transactionId` int
    , `UniqueId` int
    , `transactionTimeStamp` varchar(17)
    , `transactionAmt` int
    , `personalId` int
    , `merchantId` int
);

Note: UniqueId, personalId, merchantId are not necessarily unique for all the records.

Data sample:

transactionId   | UniqueId  | transactionTimeStamp  | transactionAmt    | personalId    | merchantId
1               | 2340      | 20150123 10:00:00     | 300.0             | 100           | 301
2               | 2340      | 20150124 09:00:00     | 30.0              | 100           | 302
3               | 2340      | 20150124 14:30:00     | 301.0             | 100           | 303
4               | 2340      | 20150125 12:00:00     | 31.0              | 100           | 302
5               | 2341      | 20150125 14:30:00     | 200               | 101           | 303
6               | 2341      | 20150125 23:00:00     | 33                | 101           | 304
7               | 2341      | 20150125 23:30:00     | 23                | 100           | 303
8               | 2341      | 20150126 06:00:00     | 300               | 101           | 303
9               | 2341      | 20150126 06:30:00     | 300               | 101           | 303
10              | 2342      | 20150126 11:00:00     | 30                | 102           | 304
11              | 2342      | 20150126 16:00:00     | 310               | 102           | 304
12              | 2342      | 20150127 06:00:00     | 120               | 102           | 305
13              | 2342      | 20150127 16:00:00     | 300               | 102           | 304
14              | 2342      | 20150128 03:00:00     | 30                | 102           | 305

ATTRXN_TAB:

CREATE TABLE ATTRXN_TAB(
    `perosnalId` int
    , `merchantId` int
);

Data sample:

personalId  | merchantId
100         | 301
100         | 302
100         | 303
101         | 303
101         | 304
102         | 304
102         | 305

Sample date in SQL Fiddle http://sqlfiddle.com/#!2/64ae3.

Re-defined the problem: Let's say the selected attraction set is
[(100,301) and (100,302)]. Two travels with these sets:

  1. transaction-1 and transaction-2 with the travel time of 23 hrs
  2. transaction-4 with a travel time of 0 hours

From the above, longest travel time for this uniqueID (2340) within 24 hours is 23 hours.

Reasons:

  1. Transaction 1 and 2 belong to selected merchants though they have different merchantId
  2. Transaction 2 and 4 belong to the selected attrxn_set [(100,301) and (100,302)]. However since the time difference is more than 24 hours, they cannot be considered the same trip.
  3. Transaction 2 and 3 the difference is less than 24 hours but the transaction-3 does not belong to the attrxn_set of interest [(100, 301) and (100, 302)].
  4. The same is true for difference between transaction-3 and transaction-4.

The task is to write a query to find the longest travel time for each uniqueId between two transactions.

Note: If the difference between two transactions is less than 24 hours then it is in the same trip.

So I want to join the two tables and find the longest distance travelled time for each uniqueId. Could someone help me with this exercise?

Expected result for attraction set [(100,301) and (100,302)]:

unqieID     | LongestTravelTime     | attractionSet
2340        | 23:00:00              | (100,302),(100,301)
2341        | 00:00:00              | (100,302),(100,301)
2342        | 00:00:00              | (100,302),(100,301)

SQL Fiddle

SQLite (SQL.js) Schema Setup:

CREATE TABLE TRNSCTN_TAB(
    `transactionId` int
    , `UniqueId` int
    , `transactionTimeStamp` varchar(17)
    , `transactionAmt` int
    , `personalId` int
    , `merchantId` int
);

Query 1:

Guys I've made some initial attempts to query and return distinct UniqueId with unique peronalId with the following query:

select distinct TRNSCTN_TAB.UniqueId, TRNSCTN_TAB.personalId, TRNSCTN_TAB.merchantId, TRNSCTN_TAB.transactionTimeStamp from TRNSCTN_TAB
inner join ( select distinct ATTRXN_TAB.perosnalId from ATTRXN_TAB)

Now I want to compute the difference in time between each of the transactions for each distinct uniqueId and evaluate the within 24 hours rule and max duration among all the durations. This would be trivial for you experts I believe.

I am populating the sample data in SQLite and trying to come up with the query. The production data is in MySQL, which I do not have access to.

Best Answer

As far as I understood the question with what we have so far, I think this first query returns the correct date within 24h.

Query:

SELECT UniqueId, transactionTimeStamp
  , DATE_ADD(transactionTimeStamp, INTERVAL -24 hour)
    , (
        SELECT max(transactionTimeStamp) as transactionTimeStamp 
        FROM TRNSCTN_TAB s
        WHERE s.personalId IN (100) AND s.merchantId IN (301, 302)
            AND UniqueId = s.UniqueId
            AND transactionTimeStamp > tt.transactionTimeStamp 
            AND transactionTimeStamp <= DATE_ADD(transactionTimeStamp, INTERVAL 24 hour)
    ) mx
FROM TRNSCTN_TAB tt
INNER JOIN ATTRXN_TAB at 
    ON at.personalId = tt.personalId AND at.merchantId = tt.merchantId
WHERE tt.personalId IN (100) AND tt.merchantId IN (301, 302)
; 

It returns the MAX(transactionTimeStamp) for each row with a set of personalId / merchantId using a correlated subquery. You can look for SQL correlated subquery on Google or another search engine.

Output:

UniqueId    | transactionTimeStamp      | mxd
2340        | 2015-01-23 10:00:00.000   | 2015-01-24 09:00:00.000
2340        | 2015-01-24 09:00:00.000   | NULL
2340        | 2015-01-25 12:00:00.000   | NULL

Then it only has to GROUP BY UniqueId and return the MAX(TIMEDIFF(...)):

SELECT UniqueId
    ,  DATE_FORMAT(MAX(TIMEDIFF(mxd, transactionTimeStamp)),'%H:%i:%s')
FROM 
    SELECT UniqueId, transactionTimeStamp 
        , (
            SELECT max(transactionTimeStamp) as transactionTimeStamp 
            FROM TRNSCTN_TAB s
            WHERE s.personalId IN (100) AND s.merchantId IN (301, 302)
                AND UniqueId = s.UniqueId
                AND transactionTimeStamp > tt.transactionTimeStamp 
                AND transactionTimeStamp <= DATE_ADD(tt.transactionTimeStamp, INTERVAL 24 hour)
        ) mxd
    FROM TRNSCTN_TAB tt
    INNER JOIN ATTRXN_TAB at 
        ON at.personalId = tt.personalId AND at.merchantId = tt.merchantId
    WHERE tt.personalId IN (100) AND tt.merchantId IN (301, 302)
) mx
GROUP BY UniqueId
; 

It will only return a rows for each UniqueId if there is a valid period. Otherwise it return no rows:

UniqueId    | 
2340        | 23:00:00.0000000

SQL Fiddle with both queries.