Access SQL DateTime Stamps – Flanking DateTime Stamps in TableA and TableB

aggregatedatetimejoin;ms access

First Words

You can safely ignore the sections below (and including) JOINs: Starting Off if you just want to take a crack of the code. The background and results just serve as context. Please look at the edit history before 2015-10-06 if you want to see what the code looked like initially.


Objective

Ultimately I want to calculate interpolated GPS coordinates for the transmitter (X or Xmit) based on the DateTime stamps of available GPS data in table SecondTable that directly flank the observation in table FirstTable.

My immediate objective to accomplish the ultimate objective is to figure out how to best join FirstTable to SecondTable to get those flanking time points. Later I can use that information I can calculate intermediate GPS coordinates assuming linear fitting along an equirectangular coordinate system (fancy words to say I don't care that the Earth is a sphere at this scale).


Questions

  1. Is there a more efficient way to generate the closest
    before-and-after time stamps?

    • Fixed by myself by just grabbing
      the "after," and then getting the "before" only as it related to the
      "after".
  2. Is there a more intuitive way that doesn't involve the (A<>B OR A=B) structure.
    • Byrdzeye provided the basic alternatives,
      however my "real-world" experience didn't line up with all 4 of his
      join strategies performing the same. But full credit to him for
      addressing the alternative join styles.
  3. Any other thoughts, tricks and advice you may have.
    • Thusfar both byrdzeye and Phrancis have been quite helpful in this regard. I
      found that Phrancis' advice was excellently laid out and provided
      help at a critical stage, so I'll give him the edge here.

I still would appreciate any additional help that I can receive with regard to question 3.
Bulletpoints reflect who I believe helped me most on the individual question.


Table Definitions

Semi-visual representation

FirstTable

Fields
  RecTStamp | DateTime  --can contain milliseconds via VBA code (see Ref 1) 
  ReceivID  | LONG
  XmitID    | TEXT(25)
Keys and Indices
  PK_DT     | Primary, Unique, No Null, Compound
    XmitID    | ASC
    RecTStamp | ASC
    ReceivID  | ASC
  UK_DRX    | Unique, No Null, Compound
    RecTStamp | ASC
    ReceivID  | ASC
    XmitID    | ASC

SecondTable

Fields
  X_ID      | LONG AUTONUMBER -- seeded after main table has been created and already sorted on the primary key
  XTStamp   | DateTime --will not contain partial seconds
  Latitude  | Double   --these are in decimal degrees, not degrees/minutes/seconds
  Longitude | Double   --this way straight decimal math can be performed
Keys and Indices
  PK_D      | Primary, Unique, No Null, Simple
    XTStamp   | ASC
  UIDX_ID   | Unique, No Null, Simple
    X_ID      | ASC

ReceiverDetails table

Fields
  ReceivID                      | LONG
  Receiver_Location_Description | TEXT -- NULL OK
  Beginning                     | DateTime --no partial seconds
  Ending                        | DateTime --no partial seconds
  Lat                           | DOUBLE
  Lon                           | DOUBLE
Keys and Indicies
  PK_RID  | Primary, Unique, No Null, Simple
    ReceivID | ASC

ValidXmitters table

Field (and primary key)
  XmitID    | TEXT(25) -- primary, unique, no null, simple

SQL fiddle…

…so that you can play with the table definitions and code
This question is for MSAccess, but as Phrancis pointed out, there is no SQL fiddle style for Access. So, you should be able to go here to see my table definitions and code based on Phrancis' answer:
http://sqlfiddle.com/#!6/e9942/4 (external link)


JOINs: Starting off

My current "inner guts" JOIN Strategy

First create a FirstTable_rekeyed with column order and compound primary key (RecTStamp, ReceivID, XmitID) all indexed/sorted ASC. I also created indexes on each column individually. Then fill it like so.

INSERT INTO FirstTable_rekeyed (RecTStamp, ReceivID, XmitID)
  SELECT DISTINCT ROW RecTStamp, ReceivID, XmitID
  FROM FirstTable
  WHERE XmitID IN (SELECT XmitID from ValidXmitters)
  ORDER BY RecTStamp, ReceivID, XmitID;

The above query fills the new table with 153006 records and returns within a matter of 10 seconds or so.

The following completes within a second or two when this whole method is wrapped in a "SELECT Count(*) FROM ( … )" when the TOP 1 subquery method is used

SELECT 
    ReceiverRecord.RecTStamp, 
    ReceiverRecord.ReceivID, 
    ReceiverRecord.XmitID,
    (SELECT TOP 1 XmitGPS.X_ID FROM SecondTable as XmitGPS WHERE ReceiverRecord.RecTStamp < XmitGPS.XTStamp ORDER BY XmitGPS.X_ID) AS AfterXmit_ID
    FROM FirstTable_rekeyed AS ReceiverRecord
    -- INNER JOIN SecondTable AS XmitGPS ON (ReceiverRecord.RecTStamp < XmitGPS.XTStamp)
         GROUP BY RecTStamp, ReceivID, XmitID;
-- No separate join needed for the Top 1 method, but it would be required for the other methods. 
-- Additionally no restriction of the returned set is needed if I create the _rekeyed table.
-- May not need GROUP BY either. Could try ORDER BY.
-- The three AfterXmit_ID alternatives below take longer than 3 minutes to complete (or do not ever complete).
  -- FIRST(XmitGPS.X_ID)
  -- MIN(XmitGPS.X_ID)
  -- MIN(SWITCH(XmitGPS.XTStamp > ReceiverRecord.RecTStamp, XmitGPS.X_ID, Null))

Previous "inner guts" JOIN query

First (fastish…but not good enough)

SELECT 
  A.RecTStamp,
  A.ReceivID,
  A.XmitID,
  MAX(IIF(B.XTStamp<= A.RecTStamp,B.XTStamp,Null)) as BeforeXTStamp,
  MIN(IIF(B.XTStamp > A.RecTStamp,B.XTStamp,Null)) as AfterXTStamp
FROM FirstTable as A
INNER JOIN SecondTable as B ON 
  (A.RecTStamp<>B.XTStamp OR A.RecTStamp=B.XTStamp)
GROUP BY A.RecTStamp, A.ReceivID, A.XmitID
  -- alternative for BeforeXTStamp MAX(-(B.XTStamp<=A.RecTStamp)*B.XTStamp)
  -- alternatives for AfterXTStamp (see "Aside" note below)
  -- 1.0/(MAX(1.0/(-(B.XTStamp>A.RecTStamp)*B.XTStamp)))
  -- -1.0/(MIN(1.0/((B.XTStamp>A.RecTStamp)*B.XTStamp)))

Second (slower)

SELECT
  A.RecTStamp, AbyB1.XTStamp AS BeforeXTStamp, AbyB2.XTStamp AS AfterXTStamp
FROM (FirstTable AS A INNER JOIN 
  (select top 1 B1.XTStamp, A1.RecTStamp 
   from SecondTable as B1, FirstTable as A1
   where B1.XTStamp<=A1.RecTStamp
   order by B1.XTStamp DESC) AS AbyB1 --MAX (time points before)
ON A.RecTStamp = AbyB1.RecTStamp) INNER JOIN 
  (select top 1 B2.XTStamp, A2.RecTStamp 
   from SecondTable as B2, FirstTable as A2
   where B2.XTStamp>A2.RecTStamp
   order by B2.XTStamp ASC) AS AbyB2 --MIN (time points after)
ON A.RecTStamp = AbyB2.RecTStamp; 

Background

I have a telemetry table (aliased as A) of just under 1 million entries with a compound primary key based on a DateTime stamp, a Transmitter ID and a Recording Device ID. Due to circumstances beyond my control, my SQL language is the standard Jet DB in Microsoft Access (users will use 2007 and later versions). Only about 200,000 of these entries are relevant to the query because of the Transmitter ID.

There is a second telemetry table (alias B) that involves approximately 50,000 entries with a single DateTime primary key

For the first step, I focused on finding the closest timestamps to the stamps in the first table from the second table.


JOIN Results

Quirks that I've Discovered…

…along the way during debugging

It feels really odd to be writing the JOIN logic as FROM FirstTable as A INNER JOIN SecondTable as B ON (A.RecTStamp<>B.XTStamp OR A.RecTStamp=B.XTStamp) which as @byrdzeye pointed out in a comment (that has since disappeared) is a form of cross-join. Note that substituting LEFT OUTER JOIN for INNER JOIN in the code above appears to make no impact in the quantity or identity of the lines returned. I also can't seem to leave off the ON clause or say ON (1=1). Just using a comma to join (rather than INNER or LEFT OUTER JOIN) results in Count(select * from A) * Count(select * from B) rows returned in this query, rather than just one line per table A, as the (A<>B OR A=B) explicit JOIN returns. This is clearly not suitable. FIRST doesn't seem to be available to use given a compound primary key type.

The Second JOIN style, although arguably more legible, suffers from being slower. This may be because an additional two inner JOINs are required against the larger table as well as the two CROSS JOINs found in both options.

Aside: Replacing the IIF clause with MIN/MAX appears to return the same number of entries.
MAX(-(B.XTStamp<=A.RecTStamp)*B.XTStamp)
works for the "Before" (MAX) timestamp, but doesn't work directly for the "After" (MIN) as follows:
MIN(-(B.XTStamp>A.RecTStamp)*B.XTStamp)
because the minimum is always 0 for the FALSE condition. This 0 is less than any post-epoch DOUBLE (which a DateTime field is a subset of in Access and that this calculation transforms the field into). The IIF and MIN/MAX methods The alternates proposed for the AfterXTStamp value work because division by zero (FALSE) generates null values, which the aggregate functions MIN and MAX skip over.

Next Steps

Taking this further, I wish to find the timestamps in the second table that directly flank the timestamps in the first table and perform a linear interpolation of the data values from the second table based on the time distance to those points (i.e. if the timestamp from the first table is 25% of the way between the "before" and "after", I would like 25% of the calculated value to come from 2nd table value data associated with the "after" point and 75% from the "before"). Using the revised join type as part of the inner guts, and after the suggested answers below I produce…

    SELECT
        AvgGPS.XmitID,
        StrDateIso8601Msec(AvgGPS.RecTStamp) AS RecTStamp_ms,
        -- StrDateIso8601MSec is a VBA function returning a TEXT string in yyyy-mm-dd hh:nn:ss.lll format
        AvgGPS.ReceivID,
        RD.Receiver_Location_Description,
        RD.Lat AS Receiver_Lat,
        RD.Lon AS Receiver_Lon,
        AvgGPS.Before_Lat * (1 - AvgGPS.AfterWeight) + AvgGPS.After_Lat * AvgGPS.AfterWeight AS Xmit_Lat,
        AvgGPS.Before_Lon * (1 - AvgGPS.AfterWeight) + AvgGPS.After_Lon * AvgGPS.AfterWeight AS Xmit_Lon,
        AvgGPS.RecTStamp AS RecTStamp_basic
    FROM ( SELECT 
        AfterTimestampID.RecTStamp,
        AfterTimestampID.XmitID,
        AfterTimestampID.ReceivID,
        GPSBefore.BeforeXTStamp, 
        GPSBefore.Latitude AS Before_Lat, 
        GPSBefore.Longitude AS Before_Lon,
        GPSAfter.AfterXTStamp, 
        GPSAfter.Latitude AS After_Lat, 
        GPSAfter.Longitude AS After_Lon,
        ( (AfterTimestampID.RecTStamp - GPSBefore.XTStamp) / (GPSAfter.XTStamp - GPSBefore.XTStamp) ) AS AfterWeight
        FROM (
            (SELECT 
                ReceiverRecord.RecTStamp, 
                ReceiverRecord.ReceivID, 
                ReceiverRecord.XmitID,
               (SELECT TOP 1 XmitGPS.X_ID FROM SecondTable as XmitGPS WHERE ReceiverRecord.RecTStamp < XmitGPS.XTStamp ORDER BY XmitGPS.X_ID) AS AfterXmit_ID
             FROM FirstTable AS ReceiverRecord 
             -- WHERE ReceiverRecord.XmitID IN (select XmitID from ValidXmitters)
             GROUP BY RecTStamp, ReceivID, XmitID
            ) AS AfterTimestampID INNER JOIN SecondTable AS GPSAfter ON AfterTimestampID.AfterXmit_ID = GPSAfter.X_ID
        ) INNER JOIN SecondTable AS GPSBefore ON AfterTimestampID.AfterXmit_ID = GPSBefore.X_ID + 1
    ) AS AvgGPS INNER JOIN ReceiverDetails AS RD ON (AvgGPS.ReceivID = RD.ReceivID) AND (AvgGPS.RecTStamp BETWEEN RD.Beginning AND RD.Ending)
    ORDER BY AvgGPS.RecTStamp, AvgGPS.ReceivID;

…which returns 152928 records, conforming (at least approximately) to the final number of expected records. Run time is probably 5-10 minutes on my i7-4790, 16GB RAM, no SSD, Win 8.1 Pro system.


Reference 1: MS Access Can Handle Millisecond Time Values–Really and accompanying source file [08080011.txt]

Best Answer

I must first compliment you on your courage to do something like this with an Access DB, which from my experience is very difficult to do anything SQL-like. Anyways, on to the review.


First join

Your IIF field selections might benefit from using a Switch statement instead. It seems to be sometimes the case, especially with things SQL, that a SWITCH (more commonly known as CASE in typical SQL) is quite fast when just making simple comparisons in the body of a SELECT. The syntax in your case would be almost identical, although a switch can be expanded to cover a large chunk of comparisons in one field. Something to consider.

  SWITCH (
    expr1, val1,
    expr2, val2,
    val3        -- default value or "else"
  )

A switch can also help readability, in larger statements. In context:

  MAX(SWITCH(B.XTStamp <= A.RecTStamp,B.XTStamp,Null)) as BeforeXTStamp,
  --alternatively MAX(-(B.XTStamp<=A.RecTStamp)*B.XTStamp) as BeforeXTStamp,
  MIN(SWITCH(B.XTStamp>A.RecTStamp,B.XTStamp,Null)) as AfterXTStamp

As for the join itself, I think (A.RecTStamp<>B.XTStamp OR A.RecTStamp=B.XTStamp) is about as good as you're going to get, given what you are trying to do. It's not that fast, but I wouldn't expect it to be either.


Second join

You said this is slower. It's also less readable from a code standpoint. Given equally satisfactory result sets between 1 and 2, I'd say go for 1. At least it's obvious what you are trying to do that way. Subqueries are often not very fast (though often unavoidable) especially in this case you are throwing in an extra join in each, which must certainly complicate the execution plan.

One remark, I saw that you used old ANSI-89 join syntax. It's best to avoid that, the performance will be same or better with the more modern join syntax, and they are less ambiguous or easier to read, harder to make mistakes.

FROM (FirstTable AS A INNER JOIN 
  (select top 1 B1.XTStamp, A1.RecTStamp 
   from SecondTable as B1
   inner join FirstTable as A1
     on B1.XTStamp <= A1.RecTStamp
   order by B1.XTStamp DESC) AS AbyB1 --MAX (time points before)

Naming things

I think the way your things are named is unhelpful at best, and cryptic at worst. A, B, A1, B1 etc. as table aliases I think could be better. Also, I think the field names are not very good, but I realize you may not have control over this. I will just quickly quote The Codeless Code on the topic of naming things, and leave it at that...

“Invective!” answered the priestess. “Verb your expletive nouns!”


"Next steps" query

I couldn't make much sense of it how it was written, I had to take it to a text editor and do some style changes to make it more readable. I know Access' SQL editor is beyond clunky, so I usually write my queries in a good editor like Notepad++ or Sublime Text. Some of the stylistic changes I applied to make it more readable:

  • 4 spaces indent instead of 2 spaces
  • Spaces around mathematical and comparison operators
  • More natural placing of braces and indentation (I went with Java-style braces, but could also be C-style, at your preference)

So as it turns out, this is a very complicated query indeed. To make sense of it, I have to start from the innermost query, your ID data set, which I understand is the same as your First Join. It returns the IDs and timestamps of the devices where the before/after timestamps are the closest, within the subset of devices you are interested in. So instead of ID why not call it ClosestTimestampID.

Your Det join is used only once:

enter image description here

The rest of the time, it only joins the values you already have from ClosestTimestampID. So instead we should be able to just do this:

    ) AS ClosestTimestampID
    INNER JOIN SecondTable AS TL1 
        ON ClosestTimestampID.BeforeXTStamp = TL1.XTStamp) 
    INNER JOIN SecondTable AS TL2 
        ON ClosestTimestampID.AfterXTStamp = TL2.XTStamp
    WHERE ClosestTimestampID.XmitID IN (<limited subset S>)

Maybe not be a huge performance gain, but anything we can do to help the poor Jet DB optimizer will help!


I can't shake the feeling that the calculations/algorithm for BeforeWeight and AfterWeight which you use to interpolate could be done better, but unfortunately I'm not very good with those.

One suggestion to avoid crashing (although it's not ideal depending on your application) would be to break out your nested subqueries into tables of their own and update those when needed. I'm not sure how often you need your source data to be refreshed, but if it is not too often you might think of writing some VBA code to schedule an update of the tables and derived tables, and just leave your outermost query to pull from those tables instead of the original source. Just a thought, like I said not ideal but given the tool you may not have a choice.


Everything together:

SELECT
    InGPS.XmitID,
    StrDateIso8601Msec(InGPS.RecTStamp) AS RecTStamp_ms,
       -- StrDateIso8601MSec is a VBA function returning a TEXT string in yyyy-mm-dd hh:nn:ss.lll format
    InGPS.ReceivID,
    RD.Receiver_Location_Description,
    RD.Lat AS Receiver_Lat,
    RD.Lon AS Receiver_Lon,
    InGPS.Before_Lat * InGPS.BeforeWeight + InGPS.After_Lat * InGPS.AfterWeight AS Xmit_Lat,
    InGPS.Before_Lon * InGPS.BeforeWeight + InGPS.After_Lon * InGPS.AfterWeight AS Xmit_Lon,
    InGPS.RecTStamp AS RecTStamp_basic
FROM (
    SELECT 
        ClosestTimestampID.RecTStamp,
        ClosestTimestampID.XmitID,
        ClosestTimestampID.ReceivID,
        ClosestTimestampID.BeforeXTStamp, 
        TL1.Latitude AS Before_Lat, 
        TL1.Longitude AS Before_Lon,
        (1 - ((ClosestTimestampID.RecTStamp - ClosestTimestampID.BeforeXTStamp) 
            / (ClosestTimestampID.AfterXTStamp - ClosestTimestampID.BeforeXTStamp))) AS BeforeWeight,
        ClosestTimestampID.AfterXTStamp, 
        TL2.Latitude AS After_Lat, 
        TL2.Longitude AS After_Lon,
        (     (ClosestTimestampID.RecTStamp - ClosestTimestampID.BeforeXTStamp) 
            / (ClosestTimestampID.AfterXTStamp - ClosestTimestampID.BeforeXTStamp)) AS AfterWeight
        FROM (((
            SELECT 
                A.RecTStamp, 
                A.ReceivID, 
                A.XmitID,
                MAX(SWITCH(B.XTStamp <= A.RecTStamp, B.XTStamp, Null)) AS BeforeXTStamp,
                MIN(SWITCH(B.XTStamp > A.RecTStamp, B.XTStamp, Null)) AS AfterXTStamp
            FROM FirstTable AS A
            INNER JOIN SecondTable AS B 
                ON (A.RecTStamp <> B.XTStamp OR A.RecTStamp = B.XTStamp)
            WHERE A.XmitID IN (<limited subset S>)
            GROUP BY A.RecTStamp, ReceivID, XmitID
        ) AS ClosestTimestampID
        INNER JOIN FirstTable AS Det 
            ON (Det.XmitID = ClosestTimestampID.XmitID) 
            AND (Det.ReceivID = ClosestTimestampID.ReceivID) 
            AND (Det.RecTStamp = ClosestTimestampID.RecTStamp)) 
        INNER JOIN SecondTable AS TL1 
            ON ClosestTimestampID.BeforeXTStamp = TL1.XTStamp) 
        INNER JOIN SecondTable AS TL2 
            ON ClosestTimestampID.AfterXTStamp = TL2.XTStamp
        WHERE Det.XmitID IN (<limited subset S>)
    ) AS InGPS
INNER JOIN ReceiverDetails AS RD 
    ON (InGPS.ReceivID = RD.ReceivID) 
    AND (InGPS.RecTStamp BETWEEN <valid parameters from another table>)
ORDER BY StrDateIso8601Msec(InGPS.RecTStamp), InGPS.ReceivID;