Sql-server – How to most effectively get the first preceding/subsequent entry outside a given time interval

optimizationperformancequery-performancesql server

We have a SQL Server DB that contains a large table (>100M rows) with archive values, basically in the form of [Id], [StationId], [DatapointId], [Timestamp], [Value]. As the stations do not get polled, there aren't any regular intervals at which a station sends a datapoint, so the timestamps are set to arbitrary times.

For our web visualization, I need to extract entries from this table for a pannable chart that I'm building with D3.js. I need to query by StationId, DatapointId, StartDate and EndDate. This is simple and works as expected, but for the line chart that it feeds, I do need additional data, namely the first value before the chosen time interval, and (if there is one) the first value after the interval. This is neccessary to draw the line from the left resp. right boundary of the chart to the first/last value within the interval (otherwise the data within the chart would look somewhat "isolated" because the line stops before the charts boundaries, like there's no data before or after the values in the chosen interval).

At the moment I need to run 3 queries: one to get the data within the interval, one to retrieve the first matching row before StartDate, and anotherone to get the first matching row after EndDate.

It works, and after creating a covering index the speed is ok, but I think there might be a way to reduce the number of roundtrips to the DB. Do you have any idea?

Best Answer

I assume you query for all data (from all stations) in a specific time range.

You can probably write this with window functions (and it might be more efficient) but here is one method, with OUTER APPLY.

(The ds subquery is only used to get a list of distinct StationID. It should be replaced with the Stations table you likely have):

SELECT
    ds.[StationID],                   -- station
    st.ExtendedStartDate,             -- extended start and
    en.ExtendedEndDate                -- and end dates

    tt.[Timestamp],                   -- timestamps and data
    tt.[DatapointId],
    tt.[Value]
FROM 
    ( SELECT DISTINCT [StationID]
      FROM ArchiveTable
    ) AS ds
  OUTER APPLY
    ( SELECT TOP (1) s.[Timestamp] AS ExtendedStartDate
      FROM ArchiveTable AS s
      WHERE s.[StationId] = ds.[StationId]
        AND s.[Timestamp] < @StartDate
      ORDER BY s.[Timestamp] DESC
    ) AS st
  OUTER APPLY
    ( SELECT TOP (1) e.[Timestamp] AS ExtendedEndDate
      FROM ArchiveTable AS e
      WHERE e.[StationId] = ds.[StationId]
        AND e.[Timestamp] > @EndDate
      ORDER BY e.[Timestamp] ASC
    ) AS en
  OUTER APPLY
    ( SELECT t.*
      FROM ArchiveTable AS t
      WHERE t.[StationId] = ds.[StationId]
        AND t.[Timestamp] >= COALESCE(st.ExtendedStartDate, @StartDate)
        AND t.[Timestamp] <= COALESCE(en.ExtendedEndDate, @EndDate)  
    ) AS tt ;