This feels like such a common question, I'll understand if it is closed but if so please suggest a better place I could ask. I have the following two tables of interest:
CREATE TABLE [dbo].[Sessions]
(
[Id] [int] PRIMARY KEY,
[DateConnected] [datetime] NOT NULL,
[Origin] [nvarchar](max) NULL,
[TrackerId] [int] NULL,
[Imei] [nvarchar](max) NULL,
[Sim] [nvarchar](max) NULL,
[ProtocolVersion] [tinyint] NULL
)
CREATE TABLE [dbo].[PacketTransmissions]
(
[Id] [int] PRIMARY KEY,
[RequestId] [int] NULL,
[SessionId] [int] NOT NULL,
[DateProcessed] [datetime] NOT NULL,
[Direction] [int] NOT NULL,
[Sequence] [int] NOT NULL,
[Acknowledgement] [int] NOT NULL,
[DateRecorded] [datetime] NOT NULL,
[Version] [tinyint] NOT NULL,
[Command] [tinyint] NOT NULL,
[Flags] [tinyint] NOT NULL,
[Checksum] [tinyint] NOT NULL,
[Data] [varbinary](max) NULL
)
CREATE NONCLUSTERED INDEX [IX_TrackerId_DateConnected] ON [dbo].[Sessions]
(
[TrackerId] ASC,
[DateConnected] ASC
)
CREATE NONCLUSTERED INDEX [IX_SessionId_DateProcessed] ON [dbo].[PacketTransmissions]
(
[SessionId] ASC,
[DateProcessed] ASC
)
INCLUDE ([Direction], [Sequence], [Acknowledgement], [Command])
My most common query, and most expensive (quite often times out now) involves listing all packet transmissions for a particular tracker.
DECLARE @TrackerId INT = 10
DECLARE @StartDate DATETIME2 = '2018-03-10'
DECLARE @EndDate DATETIME2 = '2018-03-12'
SELECT [PacketTransmissions].*
FROM [Sessions]
JOIN [PacketTransmissions] ON [PacketTransmissions].[SessionId] = [Sessions].[Id]
WHERE [Sessions].[TrackerId] = @TrackerId
AND [PacketTransmissions].[DateProcessed] > @StartDate
AND [PacketTransmissions].[DateProcessed] < @EndDate
ORDER BY [PacketTransmissions].[DateProcessed] DESC
This was good at first, but now there is a lot of data, it has slowed right down. My attempt to get the query plan today took 2 minutes, and shows that it will be using a table scan, rather than the index I created. Even when I force the index, it is still very slow.
In comparison, if I choose a session first, and search only for packet transmissions recorded within that session, the query uses the index and is incredibly fast.
My most successful attempt to speed up the query has been to order the results first by session id, then by date processed, to match the index order. While this is not technically always correct, it is acceptable. However, even this has started to time out, and I feel like there is something wrong with my understanding of how to make the JOIN
faster.
What can I do to improve the performance of this query?
Querying with DATETIME
variables instead of DATETIME2
has simplified the query plan, however it is still very slow.
-
Sessions has 265,929 rows
-
PacketTransmissions has 32,916,233 rows
That works out to be 123.7 packets per session, on average.
-
Some of the sessions are for unregistered devices, so they create a session, send between one and three packets, and then the session is rejected by the server.
-
I will normally be debugging a registered device, so the actual number of packets per session is considerably higher, between 300 and 5000 packets per session
-
Some trackers may maintain the same session for a month at a time if they have connectivity
I have in the past had a bad experience with changing the clustered index to use a non-sequential key. It results in a lot of out-of-order writes, and page splits, and the insert performance drops significantly.
The problem with the actual execution plans is that I don't want to run the database at max DTU for up to an hour, and potentially have inserts fail in the meantime.
Best Answer
Perhaps this is crazy, but I like to try a bit of blue-sky-thinking every once in a while, so I'd consider adding the
TrackerId
column to thedbo.PacketTransmissions
table to avoid the join completely. Obviously, this means you need to modify the row-insert procedure for the table, which may or may not be feasible.However, this change, combined with a simple index:
creates a query plan using a run-of-the-mill index seek, combined with a key lookup for each row returned. As in:
To test this, I created a minimally complete verifiable example:
On my system, this creates around 700,000 session rows, and double that number of transmission rows.
The query then becomes: