Sql-server – Seemingly equivalent queries take vastly different execution times

performancequery-performancesql servert-sql

I am encountering an issue where two different ways of executing the same query are taking very different execution times. My guess is that in one way or another, the slow query is unable to utilize my index.
I have a large table of data points (Data)

TABLE(assetId INT, dataItemId INT,timestamp DATETIME(2),value REAL)

There is a full table unique clustered index here. assetId is the owner of the datapoint, dataItemId is other metadata for the datapoint, including a label. So there is a table DataItems

TABLE(dataItemId INT, labelId INT, unitId INT)

And finally a table of Labels (there is another table just like Labels for Units). I have FK constraints on the DataItems/Labels/Units tables.

TABLE(labelId INT, label VARCHAR(128))

I would like to make a query for data across a time range asking for all datapoints with a certain label for a particular asset. Something like:

--QUERY #1
SELECT Data.timestamp,Labels.label,Data.value 
FROM Data 
JOIN DataItems on DataItems.dataItemId = Data.dataItemId 
JOIN Labels on DataItems.labelId = Labels.labelId
WHERE Data.assetId=134
AND Label.label IN ('Eng_RPM','Eng_Load','Eng_OilTemp')
AND Data.timestamp between '2016-09-27T12:00:00Z' AND '2016-09-27T14:00:00Z'

However, this query (returns 21000 rows) takes about 38 seconds to execute. So I tried an alternate query, where I use dataItemId instead of Labels:

--QUERY #2
SELECT Data.timestamp,Labels.label,Data.value 
FROM Data 
JOIN DataItems on DataItems.dataItemId = Data.dataItemId 
JOIN Labels on DataItems.labelId = Labels.labelId
WHERE Data.assetId=134
AND Data.dataItemId IN (
   SELECT dataItemId from DataItems 
   JOIN Labels ON DataItems.labelId=Labels.labelId
   WHERE label in ('ENG1_Eng_Load','ENG2_Eng_Load','ENG3_Eng_Load')
   )
AND Data.timestamp between '2016-09-27T12:00:00Z' AND '2016-09-27T14:00:00Z'

I didn't expect this to work any better, and it didn't. However, I then took the output of the nested select in the WHERE clause, and used it as a constant:

--QUERY #3
SELECT Data.timestamp,Labels.label,Data.value 
FROM Data 
JOIN DataItems on DataItems.dataItemId = Data.dataItemId 
JOIN Labels on DataItems.labelId = Labels.labelId
WHERE Data.assetId=134
AND Data.dataItemId in (618,654,690)
AND Data.timestamp between '2016-09-27T12:00:00Z' AND '2016-09-27T14:00:00Z'

This query runs like a dream with <1s query times. I don't understand why though – it seems to me that the nested query for finding the dataItemId from a label should execute first, and I verified that it is a trivial query. From there, why is there a difference between the way #2 and #3 execute, and why does it take so much longer? I am new to databases in general, and to SQL server/ Management studio, so my apologies if I have overlooked some wonderful feature for analyzing this problem.

Edit:

Indices/PKs are as follows:

 --Data: 
CREATE UNIQUE CLUSTERED INDEX [ClusteredIndex-20160725-145924] ON      [dbo].[Data]
 (
[assetId] ASC,
[dataItemId] ASC,
[timestamp] ASC,
[value] ASC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = ON, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]


--DataItems
ALTER TABLE [dbo].[DataItems] ADD PRIMARY KEY CLUSTERED 
(
[dataItemId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

--Labels
ALTER TABLE [dbo].[Labels] ADD PRIMARY KEY CLUSTERED 
(
[labelId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

I have viewed the execution paths and I did find that the slow queries are not using the Clustered index on Data – they are using a different Non-clustered index that I did not intent to keep (on assetId,timestamp)

Best Answer

SELECT Data.timestamp, Labels.label, Data.value 
FROM Data 
JOIN DataItems 
      on DataItems.dataItemId = Data.dataItemId 
     and Data.assetId = 134
     AND Data.timestamp between '2016-09-27T12:00:00Z' AND '2016-09-27T14:00:00Z'
JOIN Labels 
      on DataItems.labelId = Labels.labelId
     AND Label.label IN ('Eng_RPM','Eng_Load','Eng_OilTemp')

maybe add a non-clustered index on Label.label