SQL Server – Why would a query run slower when a WHERE clause is added?

Tags: performance, query-performance, sql-server-2008-r2

I've got two databases, and both have the same view over the same table with the same indexes.

The view selects the most recent location for a given IMEI from a locations table.

CREATE VIEW [dbo].[LatestDeviceLocation]
AS 
SELECT DISTINCT t.Imei, t.Accuracy, t.UserId, t.Lat, t.Lng, t.Timestamp
FROM (SELECT Imei, MAX(Timestamp) AS latest
    FROM      dbo.DeviceLocation
    GROUP BY Imei) AS m INNER JOIN
    dbo.DeviceLocation AS t ON t.Imei = m.Imei AND t.Timestamp = m.latest
GO

I'm querying the view with a very simple SELECT and what seems like a very simple WHERE clause.

SELECT TOP 1000 [Imei]
      ,[Accuracy]
      ,[UserId]
      ,[Lat]
      ,[Lng]
      ,[Timestamp]
  FROM [dbo].[LatestDeviceLocation]
  Where [Timestamp] > '2015-02-19T00:00:00.000Z' AND [Timestamp] < '2015-02-26T23:59:59.999Z'

On my live server, when I query the view I get data back in under a second. When I add the where clause Where [Timestamp] > '2015-02-19T00:00:00.000Z' AND [Timestamp] < '2015-02-26T23:59:59.999Z', that jumps to approximately one minute.

On my test server, which has 10x more data (350k+ locations shared by approximately the same number of IMEI numbers as the live site, about 25), the query returns data in under a second with or without the where clause.

I've looked for locks and can't see any.

I've re-created the index in case it was corrupted; no difference.

I've completely removed the index; performance didn't change.

This is the index that I've used on both servers.

/****** Object:  Index [GangHeatMapIndex]    Script Date: 02/26/2015 22:38:38 ******/
CREATE NONCLUSTERED INDEX [GangHeatMapIndex] ON [dbo].[DeviceLocation] 
(
    [UserId] ASC,
    [Timestamp] ASC,
    [Imei] ASC
)
INCLUDE ( [DeviceLocationId],
[Accuracy],
[Lat],
[Lng]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
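For what it's worth, the view's inner query groups by Imei and joins back on (Imei, Timestamp), but the index above is keyed on UserId first, so neither operation can seek on it. A covering index keyed to match the view might look like the sketch below (the index name is mine, and whether the optimizer actually uses it depends on your data):

```sql
-- Hypothetical index keyed on (Imei, Timestamp) so the view's
-- GROUP BY Imei / MAX(Timestamp) and the self-join on
-- (Imei, Timestamp) can both seek instead of scanning.
CREATE NONCLUSTERED INDEX [IX_DeviceLocation_Imei_Timestamp]
ON [dbo].[DeviceLocation] ([Imei] ASC, [Timestamp] DESC)
INCLUDE ([UserId], [Accuracy], [Lat], [Lng]);
```

Ordering Timestamp descending within each Imei means the newest row per device sits at the front of each index range.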

Edit: I've just realised that I wasn't looking in the right place for locks. It is taking out object locks when querying. I'm trying to work out how to build "no lock" into the view itself.
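For reference, table hints can be applied inside a view body, so a NOLOCK variant of the original definition would look something like this sketch (with the usual caveat that NOLOCK means READ UNCOMMITTED, so dirty or duplicate reads are possible):

```sql
CREATE VIEW [dbo].[LatestDeviceLocation]
AS
-- WITH (NOLOCK) on each table reference reads without taking shared locks
SELECT DISTINCT t.Imei, t.Accuracy, t.UserId, t.Lat, t.Lng, t.Timestamp
FROM (SELECT Imei, MAX(Timestamp) AS latest
      FROM dbo.DeviceLocation WITH (NOLOCK)
      GROUP BY Imei) AS m
INNER JOIN dbo.DeviceLocation AS t WITH (NOLOCK)
    ON t.Imei = m.Imei AND t.Timestamp = m.latest
```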

Edit 2: I've attached the execution plans; the top one is with the index, the bottom is without.

Execution plans

Edit 3: More executions plans, this time all on the live server, with the index re-added, with and without where clauses.

Execution Plan - with index, without where.

Execution Plan - with index, with where.

Edit 4:

I've changed the view to use a common table expression as follows and the performance is much better.

WITH cte
     AS (SELECT RANK()
                  OVER (
                    PARTITION BY dloc.[Imei]
                    -- Timestamp DESC so that rank 1 is the most recent row per IMEI
                    ORDER BY dloc.[Timestamp] DESC, dloc.[DeviceLocationId] DESC) AS arank,
                dloc.*
         FROM   [dbo].[DeviceLocation] AS dloc)
SELECT [Imei], [Accuracy], [UserId], [Lat], [Lng], [Timestamp]
FROM   cte
WHERE  arank = 1

Including the DeviceLocationId in the ORDER BY prevented any duplicates from occurring in the final result, since it acts as a tie-breaker when two rows share the same timestamp.
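Wrapped as a view, the CTE version would look something like this (a sketch; the CREATE VIEW wrapping is mine, and note Timestamp is ordered descending so that rank 1 is the latest row per IMEI):

```sql
CREATE VIEW [dbo].[LatestDeviceLocation]
AS
WITH cte AS (
    SELECT RANK() OVER (
               PARTITION BY dloc.[Imei]
               ORDER BY dloc.[Timestamp] DESC, dloc.[DeviceLocationId] DESC
           ) AS arank,
           dloc.*
    FROM [dbo].[DeviceLocation] AS dloc
)
SELECT [Imei], [Accuracy], [UserId], [Lat], [Lng], [Timestamp]
FROM cte
WHERE arank = 1
```

A single pass with a window function avoids the GROUP BY subquery plus self-join of the original view, which is why the plan is typically cheaper.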

Best Answer

Edit - The below assumes that the number of rows the query would return (if the limit weren't present) exceeds the limit (as in: it would regularly return 5000 rows, but the limit forces it to return 1000).

Any time you have a limit on the number of rows returned by a query, you should not expect the timing of that query to reflect the performance of the full, unlimited query.

For example, if you take a simple query as such:

SELECT * FROM table_with_1m_rows;

It will take a while to process because it has to fetch all the rows via a sequential scan.

If I adjust it to:

SELECT TOP 1000 * FROM table_with_1m_rows;

It will return relatively quickly because, while it still does a sequential scan, it can stop as soon as it has produced 1000 rows.

If I then adjust it to:

SELECT TOP 1000 * FROM table_with_1m_rows WHERE col1 > 100;

It will take LONGER than the previous query because, while it still does a sequential scan, it will most likely have to scan more than 1000 rows before it finds 1000 rows that satisfy the predicate.

All of the above holds true whether the DB uses a sequential scan or an index scan.

If you truly want to troubleshoot the performance of the query, you need to remove the TOP 1000, then view your query plan and see where the cost is (in this case, most likely a useful index is missing).
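Assuming you can run the query interactively, a simple way to see where that cost goes is to enable I/O and timing statistics before running the unlimited query (the WHERE clause here is copied from the question):

```sql
-- Report logical/physical reads and CPU/elapsed time per statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT [Imei], [Accuracy], [UserId], [Lat], [Lng], [Timestamp]
FROM [dbo].[LatestDeviceLocation]
WHERE [Timestamp] > '2015-02-19T00:00:00.000Z'
  AND [Timestamp] < '2015-02-26T23:59:59.999Z';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

High logical reads against DeviceLocation relative to the rows returned would point at a scan where a seek was expected.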