SQL Server – Query with literals runs instantly, using variables takes minutes

Tags: optimization, query-performance, sql-server

This is on Microsoft SQL Server 2017 (RTM-CU20) on Azure, running an ERP system (so changing the application code is not an option).

I have a table with over 42 million rows. It's a machine operating stats tracking table. Users enter operating statistics by equipment, date, shift and type of stat (Operating hours, Tons, whatever).

The table has a primary key structure like this (please no comments about nchar instead of nvarchar or storing dates as strings, that's how the ERP vendor provides it and that's just what I have to work with):

equip nchar(12)       This is a 12 character equipment number
key_type nchar(1)     One of two values E for equipment or G - always E here
shift_seq_no nchar(2) Sequence of shifts (usually 00 or 01)
stat_date nchar(8)    Date, YYYYMMDD
stat_type nchar(2)    A code for stat type, OH here (operating hours)
trc_seq_no nchar(3)   A sequence number for when there's more than one entry per shift
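
For reference, here's a hedged sketch of the table as described – the table name, constraint name, and value column are made up, since the vendor's real DDL isn't shown:

create table dbo.stat_table (
    equip        nchar(12) not null,  -- 12-character equipment number
    key_type     nchar(1)  not null,  -- 'E' (equipment) or 'G'; always 'E' here
    shift_seq_no nchar(2)  not null,  -- shift sequence, usually '00' or '01'
    stat_date    nchar(8)  not null,  -- date stored as YYYYMMDD
    stat_type    nchar(2)  not null,  -- stat type code, 'OH' (operating hours) here
    trc_seq_no   nchar(3)  not null,  -- sequence for multiple entries per shift
    stat_value   decimal(18, 2) null, -- hypothetical cumulative stat column
    constraint PK_stat_table primary key clustered
        (equip, key_type, shift_seq_no, stat_date, stat_type, trc_seq_no)
);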

It appears that the calling code (a Java-based web app making a JDBC call to the database, with JBoss in the middle of it all) queries a list of equipment scheduled for preventative maintenance in a given date range and then looks up the most recent stat for each piece of equipment, since that row contains the cumulative stat value for the equipment.

It then puts together a very long SQL query that effectively does this (replace each x with the appropriate value):

select * from table
where (equip = x and key_type = x and shift_seq_no = x and stat_date = x and stat_type = x and trc_seq_no = x)
or (equip = x and key_type = x and shift_seq_no = x and stat_date = x and stat_type = x and trc_seq_no = x)
or (equip = x and key_type = x and shift_seq_no = x and stat_date = x and stat_type = x and trc_seq_no = x)
or (...and so on)

There are 73 repeats of the 'OR' in total.

I generated two queries based on what I can see running.

One has a straight-up set of 73 groups of literals (equip = '000001234567' and col = 'value', etc.) – this runs instantly. The actual execution plan shows a Clustered Index Seek (Clustered) in a Nested Loop with a Constant Scan, 73 executions of 1 row read each, and a run time of less than 1 second.

I then took the same query, declared 428 variables, and plugged them into the query in place of the 428 literals. The actual execution plan shows a Clustered Index Scan (Clustered) in a single step (the plan reads SELECT -> Filter -> Clustered Index Scan). The plan shows all 42,271,100 rows read, with a run time of 5 minutes and 34 seconds when run in SSMS. This is a web-based application, and debug tracing through the online app comes in at over 8 minutes. You can imagine that just isn't deemed acceptable for an online screen.
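
For concreteness, here's a trimmed sketch of the two forms I tested, using the hypothetical stat_table name from the sketch above (the real queries carry 73 groups and 428 literals or variables; the values here are invented):

-- Literal form: Constant Scan + Nested Loops + Clustered Index Seek, under 1 second
select *
from dbo.stat_table
where (equip = N'000001234567' and key_type = N'E' and shift_seq_no = N'01'
       and stat_date = N'20200131' and stat_type = N'OH' and trc_seq_no = N'001')
   or (equip = N'000007654321' and key_type = N'E' and shift_seq_no = N'01'
       and stat_date = N'20200131' and stat_type = N'OH' and trc_seq_no = N'001');
-- ...and so on for the remaining groups

-- Variable form: Filter + Clustered Index Scan, all 42 million rows read
declare @equip1 nchar(12) = N'000001234567', @key1  nchar(1) = N'E',
        @shift1 nchar(2)  = N'01',           @date1 nchar(8) = N'20200131',
        @type1  nchar(2)  = N'OH',           @seq1  nchar(3) = N'001';
-- ...plus the declarations for the remaining groups
select *
from dbo.stat_table
where (equip = @equip1 and key_type = @key1 and shift_seq_no = @shift1
       and stat_date = @date1 and stat_type = @type1 and trc_seq_no = @seq1);
-- ...or'd with the remaining groups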

Now that the long story is complete: does anyone have any recommendations for SQL Server tuning that might make this query run with variables the same way it runs with literals, given that I can't touch the code?

I think I'll be sending this back to the vendor for analysis as well, but I was hoping that someone here in the brain trust might have an idea for a quicker solution (since my Google Fu has failed me at the moment).

I've seen several articles about this happening when querying a view, and the general solution seems to be "add an OPTION (RECOMPILE) hint", but I'm not using a view here – I'm querying a table directly.

Thanks in advance for any help here. I'm primarily an Oracle guy and know enough about SQL Server to be dangerous.

Best Answer

The root cause of this is simple. The query optimizer makes row estimates to determine how it's going to resolve the query. The row estimates are based on the statistics for the columns involved. When you give the optimizer literals, it looks up those exact values in the statistics histogram to make its row estimates. In this case, those estimates result in a plan that gives you the seek: the optimizer has found that only a few rows match the specific values, so it uses the index to find those rows.

When you give the optimizer a local variable, it can't know what value the variable holds at compile time. So, instead of looking at the histogram for a specific value, it uses the density vector – an average based on the overall selectivity of the column. In this case, that average suggests to the optimizer that lots of rows are likely to match, so a scan looks like the more efficient retrieval mechanism.
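
You can look at both sets of numbers the optimizer reads from. A sketch, again using the hypothetical stat_table / PK_stat_table names:

-- Per-value row estimates (what the optimizer uses for literals)
DBCC SHOW_STATISTICS ('dbo.stat_table', 'PK_stat_table') WITH HISTOGRAM;

-- Average (density) estimates (what it falls back to for local variables)
DBCC SHOW_STATISTICS ('dbo.stat_table', 'PK_stat_table') WITH DENSITY_VECTOR;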

The first thing I'd try is updating the statistics. It's possible that they're out of date and the row estimates with a variable are simply inaccurate. For the best possible statistics, use the FULLSCAN option; that will take longer and use more resources.
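
A sketch of that first step, with the stand-in table name:

-- Rebuild every statistics object on the table by reading all rows, not a sample
UPDATE STATISTICS dbo.stat_table WITH FULLSCAN;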

Next, changing the query to a stored procedure and using parameters instead of local variables could result in better plans. This is because parameter values are known at compile time through a process called parameter sniffing (sorry, I didn't name it). You may see more consistent behavior because of parameter sniffing (although sometimes you can see inconsistent behavior because of it, too).
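
A minimal sketch of the difference, with the hypothetical names from above and a single predicate for brevity:

-- Local variable: the plan is compiled before the value is known, so the
-- optimizer falls back to the density-vector (average) estimate
declare @equip nchar(12) = N'000001234567';
select * from dbo.stat_table where equip = @equip;
go

-- Procedure parameter: the value is sniffed at compile time, so the
-- optimizer can use the histogram, much as it does with a literal
create or alter procedure dbo.get_latest_stat
    @equip nchar(12)
as
begin
    select * from dbo.stat_table where equip = @equip;
end;
go

exec dbo.get_latest_stat @equip = N'000001234567';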

Next, yes, recompiling could help. It may be that your data is just too wildly skewed for the row estimates to ever be consistently right. Recompiling on every execution means that a plan unique to the values passed will be generated. Further, recompiling means that the variable values are known at that point, so the sniffing process is applied to them as well.
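
If the statement itself could ever be amended (by the vendor, presumably), the statement-level form looks like this, reusing the variable sketch from earlier:

select *
from dbo.stat_table
where (equip = @equip1 and key_type = @key1 and shift_seq_no = @shift1
       and stat_date = @date1 and stat_type = @type1 and trc_seq_no = @seq1)
   -- ...remaining OR groups...
option (recompile);  -- variable values are sniffed fresh on every execution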

It's also possible that the query just needs some restructuring. Lots of ORs like that, with long lists of filter criteria, sounds like a catch-all query. Recompile can help there, but you might also be better off restructuring into either nested procedures (73 of them is a lot, though) or dynamic T-SQL that builds individual queries instead of one long list of OR'd groups.
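
A hedged sketch of the dynamic approach – one parameterized probe per predicate group instead of the OR list, with invented values:

declare @sql nvarchar(max) = N'
    select *
    from dbo.stat_table
    where equip = @equip and key_type = @key_type and shift_seq_no = @shift_seq_no
      and stat_date = @stat_date and stat_type = @stat_type and trc_seq_no = @trc_seq_no;';

-- Execute once per predicate group; each call is parameterized and sniffable
exec sys.sp_executesql @sql,
    N'@equip nchar(12), @key_type nchar(1), @shift_seq_no nchar(2),
      @stat_date nchar(8), @stat_type nchar(2), @trc_seq_no nchar(3)',
    @equip = N'000001234567', @key_type = N'E', @shift_seq_no = N'01',
    @stat_date = N'20200131', @stat_type = N'OH', @trc_seq_no = N'001';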

The last thing you have going for you is that you're on SQL Server 2017: you can enable Query Store and use plan forcing to make the query use the plan that performs better.
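
A sketch of the steps – the database name and the query/plan IDs are made up; look up the real IDs in the Query Store catalog views or the SSMS Query Store reports:

-- Turn on Query Store for the database
alter database ERP_DB set query_store = on;
alter database ERP_DB set query_store (operation_mode = read_write);

-- Once both the seek and scan plans have been captured, find their IDs
select qsq.query_id, qsp.plan_id, qst.query_sql_text
from sys.query_store_query as qsq
join sys.query_store_plan as qsp
    on qsp.query_id = qsq.query_id
join sys.query_store_query_text as qst
    on qst.query_text_id = qsq.query_text_id
where qst.query_sql_text like N'%stat_table%';

-- Force the better (seek) plan; 42 and 7 are placeholder IDs
exec sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;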