SQL Server – Query with literals runs instantly, using variables takes minutes

Tags: optimization, query-performance, sql-server

This is on Microsoft SQL Server 2017 (RTM-CU20) on Azure, running an ERP system (so changing the application code is not an option).

I have a table with over 42 million rows. It's a machine operating stats tracking table. Users enter operating statistics by equipment, date, shift and type of stat (Operating hours, Tons, whatever).

The table has a primary key structure like this (please no comments about nchar instead of nvarchar or storing dates as strings, that's how the ERP vendor provides it and that's just what I have to work with):

equip nchar(12)       This is a 12 character equipment number
key_type nchar(1)     One of two values E for equipment or G - always E here
shift_seq_no nchar(2) Sequence of shifts (usually 00 or 01)
stat_date nchar(8)    Date, YYYYMMDD
stat_type nchar(2)    A code for stat type, OH here (operating hours)
trc_seq_no nchar(3)   A sequence number for when there's more than one entry per shift
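
For reference, here's a hedged sketch of the table as described – the table name, constraint name, and value column are made up, since the vendor's real DDL isn't shown:

create table dbo.stat_table (
    equip        nchar(12) not null,  -- 12-character equipment number
    key_type     nchar(1)  not null,  -- 'E' (equipment) or 'G'; always 'E' here
    shift_seq_no nchar(2)  not null,  -- shift sequence, usually '00' or '01'
    stat_date    nchar(8)  not null,  -- date stored as YYYYMMDD
    stat_type    nchar(2)  not null,  -- stat type code, 'OH' (operating hours) here
    trc_seq_no   nchar(3)  not null,  -- sequence for multiple entries per shift
    stat_value   decimal(18, 2) null, -- hypothetical cumulative stat column
    constraint PK_stat_table primary key clustered
        (equip, key_type, shift_seq_no, stat_date, stat_type, trc_seq_no)
);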

It appears that the calling code (a Java-based web app making a JDBC call to the database, with JBoss in the middle of it all) queries a list of equipment scheduled for preventative maintenance in a given date range and then looks up the most recent stat for each piece of equipment, since that row contains the cumulative stat value for the equipment.

It then puts together a very long SQL query that effectively does this (replace each x with the appropriate value):

select * from table
where (equip = x and key_type = x and shift_seq_no = x and stat_date = x and stat_type = x and trc_seq_no = x)
or (equip = x and key_type = x and shift_seq_no = x and stat_date = x and stat_type = x and trc_seq_no = x)
or (equip = x and key_type = x and shift_seq_no = x and stat_date = x and stat_type = x and trc_seq_no = x)
or (...and so on)

There are 73 repeats of the 'OR' in total.

I generated two queries based on what I can see running.

One has a straight-up set of 73 groups of literals (equip = '000001234567' and col = 'value', etc.) – this runs instantly. The actual execution plan shows a Clustered Index Seek (Clustered) in a Nested Loop with a Constant Scan, 73 executions of 1 row read each, and a run time of less than 1 second.

I then took the same query, declared 428 variables, and plugged them into the query in place of the 428 literals. The actual execution plan shows a Clustered Index Scan (Clustered) in a single step (the plan reads SELECT -> Filter -> Clustered Index Scan). The plan shows all 42,271,100 rows read, with a run time of 5 minutes and 34 seconds when run in SSMS. This is a web-based application, and debug tracing through the online app comes in at over 8 minutes. You can imagine that just isn't deemed acceptable for an online screen.
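
For concreteness, here's a trimmed sketch of the two forms I tested, using the hypothetical stat_table name from the sketch above (the real queries carry 73 groups and 428 literals or variables; the values here are invented):

-- Literal form: Constant Scan + Nested Loops + Clustered Index Seek, under 1 second
select *
from dbo.stat_table
where (equip = N'000001234567' and key_type = N'E' and shift_seq_no = N'01'
       and stat_date = N'20200131' and stat_type = N'OH' and trc_seq_no = N'001')
   or (equip = N'000007654321' and key_type = N'E' and shift_seq_no = N'01'
       and stat_date = N'20200131' and stat_type = N'OH' and trc_seq_no = N'001');
-- ...and so on for the remaining groups

-- Variable form: Filter + Clustered Index Scan, all 42 million rows read
declare @equip1 nchar(12) = N'000001234567', @key1  nchar(1) = N'E',
        @shift1 nchar(2)  = N'01',           @date1 nchar(8) = N'20200131',
        @type1  nchar(2)  = N'OH',           @seq1  nchar(3) = N'001';
-- ...plus the declarations for the remaining groups
select *
from dbo.stat_table
where (equip = @equip1 and key_type = @key1 and shift_seq_no = @shift1
       and stat_date = @date1 and stat_type = @type1 and trc_seq_no = @seq1);
-- ...or'd with the remaining groups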

Now that the long story is complete: does anyone have any recommendations for SQL Server tuning that might make this query run with variables the same way it runs with literals, given that I can't touch the code?

I think I'll be sending this back to the vendor for analysis as well, but I was hoping that someone here in the brain trust might have an idea for a quicker solution (since my Google Fu has failed me at the moment).

I've seen several articles about this happening when querying a view, and the general solution seems to be "add an OPTION (RECOMPILE) hint", but I'm not using a view here – I'm querying a table directly.

Thanks in advance for any help here. I'm primarily an Oracle guy and know enough about SQL Server to be dangerous.

Best Answer

The root cause of this is simple. The query optimizer makes row estimates to determine how it's going to resolve the query. The row estimates are based on the statistics for the columns involved. When you give the optimizer literals, it looks up those exact values in the statistics histogram to make its row estimates. In this case, those estimates result in a plan that gives you the seek: the optimizer has found that only a few rows match the specific values, so it uses the index to find those rows.

When you give the optimizer a local variable, it can't know what value the variable holds at compile time. So, instead of looking at the histogram for a specific value, it uses the density vector – an average based on the overall selectivity of the column. In this case, that average suggests to the optimizer that lots of rows are likely to match, so a scan looks like the more efficient retrieval mechanism.
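
You can look at both sets of numbers the optimizer reads from. A sketch, again using the hypothetical stat_table / PK_stat_table names:

-- Per-value row estimates (what the optimizer uses for literals)
DBCC SHOW_STATISTICS ('dbo.stat_table', 'PK_stat_table') WITH HISTOGRAM;

-- Average (density) estimates (what it falls back to for local variables)
DBCC SHOW_STATISTICS ('dbo.stat_table', 'PK_stat_table') WITH DENSITY_VECTOR;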

The first thing I'd try is updating the statistics. It's possible that they're out of date and the row estimates with a variable are simply inaccurate. For the best possible statistics, use the FULLSCAN option; that will take longer and use more resources.
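
A sketch of that first step, with the stand-in table name:

-- Rebuild every statistics object on the table by reading all rows, not a sample
UPDATE STATISTICS dbo.stat_table WITH FULLSCAN;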

Next, changing the query to a stored procedure and using parameters instead of local variables could result in better plans. This is because parameter values are known at compile time through a process called parameter sniffing (sorry, I didn't name it). You may see more consistent behavior because of parameter sniffing (although sometimes you can see inconsistent behavior because of it, too).
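
A minimal sketch of the difference, with the hypothetical names from above and a single predicate for brevity:

-- Local variable: the plan is compiled before the value is known, so the
-- optimizer falls back to the density-vector (average) estimate
declare @equip nchar(12) = N'000001234567';
select * from dbo.stat_table where equip = @equip;
go

-- Procedure parameter: the value is sniffed at compile time, so the
-- optimizer can use the histogram, much as it does with a literal
create or alter procedure dbo.get_latest_stat
    @equip nchar(12)
as
begin
    select * from dbo.stat_table where equip = @equip;
end;
go

exec dbo.get_latest_stat @equip = N'000001234567';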

Next, yes, recompiling could help. It may be that your data is just too wildly skewed for the row estimates to ever be consistently right. Recompiling on every execution means that a plan unique to the values passed will be generated. Further, recompiling means that the variable values are known at that point, so the sniffing process is applied to them as well.
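
If the statement itself could ever be amended (by the vendor, presumably), the statement-level form looks like this, reusing the variable sketch from earlier:

select *
from dbo.stat_table
where (equip = @equip1 and key_type = @key1 and shift_seq_no = @shift1
       and stat_date = @date1 and stat_type = @type1 and trc_seq_no = @seq1)
   -- ...remaining OR groups...
option (recompile);  -- variable values are sniffed fresh on every execution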

It's also possible that the query just needs some restructuring. Lots of ORs like that, with long lists of filter criteria, sounds like a catch-all query. Recompile can help there, but you might also be better off restructuring into either nested procedures (73 of them is a lot, though) or dynamic T-SQL that builds individual queries instead of one long list of OR'd groups.
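
A hedged sketch of the dynamic approach – one parameterized probe per predicate group instead of the OR list, with invented values:

declare @sql nvarchar(max) = N'
    select *
    from dbo.stat_table
    where equip = @equip and key_type = @key_type and shift_seq_no = @shift_seq_no
      and stat_date = @stat_date and stat_type = @stat_type and trc_seq_no = @trc_seq_no;';

-- Execute once per predicate group; each call is parameterized and sniffable
exec sys.sp_executesql @sql,
    N'@equip nchar(12), @key_type nchar(1), @shift_seq_no nchar(2),
      @stat_date nchar(8), @stat_type nchar(2), @trc_seq_no nchar(3)',
    @equip = N'000001234567', @key_type = N'E', @shift_seq_no = N'01',
    @stat_date = N'20200131', @stat_type = N'OH', @trc_seq_no = N'001';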

The last thing you have going for you is that you're on SQL Server 2017: you can enable Query Store and use plan forcing to make the query use the plan that performs better.
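
A sketch of the steps – the database name and the query/plan IDs are made up; look up the real IDs in the Query Store catalog views or the SSMS Query Store reports:

-- Turn on Query Store for the database
alter database ERP_DB set query_store = on;
alter database ERP_DB set query_store (operation_mode = read_write);

-- Once both the seek and scan plans have been captured, find their IDs
select qsq.query_id, qsp.plan_id, qst.query_sql_text
from sys.query_store_query as qsq
join sys.query_store_plan as qsp
    on qsp.query_id = qsq.query_id
join sys.query_store_query_text as qst
    on qst.query_text_id = qsq.query_text_id
where qst.query_sql_text like N'%stat_table%';

-- Force the better (seek) plan; 42 and 7 are placeholder IDs
exec sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;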