Sql-server – Clustered Index ‘Seek predicate’ and ‘predicate’ on the same column

indexsql server

I have a clustered index primary key column and I'm doing a range query on it.
The problem is the scan only uses the first range part for the seek predicate, leaving the other side of the range as a residual predicate.
which causes to read all the rows up to the @upper limit

I'm using two parameters for the range:

declare
    @lower numeric(18,0) = 1000,
    @upper numeric(18,0) = 1005;

select * from messages
where msg_id between @lower+1 and @upper;

In that case the actual execution plan shows:

predicate: messages.msg_id >= lower
seek predicate: messages.msg_id < @upper
rows read: 1005

Table definition (Simplified):

CREATE TABLE [dbo].[messages](
    [msg_id] [numeric](18, 0) IDENTITY(0,1) NOT NULL,
    [col2] [varchar](32) NOT NULL,
 CONSTRAINT [PK_Message] PRIMARY KEY CLUSTERED 
(
    [msg_id] ASC
)

More information

When used with constants instead of variables both of the predicates are 'Seek'
Have tried Option (Optimize for (@lower=1000)), without success

Best Answer

Starting with your original query:

declare
    @lower numeric(18,0) = 1000,
    @upper numeric(18,0) = 1005;

select * from [messages]
where msg_id between @lower+1 and @upper;

The 1 that you added has a data type of integer by default. When adding an integer value to a numeric(18,0) value SQL Server applies the rules of data type precedence. int has a lower precedence so it gets converted to a numeric(1,0). Your query is equivalent to the following:

declare
    @lower numeric(18,0) = 1000,
    @upper numeric(18,0) = 1005;

select * from [messages]
where msg_id between @lower+CAST(1 AS NUMERIC(1, 0)) and @upper;

A different set of rules around Precision, scale, and Length is applied to determine the data type of the expression involving @lower. It isn't safe to just use NUMERIC(18,0) because that could be overflowed (consider 999,999,999,999,999,999 and 1 as an example). The rule that applies here is:

╔═══════════╦═════════════════════════════════════╦════════════════╗
║ Operation ║          Result precision           ║ Result scale * ║
╠═══════════╬═════════════════════════════════════╬════════════════╣
║ e1 + e2   ║ max(s1, s2) + max(p1-s1, p2-s2) + 1 ║ max(s1, s2)    ║
╚═══════════╩═════════════════════════════════════╩════════════════╝

For your expression, the resulting precision is:

max(0, 0) + max(18 - 0, 1 - 0) + 1 = 0 + 18 + 1 = 19

and the resulting scale is 0. You can verify this by running the following code in SQL Server:

declare
@lower numeric(18,0) = 1000,
@upper numeric(18,0) = 1005;

SELECT 
  SQL_VARIANT_PROPERTY(@lower+1, 'BaseType') lower_exp_BaseType
, SQL_VARIANT_PROPERTY(@lower+1, 'Precision') lower_exp_Precision
, SQL_VARIANT_PROPERTY(@lower+1, 'Scale') lower_exp_Scale;

This means that your original query is equivalent to the following:

declare
    @lower numeric(19,0) = 1000 + 1,
    @upper numeric(18,0) = 1005;

select * from [messages]
where msg_id between @lower and @upper;

SQL Server can only use @lower to do a clustered index seek if the value can be implicitly converted to NUMERIC(18, 0). It is not safe to convert a NUMERIC(19,0) value to NUMERIC(18,0). As a result the value is applied as a predicate instead of as a seek predicate. One workaround is to do the following:

declare
    @lower numeric(18,0) = 1000,
    @upper numeric(18,0) = 1005;

select * from [messages]
where msg_id between TRY_CAST(@lower+1 AS NUMERIC(18,0)) and @upper;

That query can process both filters as seek predicates:

My advice is to change the data type in the table to BIGINT if possible. BIGINT requires one fewer byte than NUMERIC(18,0) and benefits from performance optimizations not available to NUMERIC(18,0) including better support for bitmap filters.

Related Solutions

Sql-server – Clustered Index Help, where did i go wrong

Are there exceptions to the above clustered index rules?

The general guidance for selecting a clustered index is good, but there are sometimes additional considerations to be taken into account. Occasionally, these extra factors can be more important than the general 'rules'.

Your scenario is somewhat "special" in that you have a very wide table, with a set of queries that request a (presumably) unpredictable set of columns, although the query predicates are normally the same.

The original clustered index arrangement was likely expensive for operations that change data, since changing any part of the clustering key means changing all nonclustered indexes as well. In addition, physically moving whole rows is costly in terms of log generation, especially where a page split occurs.

That said, once a clustered index has been modified and split a fair bit, it will contain quite a bit of free space, making future movements less intensive, just as if a sensible fill factor had been set in the first place, and maintained as part of normal index maintenance.

It seems likely your production system had fallen into a sort of equilibrium, where page splitting occurred at a steady, reasonable rate. Nonclustered index maintenance would still be relatively expensive, but it seems that was not a dominant factor.

The crucial advantages of the original indexing arrangement were:

The clustered index (Company, Region, FlagA, FlagB, FlagC) matches the predicate of:
```
SELECT {unpredictable column list}
FROM table 
WHERE Company = 'company' 
AND Region = 'region' 
AND flagA= 'A'
AND flagB = 'B'
AND flagC = 'C';
```
...while at the same time providing access to whichever columns are listed in the select list.
Queries of the form:
```
SELECT * 
FROM table 
WHERE ID = 1234;
```
...were adequately supported by the nonclustered primary key. This query always returns a single row, so only a single lookup via the clustered index is required after the row is located in the nonclustered index.

Changing to a clustered index on ID, and a nonclustered index on (Company, Region, FlagA, FlagB, FlagC) makes the second query a little more efficient (eliminating one lookup per query), but makes the first query much less efficient (replacing zero lookups with ~5000).

Moreover, it is very possible the optimizer might choose not to use the nonclustered index at all, estimating that a full scan of the table would be cheaper than the 5000 lookups.

You're probably better off leaving the two main indexes as they are for the time being, while you sort out the 40-odd nonclustered indexes, and analyse the workload for the minimal set of covering nonclustered indexes that would be required. Once that data is available, you will be in a better position to consider fundamental indexing changes.

You may also want to check the existing monitoring and maintenance routines for this table. In many scenarios, the 'rule-breaking' clustered index will be just fine, if the table is given a fill factor that just prevents significant page splitting before the next maintenance window comes around.

Sql-server – deteriorating stored procedure running times

What is up with FROM part JOIN model ON 1=1? This the same as FROM part, model, which is a cartesian join and will result in a very large number of rows. Is that join supposed to be like that?

You will likely help us help you if you provide details about the tables involved. Please "script" the definition of the tables, along with any indexes defined on those tables.

This sounds like a classic case of parameter sniffing resulting in good plan/bad plan choices for various scenarios in your data.

You may be able to get more reliable performance by making SQL Server cache different plans for different scenarios by using sp_executesql, as in the following example:

CREATE PROCEDURE [dbo].[create_grid_materials2] 
(
    @partlistid bigint
    , @pid bigint
    , @masterid bigint
)
AS
BEGIN
    begin
        DECLARE @cmd NVARCHAR(MAX);

        SET @cmd = '   
        INSERT INTO material (partid, personid, modelID)
        SELECT 
            partid = part.id
            , personid = @pid
            , modelid = model.id  
        FROM part
            INNER JOIN model ON 1=1
        WHERE (
            model.masterid = ' + CONVERT(NVARCHAR(50), @masterid) + ' 
                AND model.modelSetID IS NULL
                AND part.partlistid = ' + CONVERT(NVARCHAR(50), @partlistid) + '
                AND (
                    part.partType = 100 
                    or part.partType=120 
                    or part.partType = 130
                )
            )
            AND NOT EXISTS (
                SELECT 1 
                FROM material AS a1 
                WHERE a1.partid = part.id 
                    AND a1.personid=@pid 
                    AND a1.modelid=model.id
                )';
        DECLARE @Params VARCHAR(200);
        SET @Params = '@pid INT';
        EXEC sys.sp_executesql @cmd
            , @Params
            , @pid = @pid;
    end
End

The above code will cause a new plan to be generated for each combination of @partlistid, and @masterid.

The presumption here is some combinations of those two variables lead to a very small number of rows, whereas some combinations lead to a very large number of rows.

Forcing a plan for each combination allows SQL Server to generate more efficient plans for each. I've explicitly not included @pid since you probably want to try it with a fairly small number of combinations first; adding a third variable to the mix will make for an exponentially larger number of possible plans.

Best Answer

Related Solutions

Sql-server – Clustered Index Help, where did i go wrong

Sql-server – deteriorating stored procedure running times

Related Question