Are there exceptions to the above clustered index rules?
The general guidance for selecting a clustered index is good, but there are sometimes additional considerations to be taken into account. Occasionally, these extra factors can be more important than the general 'rules'.
Your scenario is somewhat "special" in that you have a very wide table, with a set of queries that request a (presumably) unpredictable set of columns, although the query predicates are normally the same.
The original clustered index arrangement was likely expensive for operations that change data, since changing any part of the clustering key means changing all nonclustered indexes as well. In addition, physically moving whole rows is costly in terms of log generation, especially where a page split occurs.
That said, once a clustered index has been modified and split a fair bit, it will contain quite a bit of free space, making future movements less intensive, just as if a sensible fill factor had been set in the first place, and maintained as part of normal index maintenance.
It seems likely your production system had fallen into a sort of equilibrium, where page splitting occurred at a steady, reasonable rate. Nonclustered index maintenance would still be relatively expensive, but it seems that was not a dominant factor.
The crucial advantages of the original indexing arrangement were:
The clustered index (Company, Region, FlagA, FlagB, FlagC) matches the predicate of:
SELECT {unpredictable column list}
FROM table
WHERE Company = 'company'
AND Region = 'region'
AND flagA= 'A'
AND flagB = 'B'
AND flagC = 'C';
...while at the same time providing access to whichever columns are listed in the select list.
Queries of the form:
SELECT *
FROM table
WHERE ID = 1234;
...were adequately supported by the nonclustered primary key. This query always returns a single row, so only a single lookup via the clustered index is required after the row is located in the nonclustered index.
Changing to a clustered index on ID
, and a nonclustered index on (Company, Region, FlagA, FlagB, FlagC) makes the second query a little more efficient (eliminating one lookup per query), but makes the first query much less efficient (replacing zero lookups with ~5000).
Moreover, it is very possible the optimizer might choose not to use the nonclustered index at all, estimating that a full scan of the table would be cheaper than the 5000 lookups.
You're probably better off leaving the two main indexes as they are for the time being, while you sort out the 40-odd nonclustered indexes, and analyse the workload for the minimal set of covering nonclustered indexes that would be required. Once that data is available, you will be in a better position to consider fundamental indexing changes.
You may also want to check the existing monitoring and maintenance routines for this table. In many scenarios, the 'rule-breaking' clustered index will be just fine, if the table is given a fill factor that just prevents significant page splitting before the next maintenance window comes around.
What is up with FROM part JOIN model ON 1=1
? This the same as FROM part, model
, which is a cartesian join and will result in a very large number of rows. Is that join supposed to be like that?
You will likely help us help you if you provide details about the tables involved. Please "script" the definition of the tables, along with any indexes defined on those tables.
This sounds like a classic case of parameter sniffing resulting in good plan/bad plan choices for various scenarios in your data.
You may be able to get more reliable performance by making SQL Server cache different plans for different scenarios by using sp_executesql
, as in the following example:
CREATE PROCEDURE [dbo].[create_grid_materials2]
(
@partlistid bigint
, @pid bigint
, @masterid bigint
)
AS
BEGIN
begin
DECLARE @cmd NVARCHAR(MAX);
SET @cmd = '
INSERT INTO material (partid, personid, modelID)
SELECT
partid = part.id
, personid = @pid
, modelid = model.id
FROM part
INNER JOIN model ON 1=1
WHERE (
model.masterid = ' + CONVERT(NVARCHAR(50), @masterid) + '
AND model.modelSetID IS NULL
AND part.partlistid = ' + CONVERT(NVARCHAR(50), @partlistid) + '
AND (
part.partType = 100
or part.partType=120
or part.partType = 130
)
)
AND NOT EXISTS (
SELECT 1
FROM material AS a1
WHERE a1.partid = part.id
AND a1.personid=@pid
AND a1.modelid=model.id
)';
DECLARE @Params VARCHAR(200);
SET @Params = '@pid INT';
EXEC sys.sp_executesql @cmd
, @Params
, @pid = @pid;
end
End
The above code will cause a new plan to be generated for each combination of @partlistid
, and @masterid
.
The presumption here is some combinations of those two variables lead to a very small number of rows, whereas some combinations lead to a very large number of rows.
Forcing a plan for each combination allows SQL Server to generate more efficient plans for each. I've explicitly not included @pid
since you probably want to try it with a fairly small number of combinations first; adding a third variable to the mix will make for an exponentially larger number of possible plans.
Best Answer
Starting with your original query:
The
1
that you added has a data type ofinteger
by default. When adding aninteger
value to anumeric(18,0)
value SQL Server applies the rules of data type precedence.int
has a lower precedence so it gets converted to anumeric(1,0)
. Your query is equivalent to the following:A different set of rules around Precision, scale, and Length is applied to determine the data type of the expression involving
@lower
. It isn't safe to just useNUMERIC(18,0)
because that could be overflowed (consider 999,999,999,999,999,999 and 1 as an example). The rule that applies here is:For your expression, the resulting precision is:
and the resulting scale is 0. You can verify this by running the following code in SQL Server:
This means that your original query is equivalent to the following:
SQL Server can only use
@lower
to do a clustered index seek if the value can be implicitly converted toNUMERIC(18, 0)
. It is not safe to convert aNUMERIC(19,0)
value toNUMERIC(18,0)
. As a result the value is applied as a predicate instead of as a seek predicate. One workaround is to do the following:That query can process both filters as seek predicates:
My advice is to change the data type in the table to
BIGINT
if possible.BIGINT
requires one fewer byte thanNUMERIC(18,0)
and benefits from performance optimizations not available toNUMERIC(18,0)
including better support for bitmap filters.