Sql-server – Why is an additional filtered statistic being ignored (EAV schema)

cardinality-estimateseavsql serverstatistics

I'm trying to improve a row estimate for this sub-query (of a larger query). The estimate is showing 1266 rows. The actual is 117k rows. This particular property (EAV schema) only has two values defined for it (2 and 3):

declare @pPropVal smallint = 2;

select Value, ObjectId 
  from Oav.ValueArray PropName
 where PropName.PropertyId = 897
   and PropName.Value  = @pPropVal
option (recompile)

The query plan shows the proper seek predicate on index IX_ValueArray_PropValObj on PropertyId and Value as expected.

(A) As an attempt to improve row estimates, an additional statistic was added which brought the row estimate up slightly to 3041:

create statistics [ST_SomePropertyName] ON [Oav].[ValueArray](PropertyId, Value, ObjectId)
 where 
     (     
             PropertyId = 897 
         and [Value] is not null
     )
  with fullscan

The histogram shows a single row. The HI key is just the PropertyId (the first column) which is not that useful so as I understand it, it is using the density information.

RANGE_HI_KEY    RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
897             0           196026   0                    1

All density Average Length  Columns
1           4               PropertyId
0.5         8               PropertyId, Value

Name    Updated Rows    Rows Sampled    Steps   Density Average key length  String Index    Filter Expression   Unfiltered Rows
ST_SomePropertyName May 20 2014  2:01PM 196026  196026  1   0   8   NO  ([PropertyId]=(897) AND [Value] IS NOT NULL)    9317055

(B) Since there is a filter on PropertyId = 897, I thought I could re-create the statistic like this:

create statistics [ST_SomePropertyName] ON [Oav].[ValueArray](Value, ObjectId)
where
    (       
       PropertyId = 897 
       and [Value] is not null
    )
 with fullscan

The histogram looks useful to my eyes but the estimator appears to be ignoring it because it reverts to the original estimate of 1266.

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS   AVG_RANGE_ROWS
2             0           117760   0                     1
3             0           78266    0                     1

All density   Average Length  Columns
0.5           4               Value
5.101364E-06  12              Value, ObjectId

Name    Updated Rows    Rows Sampled    Steps   Density Average key length  String Index    Filter Expression   Unfiltered Rows
ST_SomePropertyName May 20 2014  2:04PM 196026  196026  2   0   12  NO  ([PropertyId]=(897) AND [Value] IS NOT NULL)    9317055

(C) It does work to filter to a fixed value (and not even need the 2nd two columns) but that is not a very practical solution. This gave the exact estimate 117k.

create statistics [ST_SomePropertyName] ON [Oav].[ValueArray](PropertyId)
 where 
     (     
             PropertyId = 897 
         and [Value] = 2
     )
  with fullscan

histogram:

RANGE_HI_KEY   RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
897            0           117760   0                    1

(D) (Added to original question) An approach of limiting the value to a smaller range helps. But if the range values are not uniform or the value was a string based field or not even known this may not be a good workaround in general:

CREATE STATISTICS [ST_ListUnderBrand_897] ON [Oav].[ValueArray](PropertyId, Value)
WHERE 
  (       
      PropertyId = 897 
      and [Value] >= 1 and [Value] <= 20
  )
  with fullscan

This gives estimates of about 16k. Changing the [1,20] to the exact [2,3] gives estimates of ~80k. It seems obvious that the true range of Values from the table data are not really used (since it is 2nd column) and this is some estimate based mostly on the filter range.

Please note the Value field is a sql_variant but I don't think that is related as the query plan does not show any implicit conversions.

Why doesn't SQL Server use the statistics from B? Should it?

Are there other options available to fix this?

Best Answer

Filtered indexes and statistics won't come into play when you're using local variables, unless you use the OPTION (RECOMPILE) query hint, and are running SQL Server 2008 R2 or later.

Tim Chapman's MSDN blog post explains with examples.