Sql-server – What if all the non clustered indexes on the table were filtered indexes

filtered-indexindexperformancesql serversql-server-2005

I have a table that is used heavily. This table has 370,370 records.

When I run the following select:

select  [ImageSize] , the_num = count(*)
from  [dbo].[ProductImages] 
group by [ImageSize] 
order by [ImageSize] 

I get this result:

enter image description here

Basically there are only 5 different imageSize objects in that table.

This is the current table definition:

IF OBJECT_ID('[dbo].[ProductImages]') IS NOT NULL 
DROP TABLE [dbo].[ProductImages] 
GO

CREATE TABLE [dbo].[ProductImages] ( 
[ProductImageID]  INT              IDENTITY(1,1)   NOT NULL,
[SeasonItemID]    VARCHAR(5)                       NOT NULL,
[Language]        SMALLINT                         NOT NULL,
[Tier1]           VARCHAR(10)                      NOT NULL,
[Tier2]           VARCHAR(10)                      NOT NULL,
[Gender]          CHAR(1)                          NOT NULL,
[SortOrder]       SMALLINT                         NOT NULL,
[ImageFormat]     VARCHAR(20)                      NOT NULL,
[ImageSize]       VARCHAR(10)                      NOT NULL,
[ImageWidth]      SMALLINT                         NOT NULL,
[ImageHeight]     SMALLINT                         NOT NULL,
[BodenDomainURL]  NVARCHAR(40)                     NOT NULL,
[BodenImageURL]   NVARCHAR(210)                    NOT NULL,
CONSTRAINT   [PK_ProductImages]  
PRIMARY KEY NONCLUSTERED ([SeasonItemID] asc, 
                          [Language] asc, [Tier1] asc, [Tier2] asc, [Gender] asc, 
                          [SortOrder] asc, [ImageFormat] asc, 
                          [ImageSize] asc))

GO

And here are the definitions of the current inexes

  • 1 clustered index

  • 4 non clustered indexes

  • 1 filtered index

  • and the primary key non clustered defined above

      CREATE CLUSTERED INDEX 
      [CIX_ProductImages__ProductImageID] 
                 ON [dbo].[ProductImages] ([ProductImageID] asc)
    
      CREATE NONCLUSTERED INDEX
      [IDX_SeasonItemID_Tier1_Tier2_ImageSizeBodenImageURL_INC_BodenDomainURL] 
                 ON [dbo].[ProductImages] 
          ([SeasonItemID] asc, [Tier1] asc, [Tier2] asc, 
          [ImageSize] asc, [ImageFormat] asc, 
          [BodenImageURL] asc)
                 INCLUDE ([BodenDomainURL])
    
      CREATE NONCLUSTERED INDEX
      [IX_ProductImages_ImageFormat_ImageSize_INCL] 
          ON [dbo].[ProductImages] 
          ([ImageFormat] asc, [ImageSize] asc)
                 INCLUDE ([SeasonItemID], [Language],
           [Tier1], [Tier2], [BodenDomainURL], 
          [BodenImageURL])
    
    
      CREATE NONCLUSTERED INDEX 
      [IX_ProductImages_Language_Tier1_Tier2_ImageFormat_INCL] 
          ON [dbo].[ProductImages] 
          ([Language] asc, [ImageSize] asc, [Tier1] asc, [Tier2] asc, 
          [ImageFormat] asc)
                 INCLUDE ([ProductImageID], [BodenImageURL])
    

This is a filtered index:

    CREATE NONCLUSTERED INDEX 
    [IDX_ProdImages_GetProductListingPageDenormalisedData] 
        ON [dbo].[ProductImages] 
        ([SeasonItemID] asc, [Tier1] asc, [Tier2] asc, 
        [ImageFormat] asc, [ImageWidth] asc, [ImageHeight] asc)
               INCLUDE ([BodenImageURL])
               WHERE ([ImageWidth]=(195) 
        AND [ImageHeight]=(252) 
        AND [ImageFormat]='ProductMain')
             

Can I expect significant gains by changing the 4 non clustered indexes above to 20 FILTERED INDEXES according to the imageSize?

Most queries use the ImageSize as one of the parameters. It is even in the clustered index.

I could modify the indexes and table structure.

This table is mostly used for reading. There is only 1 write a day, and that would not normally affect many records.

Best Answer

Significant gains? No, I shouldn't think so.

SQL Server's indexes are B-Trees. These have a hierarchical nature. To read a row the server starts at the root node of the index then steps through one or more intermediate nodes before reading the row's values via the leaf node. (Details vary between clustered & non-clustered indexes, and heaps.) The speed-limiting factor is often the number of levels there are in the index and, hence, the number of pages (each index node is its own page) that have to be read. This, in turn is controlled by the fan-out, the number of lower-level pages a higher-level page can reference, which is itself determined by the total size of the key. Let's do some hand-wavy maths.

Index IDX_SeasonItemID_Tier1_Tier2_ImageSizeBodenImageURL_INC_BodenDomainURL is defined as

Column          Definition     Effective bytes  Note
------------    -------------  ---------------  -----------------------------------
SeasonItemID    VARCHAR(5)                   5  Use the full size
Tier1           VARCHAR(10)                  5  Assume 50% of defined size
Tier2           VARCHAR(10)                  5  Assume 50% of defined size
ImageSize       VARCHAR(10)                  7  Minimum from given data
ImageFormat     VARCHAR(20)                 10  Assume 50% of defined size
BodenImageURL   NVARCHAR(210)              210  Assume 50% of defined size, 2 bytes per character.
                                          ====
Total                                      242 bytes

I'll ignore the included column for fan-out since it's only present in the leaf nodes. Since there's about 8060 usable bytes per page that gives roughly 8060 / 242 = 33 rows per page. So a one-level index can reference up to 33 rows, a two-level index can reference 33 * 33 = 1,089 rows, a three-level 35,937 rows, and a four-level 1,185,921 rows. Given your counts of 74,000 rows per picture size and 370,000 rows in total you'll need a four-level index1 for both filtered and non-filtered indexes. There will be no reduction in the number of pages referenced and so no improvement in run-time.

These numbers are a lower limit as the included columns and the column of the clustered index will take further space in the leaf pages of the non-clustered index.

Adding these indexes will introduce risks, however. One is that the optimizer can be finicky about when it uses them. Another is that bringing on a new size (a data change) will require five more indexes to be defined (a DDL change). This is unwelcome maintenance overhead but without performing it the new-sized images will, in effect, be un-indexed. Schema changes will be complicated by the additional objects. Index maintenance will be slightly more complicated, if scripted manually. The overhead of the upper level of the indexes will increase the DB size fractionally, and with it backup and restore times.


1For the value used by an actual index see sys.dm_db_index_physical_stats.index_depth (docs).