Sql-server – should I include the primary key in the clustered index in order to make it unique

indexperformanceprimary-keyquery-performancesql serversql-server-2005

I have the following table:

IF OBJECT_ID('[dbo].[ShipTaxAddress]') IS NOT NULL 
DROP TABLE [dbo].[ShipTaxAddress] 
GO
CREATE TABLE [dbo].[ShipTaxAddress] ( 
[TaxRegionAddressId]  INT                              NOT NULL,
[TaxRegionId]         INT                              NOT NULL,
[CountryCode]         VARCHAR(2)                       NOT NULL,
[AddressFormatId]     INT                              NOT NULL,
[MatchAddressLine1]   NVARCHAR(50)                         NULL,
[MatchAddressLine2]   NVARCHAR(50)                         NULL,
[MatchAddressLine3]   NVARCHAR(50)                         NULL,
[MatchAddressLine4]   NVARCHAR(50)                         NULL,
[MatchAddressLine5]   NVARCHAR(50)                         NULL,
[MatchAddressLine6]   NVARCHAR(50)                         NULL,
[MatchPostalCode]     VARCHAR(20)                          NULL,
CONSTRAINT   [PK_ShipTaxAddress]  PRIMARY KEY CLUSTERED    ([TaxRegionAddressId] asc))

that has an identity as a clustered index.

however, as it is queried mostly according to these 2 columns:

[CountryCode]         VARCHAR(2)                       NOT NULL,
[AddressFormatId]     INT                              NOT NULL,

I thought about creating a clustered index on them.

they are pretty small, but they are not unique. should I include the id column in the cluster index so that I can make it unique?

Yes or No and Why?

This is the main query I would like to go faster:

ALTER PROCEDURE [dbo].[udpProductTaxRateGet]
(
    @itemNo varchar ( 20 ),
    @calculateDate datetime,
    @addressLine1 nvarchar( 50 ),
    @addressLine2 nvarchar( 50 ),
    @addressLine3 nvarchar( 50 ),
    @addressLine4 nvarchar( 50 ),
    @addressLine5 nvarchar( 50 ),
    @addressLine6 nvarchar( 50 ),
    @postalCode nvarchar( 20 ),
    @countryCode varchar( 2 ),
    @addressFormatID int
)
WITH EXECUTE AS 'webUserWithRW'
AS
--see Bocss2.dbo.[fnGetProductTax] for equivalent logic and comments in Bocss
DECLARE @Addresses TABLE (TaxRegionId int NOT NULL)

INSERT INTO @Addresses(TaxRegionId)
SELECT  DISTINCT TaxRegionId
FROM    dbo.[ShipTaxAddress]
WHERE   [CountryCode] = @countryCode
AND     [AddressFormatID] = @addressFormatID
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine1]), ISNULL(@addressLine1, '')) = ISNULL(@addressLine1, '')
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine2]), ISNULL(@addressLine2, '')) = ISNULL(@addressLine2, '')
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine3]), ISNULL(@addressLine3, '')) = ISNULL(@addressLine3, '')
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine4]), ISNULL(@addressLine4, '')) = ISNULL(@addressLine4, '')
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine5]), ISNULL(@addressLine5, '')) = ISNULL(@addressLine5, '')
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine6]), ISNULL(@addressLine6, '')) = ISNULL(@addressLine6, '')
AND     @postalcode Like ISNULL ( CONVERT(nvarchar(20),[MatchPostalCode]), @postalcode)




SELECT DISTINCT ISNULL(pst.TaxCode, '') as TaxCode
     , ISNULL(pst.TaxRate, 0) as TaxRate
FROM    dbo.[ProductShipTax] pst
        INNER JOIN
        @Addresses a
            ON pst.TaxRegionId = a.TaxRegionId
WHERE   pst.[ItemNo] = @itemNo
AND     @calculateDate BETWEEN pst.[DateFrom] AND pst.[DateTo]

GO

This is how I improved this procedure:

I have created the following indexes:

    CREATE INDEX IDX_ShipTaxAddress_ShipTaxAddress
    ON dbo.[ShipTaxAddress] (CountryCode,
                             AddressFormatID,
                             MatchPostalCode)
    INCLUDE (TaxRegionId,
             [MatchAddressLine1],
             [MatchAddressLine2],
             [MatchAddressLine3],
             [MatchAddressLine4],
             [MatchAddressLine5],
             [MatchAddressLine6])
    GO

     CREATE NONCLUSTERED INDEX IX_ProductShipTax_ITemNo_DateFrom_DateTo 
     ON [dbo].[ProductShipTax] (  [ItemNo] ASC  
                                , [DateFrom] ASC  
                                , [DateTo] ASC  )   
    INCLUDE (TaxRegionId ,TaxCode,TaxRate)
    WITH (DROP_EXISTING=ON)

I have changed the relevant columns in the table, from VARCHAR to NVARCHAR, in order to eliminate the need of conversion.
the Table became like this:

IF OBJECT_ID('[dbo].[ShipTaxAddress]') IS NOT NULL 
DROP TABLE [dbo].[ShipTaxAddress] 
GO
CREATE TABLE [dbo].[ShipTaxAddress] ( 
[TaxRegionAddressId]  INT                              NOT NULL,
[TaxRegionId]         INT                              NOT NULL,
[CountryCode]         VARCHAR(2)                       NOT NULL,
[AddressFormatId]     INT                              NOT NULL,
[MatchAddressLine1]   NVARCHAR(50)                         NULL,
[MatchAddressLine2]   NVARCHAR(50)                         NULL,
[MatchAddressLine3]   NVARCHAR(50)                         NULL,
[MatchAddressLine4]   NVARCHAR(50)                         NULL,
[MatchAddressLine5]   NVARCHAR(50)                         NULL,
[MatchAddressLine6]   NVARCHAR(50)                         NULL,
[MatchPostalCode]     NVARCHAR(20)                         NULL,
CONSTRAINT   [PK_ShipTaxAddress]  
PRIMARY KEY NONCLUSTERED ([TaxRegionAddressId] asc))
GO

I have changed the procedure:

ALTER PROCEDURE [DenormV2].[udpProductTaxRateGet]
(
    @itemNo varchar ( 20 ),
    @calculateDate datetime,
    @addressLine1 nvarchar( 50 ),
    @addressLine2 nvarchar( 50 ),
    @addressLine3 nvarchar( 50 ),
    @addressLine4 nvarchar( 50 ),
    @addressLine5 nvarchar( 50 ),
    @addressLine6 nvarchar( 50 ),
    @postalCode nvarchar( 20 ),
    @countryCode varchar( 2 ),
    @addressFormatID int
)
WITH EXECUTE AS 'webUserWithRW'
AS
--see Bocss2.dbo.[fnGetProductTax] for equivalent logic and comments in Bocss


SELECT @postalcode    = CASE WHEN @postalcode   = N'' THEN NULL ELSE @postalcode   END
SELECT @addressLine1  = CASE WHEN @addressLine1 = N'' THEN NULL ELSE @addressLine1 END
SELECT @addressLine2  = CASE WHEN @addressLine2 = N'' THEN NULL ELSE @addressLine2 END
SELECT @addressLine3  = CASE WHEN @addressLine3 = N'' THEN NULL ELSE @addressLine3 END
SELECT @addressLine4  = CASE WHEN @addressLine4 = N'' THEN NULL ELSE @addressLine4 END
SELECT @addressLine5  = CASE WHEN @addressLine5 = N'' THEN NULL ELSE @addressLine5 END
SELECT @addressLine6  = CASE WHEN @addressLine6 = N'' THEN NULL ELSE @addressLine6 END



SELECT TOP 1  ISNULL(pst.TaxCode, '') as TaxCode
            , ISNULL(pst.TaxRate, 0) as TaxRate
FROM    dbo.[ProductShipTax] pst
WHERE EXISTS (

    SELECT  TaxRegionId
    FROM    dbo.[ShipTaxAddress]
    WHERE   [CountryCode] = @countryCode
    AND     [AddressFormatID] = @addressFormatID

    AND ([MatchAddressLine1] = @AddressLine1 OR ([MatchAddressLine1] IS NULL AND @AddressLine1 IS NULL) )
    AND ([MatchAddressLine2] = @AddressLine2 OR ([MatchAddressLine2] IS NULL AND @AddressLine2 IS NULL) )
    AND ([MatchAddressLine3] = @AddressLine3 OR ([MatchAddressLine3] IS NULL AND @AddressLine3 IS NULL) )
    AND ([MatchAddressLine4] = @AddressLine4 OR ([MatchAddressLine4] IS NULL AND @AddressLine4 IS NULL) )
    AND ([MatchAddressLine5] = @AddressLine5 OR ([MatchAddressLine5] IS NULL AND @AddressLine5 IS NULL) )
    AND ([MatchAddressLine6] = @AddressLine6 OR ([MatchAddressLine6] IS NULL AND @AddressLine6 IS NULL) )

    AND (@postalcode = [MatchPostalCode]     OR ([MatchPostalCode]   IS NULL AND @postalcode   IS NULL) )
    AND  TaxRegionId = pst.TaxRegionId     

)
AND pst.[ItemNo] = @itemNo
AND @calculateDate BETWEEN pst.[DateFrom] AND pst.[DateTo]

GO

when comparing the following:

USE US16HSMMProduct_ORIGINAL
GO
exec dbo.udpProductTaxRateGet 
        @itemNo=N'31997299',
        @calculateDate='Aug  8 2016  1:01:46:760PM',
        @addressLine1=N'',
        @addressLine2=N'',
        @addressLine3=N'',
        @addressLine4=N'',
        @addressLine5=N'',
        @addressLine6=N'FL',
        @postalcode=N'',
        @countryCode=N'US',
        @addressFormatID=2
go
USE US16HSMMProduct_AFTER_CHANGES
GO
exec DenormV2.udpProductTaxRateGet
        @itemNo=N'31997299',
        @calculateDate='Aug  8 2016  1:01:46:760PM',
        @addressLine1=N'',
        @addressLine2=N'',
        @addressLine3=N'',
        @addressLine4=N'',
        @addressLine5=N'',
        @addressLine6=N'FL',
        @postalcode=N'',
        @countryCode=N'US',
        @addressFormatID=2
go

we get this:
enter image description here

here is the execution plan of the old procedure

here is the execution plan of the new procedure

Best Answer

If you were to use those two columns as the clustered index, first have to change the PK to non-clustered as BlackStar mentioned, and they are not unique, then SQL Server will add a uniquifier to make it unique. That uniquifier is 4 bytes and used only when necessary. You could instead add the identity column to the end of that index. The identity column is defined as int which is 4 bytes, so there wouldn't be a size difference between the two versions of the index.

EDIT now that the main query has been provided.

I would remove the table variable if you want it to go faster. Do it all in one query. You can easily use a CTE instead.

Like this:

WITH a
AS
(
SELECT  DISTINCT TaxRegionId
FROM    dbo.[ShipTaxAddress]
WHERE   [CountryCode] = @countryCode
AND     [AddressFormatID] = @addressFormatID
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine1]), ISNULL(@addressLine1, '')) = ISNULL(@addressLine1, '')
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine2]), ISNULL(@addressLine2, '')) = ISNULL(@addressLine2, '')
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine3]), ISNULL(@addressLine3, '')) = ISNULL(@addressLine3, '')
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine4]), ISNULL(@addressLine4, '')) = ISNULL(@addressLine4, '')
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine5]), ISNULL(@addressLine5, '')) = ISNULL(@addressLine5, '')
AND     ISNULL (CONVERT(nvarchar(50),[MatchAddressLine6]), ISNULL(@addressLine6, '')) = ISNULL(@addressLine6, '')
AND     @postalcode Like ISNULL ( CONVERT(nvarchar(20),[MatchPostalCode]), @postalcode))

SELECT DISTINCT ISNULL(pst.TaxCode, '') as TaxCode
     , ISNULL(pst.TaxRate, 0) as TaxRate
FROM    dbo.[ProductShipTax] pst
        INNER JOIN
        a
            ON pst.TaxRegionId = a.TaxRegionId
WHERE   pst.[ItemNo] = @itemNo
AND     @calculateDate BETWEEN pst.[DateFrom] AND pst.[DateTo]