SQL Server – How to Optimize This SQL Server Query for Better Performance

performancequery-performancesql serversql-server-2008

Anyone have any suggestions for how to optimize this query?
Running SQL Server 2008..

The object is to look at accounts that were created yesterday, compare their email addresses and phone numbers to existing accounts to identify if it is a duplicate account to one already in the system.

SELECT DISTINCT Customer.CustId
    , Customer.PriceClassID
    , Customer.Name
    , CustomerEDI.S4Future01 as First
    , CustomerEDI.S4Future02 as Last
    , Addr1
    , Addr2
    , City
    , State
    , Zip
    , Country
    , Customer.emailAddr
    , Customer.Phone
    , lastorder
    , CONVERT(varchar(8000), snotetext)  
FROM Customer
WHERE 
    EMailAddr IN ( 
        SELECT EmailAddr 
        FROM Customer 
        WHERE EmailAddr != '' 
        GROUP BY EmailAddr 
        HAVING COUNT(*) > 1) 
        AND EMailAddr IN ( 
            SELECT EMailAddr 
            FROM Customer 
            WHERE Crtd_DateTime >= DATEADD(day, -1, CAST(GETDATE() AS DATE))
            ) 
    OR Phone IN ( 
        SELECT Phone 
        FROM Customer 
        WHERE Phone != '' 
        GROUP BY Phone 
        HAVING COUNT(*) > 1) 
        AND Phone IN ( 
            SELECT Phone 
            FROM Customer 
            WHERE Crtd_DateTime >= DATEADD(day, -1, CAST(GETDATE() AS DATE))
            )
ORDER BY EMailAddr
    , Phone ASC
    , Name
    , Last
    , First ASC;

Best Answer

There are multiple things to look at to improve query performance.

  • Understand your system load
  • Volume of data you are expecting to be in the tables mentioned in query
  • Volume of data you are going to fetch from tables mentioned in query
  • Indexes applied on tables mentioned in query
  • Structure of the query

I can suggest you improvements related to only query structure as I am not fully aware of your system. I am assuming that you have proper indexes applied on your tables.

After looking at your query I can suggest you following things,

  • You dont need DISTINCT clause here as you are selecting Customer.CustId which I assume to be a primary key of Customer table & its always going to be unique which inter makes related row always unique in result set.
  • You are using too many subqueries which I dont think is a good practice. Try to use SQL joins as much as you can. Plus I think you can merge some of these queries. For example, you are selecting EmailAddr from Customer having count greater than 1 & created after yesterday. You can combine both these conditions in one single query as following.
    SELECT 
        [EmailAddr] AS [EmailAddr] 
    FROM 
        [Customer] AS [Customer]
    WHERE 
        [Customer].[EmailAddr] != '' 
        AND [Customer].[Crtd_DateTime] >= DATEADD(day, -1, CAST(GETDATE() AS DATE)
    GROUP BY 
        [Customer].[EmailAddr]
    HAVING COUNT(*) > 1

same thing with phone also

SELECT 
    [Phone] AS [Phone]
FROM 
    [Customer] AS [Customer]
WHERE 
    [Customer].[Phone] != '' 
    AND [Customer].[Crtd_DateTime] >= DATEADD(day, -1, CAST(GETDATE() AS DATE)
GROUP BY 
    [Customer].[Phone]
HAVING COUNT(*) > 1

You can use CTE here. Define 2 CTEs one for email data & one for phone data. Then you can do inner join with Customer table on email & phone to get correct data. (check my query given below to get better picture)

  • Try avoiding functions in your where clause. While querying for email & phone data you are checking for yesterdays date also. Try to store this value in some sql variable first & then use that variable value to compare. example,
    DECLARE @dateTimeToCheck DATETIME
    set @dateTimeToCheck = DATEADD(day, -1, CAST(GETDATE() AS DATE)

I would also like to suggest following things, - Try to write simple queries. Divide them into logical unites. This will help you to understand of complex queries easily & it will help you in query optimization also. - Use alias names for columns you are selecting. This will not improve performance but it will definitely improve readability.

Here is my improved version of your query.

DECLARE @dateTimeToCheck DATETIME
SET @dateTimeToCheck = DATEADD(day, -1, CAST(GETDATE() AS DATE)

;WITH cteEmail AS (
    SELECT 
        [EmailAddr] = [Customer].[EmailAddr] 
    FROM 
        [Customer] AS [Customer] WITH(NOLOCK)
    WHERE 
        [Customer].[EmailAddr] != '' 
        AND [Customer].[Crtd_DateTime] >= @dateTimeToCheck
    GROUP BY 
        [Customer].[EmailAddr]
    HAVING COUNT(*) > 1
)

;WITH ctePhone AS (
    SELECT 
        [Phone] = [Customer].[Phone]
    FROM 
        [Customer] AS [Customer]
    WHERE 
        [Customer].[Phone] != '' 
        AND [Customer].[Crtd_DateTime] >= @dateTimeToCheck
    GROUP BY 
        [Customer].[Phone]
    HAVING COUNT(*) > 1
)

SELECT 
    [CustomerId] = [Customer].CustId,
    [PriceClassID] = [Customer].PriceClassID,
    [Name] = [Customer].Name ,
    [First] = [CustomerEDI].[S4Future01],
    [Last] = [CustomerEDI].[S4Future02],
    [Address1] = [Customer].[Addr1],
    [Address2] = [Customer].[Addr2],
    [City] = [Customer].[City],
    [State] = [Customer].[State],
    [Zip] = [Customer].[Zip],
    [Country] = [Customer].[Country],
    [CustomerEmail] = [Customer].[emailAddr],
    [CustomrePhone] = [Customer].[Phone],
    [LastOrder] = [Customer].[lastorder],
    [snotetext] = CONVERT(varchar(8000), [Customer].[snotetext])
FROM 
    [Customer] AS [Customer] 
INNER JOIN 
    [cteEmail] AS [CteEmail] ON [Customer].EmailAddr = [cteEmail].[EmailAddr]
INNER JOIN 
    [ctePhone] AS [CtePhone] on [Customer].Phone = [ctePhone].[Phone]
ORDER BY 
    [Customer].[EMailAddr]
    , [Customer].[Phone] ASC
    , [Customer].[Name]
    , [Customer].[Last]
    , [Customer].[First] ASC;

I hope it helps.