SQL Server – How to Optimize Long Running Query

optimizationsql serversql-server-2008-r2t-sql

I am running this syntax below on roughly 30,000 records. The syntax is run 4 – 5 times a day and takes anywhere between 5 and 10 minutes to execute. Below is sample DDL with a subset of data (not the 30,000 records) and I can upload my XML if someone will let me know how so you can see query execution.

If any one sees ways of optimization to reduce that lengthy time to return the data I will greatly appreciate it.

Declare @NumOfLeads Table
(
    clientname varchar(500)
    ,numoflicensesrequested int
    ,calldate date
)
Declare @PurchasedLicenses Table
(
    clientname varchar(500)
    ,numoflicensespurchased int
    ,purchasedate date
)
Insert Into @NumOfLeads (clientname, numoflicensesrequested, calldate) Values
('Client A', 10, '2017-01-01'), ('Client B', 5, '2017-01-02')
,('Client C', 7, '2017-01-01'), ('Client D', 12, '2017-01-03')
Insert Into @PurchasedLicenses (clientname, numoflicensespurchased, purchasedate) Values
('Client A', 10, '2017-01-10'), ('Client C', 5, '2017-01-15'), ('Client E', 10, '2017-01-15')
,('Client F', 11, '2017-01-15'), ('Client G', 5, '2017-01-22')

Select clientname, numoflicensesrequested
INTO ProdData
FROM @NumOfLeads
WHERE calldate BETWEEN '2017-01-01' AND '2017-01-31'

ALTER TABLE ProdData
Add numlicensespurchased int NOT NULL DEFAULT(0)

MERGE INTO ProdData pd
USING (
    SELECT clientname, numoflicensespurchased as CNT
    FROM @PurchasedLicenses 
    WHERE purchasedate BETWEEN '2017-01-01' AND '2017-01-31'
) pl
    ON pd.clientname = pl.clientname
WHEN MATCHED THEN
    UPDATE SET pd.numlicensespurchased = pl.CNT
WHEN NOT MATCHED BY TARGET THEN
  INSERT (clientname, numlicensespurchased, numoflicensesrequested)
  VALUES (pl.clientname, pl.CNT, '0');

Select * FROM ProdData order by clientname asc

EDIT
SQL XML Plan is here
https://www.brentozar.com/pastetheplan/?id=SJ16cKXkZ

Best Answer

It looks like for your final result set you want one row for each distinct value of clientname in @NumOfLeads and @PurchasedLicenses. If the client is in both tables then populate the numoflicensesrequested and numlicensespurchased columns. If not then populate the one column that you have. To me, that business logic sounds straightforward enough that you don't need to split it up into multiple queries. Sample data:

CREATE TABLE #NumOfLeads
(
  clientname varchar(500)
, numoflicensesrequested int
, calldate date
, PRIMARY KEY (clientname)
);

CREATE TABLE #PurchasedLicenses
(
  clientname varchar(500)
, numoflicensespurchased int
, purchasedate date
, PRIMARY KEY (clientname)
)

Insert Into #NumOfLeads (clientname, numoflicensesrequested, calldate) Values
('Client A', 10, '2017-01-01'), ('Client B', 5, '2017-01-02')
,('Client C', 7, '2017-01-01'), ('Client D', 12, '2017-01-03');

Insert Into #PurchasedLicenses (clientname, numoflicensespurchased, purchasedate) Values
('Client A', 10, '2017-01-10'), ('Client C', 5, '2017-01-15'), ('Client E', 10, '2017-01-15')
,('Client F', 11, '2017-01-15'), ('Client G', 5, '2017-01-22');

This query returns the same results as yours:

SELECT
  nol.clientname
, nol.numoflicensesrequested
, COALESCE(pl.numoflicensespurchased, 0) AS numoflicensespurchased
FROM #NumOfLeads nol
LEFT OUTER JOIN #PurchasedLicenses pl ON nol.clientname = pl.clientname
    AND pl.purchasedate BETWEEN '2017-01-01' AND '2017-01-31'
WHERE nol.calldate BETWEEN '2017-01-01' AND '2017-01-31'

UNION ALL

SELECT 
  pl.clientname
, 0
, pl.numoflicensespurchased
FROM #PurchasedLicenses pl
WHERE pl.purchasedate BETWEEN '2017-01-01' AND '2017-01-31'
AND NOT EXISTS (
    SELECT 1
    FROM #NumOfLeads nol
    WHERE nol.clientname = pl.clientname
)
order by clientname asc;

I believe that the query could also be written with a FULL OUTER JOIN, but I'm not personally a fan of that syntax.

A few things worth mentioning:

Don't use table variables unless you need the functionality that they provide. Table variables don't have statistics so they can cause poor query performance if you use them in the wrong way. Personally, I only consider table variables if I need to persist data through a transaction rollback or if I have code that will run thousands of times per second (or more).

The fact that you're updating by clientname implies that it's the primary key of the data. Why not define it as the primary key?

For complex queries you can see performance gains by splitting them up into multiple steps by saving intermediate results to temp tables. I would only try that if I couldn't get good enough performance with a single query. For this data and business logic I think that you can just write a single SELECT query. For other queries which are too complex, try doing inserts into temp tables. That will give you minimal logging and other benefits. Avoid doing updates (when practical), and avoid using MERGE unless you really need its functionality.