I am running the syntax below against roughly 30,000 records. It runs 4–5 times a day and takes anywhere between 5 and 10 minutes to execute. Below is sample DDL with a subset of the data (not the full 30,000 records); I can also upload my XML execution plan if someone tells me how, so you can see the query execution.
If anyone sees a way to optimize this and reduce that lengthy runtime, I would greatly appreciate it.
DECLARE @NumOfLeads TABLE
(
    clientname varchar(500),
    numoflicensesrequested int,
    calldate date
);

DECLARE @PurchasedLicenses TABLE
(
    clientname varchar(500),
    numoflicensespurchased int,
    purchasedate date
);

INSERT INTO @NumOfLeads (clientname, numoflicensesrequested, calldate) VALUES
('Client A', 10, '2017-01-01'), ('Client B', 5, '2017-01-02'),
('Client C', 7, '2017-01-01'), ('Client D', 12, '2017-01-03');

INSERT INTO @PurchasedLicenses (clientname, numoflicensespurchased, purchasedate) VALUES
('Client A', 10, '2017-01-10'), ('Client C', 5, '2017-01-15'), ('Client E', 10, '2017-01-15'),
('Client F', 11, '2017-01-15'), ('Client G', 5, '2017-01-22');

SELECT clientname, numoflicensesrequested
INTO ProdData
FROM @NumOfLeads
WHERE calldate BETWEEN '2017-01-01' AND '2017-01-31';

ALTER TABLE ProdData
ADD numlicensespurchased int NOT NULL DEFAULT (0);

MERGE INTO ProdData pd
USING (
    SELECT clientname, numoflicensespurchased AS CNT
    FROM @PurchasedLicenses
    WHERE purchasedate BETWEEN '2017-01-01' AND '2017-01-31'
) pl
ON pd.clientname = pl.clientname
WHEN MATCHED THEN
    UPDATE SET pd.numlicensespurchased = pl.CNT
WHEN NOT MATCHED BY TARGET THEN
    INSERT (clientname, numlicensespurchased, numoflicensesrequested)
    VALUES (pl.clientname, pl.CNT, 0);

SELECT * FROM ProdData ORDER BY clientname ASC;
EDIT
The SQL XML plan is here: https://www.brentozar.com/pastetheplan/?id=SJ16cKXkZ
Best Answer
It looks like for your final result set you want one row for each distinct value of clientname in @NumOfLeads and @PurchasedLicenses. If the client is in both tables then populate both the numoflicensesrequested and numlicensespurchased columns; if not, populate the one column that you have. To me, that business logic sounds straightforward enough that you don't need to split it up into multiple queries: a single query can return the same results as yours.
I believe that the query could also be written with a FULL OUTER JOIN, but I'm not personally a fan of that syntax. A few things worth mentioning:
Don't use table variables unless you need the functionality that they provide. Table variables don't have statistics so they can cause poor query performance if you use them in the wrong way. Personally, I only consider table variables if I need to persist data through a transaction rollback or if I have code that will run thousands of times per second (or more).
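For example, the question's @NumOfLeads could be a temp table instead (a sketch; the #NumOfLeads name is mine):

```sql
-- Temp tables get column statistics and can be indexed,
-- unlike the @NumOfLeads table variable in the question.
CREATE TABLE #NumOfLeads
(
    clientname varchar(500) NOT NULL,
    numoflicensesrequested int NOT NULL,
    calldate date NOT NULL
);
```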
The fact that you're updating by clientname implies that it's the primary key of the data. Why not define it as the primary key?

For complex queries you can see performance gains by splitting them up into multiple steps and saving intermediate results to temp tables. I would only try that if I couldn't get good enough performance with a single query. For this data and business logic I think that you can just write a single SELECT query. For other queries which are too complex, try doing inserts into temp tables. That will give you minimal logging and other benefits. Avoid doing updates (when practical), and avoid using MERGE unless you really need its functionality.
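Putting the last two points together, here is a sketch of the question's staging step with clientname declared as the primary key and the intermediate result inserted into a temp table (the #ProdData name is mine; it stands in for the ProdData table in the question):

```sql
CREATE TABLE #ProdData
(
    clientname varchar(500) NOT NULL PRIMARY KEY,  -- the natural key of this data
    numoflicensesrequested int NOT NULL DEFAULT (0),
    numlicensespurchased int NOT NULL DEFAULT (0)
);

-- Insert instead of SELECT ... INTO + ALTER TABLE + MERGE
INSERT INTO #ProdData (clientname, numoflicensesrequested)
SELECT clientname, numoflicensesrequested
FROM @NumOfLeads
WHERE calldate BETWEEN '2017-01-01' AND '2017-01-31';
```

The primary key both documents the grain of the table and gives the optimizer a useful index for the later join or update by clientname.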