Could you please help me tune this query. It takes over half an hour to run! I want to identify repeat bookings (can only be identified using an annoying varchar field – email)
UPDATE #Repeats --a temp table
SET Repeats = (SELECT Count(col1)
FROM Booking b1 -- a massive table -- 8 million rows
WHERE email IN --a varchar(100) field
(SELECT Email
FROM Booking b2
WHERE DateDiff(day, b2.BookedDate, b1.BookedDate) > 1)
AND BookingStatusId != 7
AND #Repeats.BookedMonth = MOnth(BookedDate)
AND #Repeats.BookedYear = Year(BookedDate)
AND #Repeats.Code COLLATE Latin1_General_CI_AS = CONVERT(VARCHAR, b1.Code)
AND b1.BookedById = 5)
The table has a columnstore index that the plan is picking up.. Even then it is slow. If I drop the columnstore index, then the plan uses the nonclustered index on email. Thus there are two indexes on booking table. 1. columnstore index 2. nonclustered index on email.
I'm thinking along the lines of joins (instead of 'where email in') or using Window functions. Since I am not very well versed in these areas, I am a bit lost.
Best Answer
You should look at the Execution Plan and see where most of the cost is for the query. Without the results of the Execution Plan it is difficult to offer advice.
I suspect the query optimizer is using the best process possible for the query as it is structured. You might try breaking the query into smaller parts, like query the email field to get the results into a temp table first, then run your main query using the temp table of emails.
The might not make the query run faster, but this might expose information in the Execution Plan that will help you troubleshoot.
Another option is to use include to add columns to your non-clustered index to return all the values in the index scan to maybe speed up the process.