Sql-server – How to improve performance on a 3 table query

performancequery-performancesql serversql-server-2008-r2

I'm attempting to analyze a resource drawdown (the percentage of time individual units are being utilized during a reporting period, e.g. 2-weeks or 1-year). I need the aggregation down to the minute. To accomplish this, I'm querying two tables and a view, but it's taking approximately 1 hour per execution so I'm trying to improve its performance.

My Setup

View 1 holds my time intelligence for period reporting. It's based upon a customized calendar table that uses date as its primary level of aggregation. This is my primary filter for values, returning 14 rows for a bi-weekly reporting period.

Table 1 is a calendar table, with values to the minute. It has values for a period of 20 years, so there are in excess of 10.5 million records. This table provides the level of detail I need for my analysis.

Table 2 is a proprietary table from our RMS vendor and holds the resource utilization records necessary for the analysis. It resides in a separate database (on the same server) as Table 1 and View 1, and I can't make any changes to it. For a biweekly period, it will return approximately 350 rows.

My Problem
For a period of time defined through View 1, I need to return all of the rows from Table 1 left joined to Table 2.

My Query:

SELECT t1.calendar_dttm, t2.unit, t2.start_dttm, t2.end_dttm
FROM Table_1 t1
LEFT JOIN Table_2 t2 on  t1.calendar_dttm >= t2.start_dttm 
                     and t1.calendar_dttm <=t2.end_dttm
JOIN View_1 v1 on  t1.calendar_dttm >= v1.start_dttm 
               and t1.calendar_dttm <= v1.end_dttm
WHERE v1.period_diff = -1

This takes 24.5 mins to return 26855 rows, and eventually I'll need to run it for year (or more) long periods of time.

EDIT: As I look at the execution plan, I think I need to filter both tables for the reporting period before conducting the joins. Here's the execution plan:

Execution Plan

And here's View_1:

SELECT 
c1.[Date ID]
, c1.Date
, c1.DAY
, c1.DayofMonth
, c1.DayofWeek
, c1.Week
, c1.MONTH
, c1.YEAR
, c1.Shift
, c1.DayofPayperiod
, c1.DayofFLSA
, c1.PayPeriod
, c1.paydate
, c1.flsa_end
, FLOOR(((CONVERT(float,(c1.[Date]-GETDATE()))+(14-c1.[DayofPayPeriod]))/14)) period_diff
, DATEDIFF(yy,GETDATE(), c1.date) year_cal_diff
, DATEPART(dy,c1.date) doy
, DATEDIFF(d,GETDATE()+(14-(select DayofPayPeriod from Table_Calendar where date = CONVERT(date,GETDATE()))),date) day_offset
, CASE
    when DATEDIFF(d,GETDATE()+(14-(select DayofPayPeriod from Table_Calendar where date = CONVERT(date,GETDATE()))),date) < 0 
    then '-'
    else '+'
    end + right('0000'+ REPLACE(DATEDIFF(d,GETDATE()+(14-(select DayofPayPeriod from Table_Calendar where date = CONVERT(date,GETDATE()))),date),'-',''),4)
    day_offset_label
, CEILING(FLOOR(((CONVERT(float,(c1.[Date]-GETDATE()))+(14-c1.[DayofPayPeriod]))/14))/26) year_run_diff
, DATEADD(HH,7,c1.Date) shift_start,DATEADD(s,111599,c1.date) shift_end


FROM Table_Calendar c1

It takes static, custom date data from a table and then applies calculates dynamic period info based upon the date its run.

Best Answer

I see joins to v1 on start_dttm and end_dttm but dont see it in the view or the table listed.

Looking at your definition of your view, are you sure this is an issue with the join, or does the view itself have issues?

There are a few problems, and I will list them in order my brain saw them:

, DATEDIFF(d,GETDATE()+(14-(select DayofPayPeriod from Table_Calendar where date = CONVERT(date,GETDATE()))),date) day_offset , CASE when DATEDIFF(d,GETDATE()+(14-(select DayofPayPeriod from Table_Calendar where date = CONVERT(date,GETDATE()))),date) < 0 then '-' else '+' end + right('0000'+ REPLACE(DATEDIFF(d,GETDATE()+(14-(select DayofPayPeriod from Table_Calendar where date = CONVERT(date,GETDATE()))),date),'-',''),4) day_offset_label

These sections of the query indicate you might have RBAR behavior, eg the queries in the parenthesis may be running for each row that you need to return, as you add more data this will take longer and longer.

Of the columns you use that I can see: FLOOR(((CONVERT(float,(c1.[Date]-GETDATE()))+(14-c1.[DayofPayPeriod]))/14)) Uses a calculation which is not SARGable in anyway, and so that -1 comparison would need to be calculated for each row as well.

This may be a case where the execution plan's estimates vs actual rows would be off, you may want to download a tool such as SQL Sentry Plan Explorer (dont need to buy the pro version to easily compare things like this http://www.sqlsentry.com/products/plan-explorer/sql-server-query-view) which will highlight problem areas.

From what I can see of the plan (not much) it is giving me the impression that the planner cannot produce a useful plan and causing you to have a bad time.

My recommendations:

  • Compare like types so that you can use an index, or worst case make some sort of materialized view if you cant. See if you can unwrap some of those calculations and use less human readable/nice numbers to evaluate date offsets.
    • You may need to add an index or two to make this fly.
  • Consider using a numbers table to calculate offsets or values needed for far in the future, Why are numbers tables "invaluable"? so that you can use the power of the index to query this instead of churn your CPU each time you run this query.
  • Try to query some of this information in a SARGable fashion and then do the gross formatting work on a temp table or something that is your "final set", if this is still to slow you may want to consider doing this in some other programming paradigm.