Sql-server – Query speed optimization

optimizationperformancequery-performancesql server

I have several tables that I'm trying to combine with the below query. I'm using the following tables:

  • LoanOrigination: This contains loan characteristics such as asset value, loan term, etc. There is a single unique observation per loan. This table is indexed on LOAN_ID,SUB_SAMPLE and COLLATERAL_TYPE.
  • LoanPerformance: This contains the performance of all loans in LoanOrigination. Each row is a unique MONTHLY_REPORTING_PERIOD and LOAN_ID combination, and the table has been indexed on both.
  • CollateralData: This contains historical values based on the COLLATERAL_TYPE. The purpose of this is to estimate what loan's current value to the collateral.

The purpose of the query below is to combine these tables so that each row contains loan characteristics along with the delinquency statuses in the current and following month. However, the query is extremely slow. Is there anything that can be done to speed it up?

with

COLLATERAL_VALUES as (
    select
        COLLATERAL_TYPE,
        dateadd( day, 1-day(AsOfDate), AsOfDate) as ASOFDATE,
        Value as INDEX
    from LoanData.CollateralData
),

SAMPLE_LOANS as (
    select
        a.*,
        b.INDEX as INDEX_T0
    from LoanData.LoanOrigination a
    join COLLATERAL_VALUES b on b.ASOFDATE = a.ORIG_DATE and b.COLLATERAL_TYPE = a.COLLATERAL_TYPE
    where SUB_SAMPLE = 0
),

LOAN_STATE as (
    select
        a.LOAN_ID,
        MONTHLY_REPORTING_PERIOD AS CUR_DATE,
        CURRENT_ACTUAL_UPB as CUR_UPB,
        LOAN_AGE,
        cast(CURRENT_LOAN_DELINQUENCY as smallint) AS CUR_DLQ_STATUS
    from LoanData.LoanPerformance a
    where
        CURRENT_LOAN_DELINQUENCY <> 'XX' and
        exists ( select LOAN_ID from SAMPLE_LOANS )
),

LOAN_TRANSITION as (
    select
        c.*,
        a.CUR_DATE,
        a.CUR_DLQ_STATUS,
        a.CUR_UPB,
        a.LOAN_AGE,
        b.NEXT_DLQ_STATUS
    from LOAN_STATE a
    join (
        -- adding next state transition
        select
            LOAN_ID,
            DATEADD( month, -1, CUR_DATE ) as PRIOR_DATE,
            CUR_DLQ_STATUS as NEXT_DLQ_STATUS
        from LOAN_STATE
    ) b on a.LOAN_ID = b.LOAN_ID and a.CUR_DATE = b.PRIOR_DATE
    join SAMPLE_LOANS c on a.LOAN_ID = c.LOAN_ID
)

select
    a.*,
    CUR_UPB / ( ASSET_VALUE * ( b.INDEX / a.INDEX_T0 )) AS LTV
from LOAN_TRANSITION a
join COLLATERAL_VALUES b on a.CUR_DATE = b.ASOFDATE and a.COLLATERAL_TYPE = b.COLLATERAL_TYPE

Best Answer

You have a problem whenever you join using ASOFDATE , such as here:

on a.CUR_DATE = b.ASOFDATE

, because ASOFDATE is defined as dateadd( day, 1-day(AsOfDate), AsOfDate). To me this seems to mean “first of the month”, and hopefully there’s no time component involved.

So, I’d add a new column to the first CTE, called maybe AsOfDateOrig.

COLLATERAL_VALUES as (
select
    COLLATERAL_TYPE,
    dateadd( day, 1-day(AsOfDate), AsOfDate) as ASOFDATE,
    Value as INDEX
    ,AsOfDate as AsOfDateOrig
from LoanData.CollateralData
),

And then involve that in your join clause too.

on a.CUR_DATE = b.ASOFDATE
and b.AsOfDateOrig >= a.CUR_DATE and b.AsOfDateOrig < dateadd(month,1,a.CUR_DATE)

It might seem redundant to have both, but it allows the QO to use either method as the main Seek Predicate.

Now do the same for everywhere else you join using that.

I’d consider doing the same for PRIOR_DATE too, basically avoiding joins that involve an expression unless you have the inverse of it too.

Finally, look at your indexing strategy.

You’ll need an Index on LoanData.CollateralData (COLLATERAL_TYPE, AsOfDate) INCLUDE (Value), for example, and similar indexes on the other tables.