Sql-server – Same Query takes 0 seconds on Server A, 7.5 minutes on Server B (same db/hardware/config)

performancequery-performancesql-server-2008-r2

I have a query (sql below) that is nearly instantaneous on 1 server, but takes 7.5 minutes on another virtually (key point) identical server. The estimated query execution plans have one obvious difference; the plan on the slower performing server contains an "Index Spool (Eager Spool)" where the plan on the faster performing server does not.

Server F (fast) – A snap of Server S that is refreshed monthly. It is used by only a handful of users for intensive analysis and has been separated so day to day users don't have to suffer the performance hits.

Server S (slow) – Receives daily data updates and is generally used for ad-hoc data mining and a small amount of user workflow data updates. There is no user activity or process running when this sql takes 7.5 minutes to run.

The hardware and configurations are identical. One difference is that the indexes on Server B are only rebuilt/reorganized once per week. They are rebuilt on Server F just after restoring the snap. I don't think this is a problem though because on Server S all index fragmentation on the large IB Charge table is < 5%.

The sql may be complex or ugly, but it runs quickly on 1 server. The index fragmentation does not appear to be much higher on the slower server than the fragmentation on the slower server.

What is causing such a drastic difference? I'm struggling with deciding what to investigate next.

select 
row_number() over (order by t1.service_date) as rank,
sum(t1.total_charge) total_charge,
sum(t1.service_units) service_units,
t1.service_date service_date,
(   select sum(x.total_charge)
    from ib_charge x
            inner join ib_header y on x.ib_header_fk = y.ib_header_pk
    where x.patient_account_fk = ### 
            and x.record_type = 1
            and x.voided = 0
            and y.replaced = 0
            and x.revenue_code not in (274, 275, 276, 278)
        and (x.service_date is null or x.service_date <= t1.service_date)
    ) running_total_by_day
from
ib_charge t1
inner join ib_header t2 on t1.ib_header_fk = t2.ib_header_pk
where
t1.patient_account_fk = ###
and t1.record_type = 1
and t1.voided = 0
and t2.replaced = 0
and t1.revenue_code not in (274, 275, 276, 278) 
group by t1.service_date
order by t1.service_date

Execution Plan Server F
Server_F.sqlplan

Execution Plan Server S
Server_S.sqlplan

SERVER F statistics io

(55 row(s) affected)

Table 'IB_Header'. Scan count 0, logical reads 60611, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'IB_Charge'. Scan count 56, logical reads 121156, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SERVER S statistics io

(55 row(s) affected)

Table 'IB_Header'. Scan count 0, logical reads 60706, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 55, logical reads 384302241, physical reads 12, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'IB_Charge'. Scan count 2, logical reads 1686771, physical reads 4143, read-ahead reads 1684432, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

UPDATE >> I also put this question out as an office challenge, and on Monday someone congratulated me on having solved the problem – but I hadn't done anything! A weekly statistic refresh and index rebuild had run over the weekend, but the particular index stats on the tables involved don't look much different. On further investigation, I realized that depending on which particular KEY is used in the ### portion of the query the results will take either 0 seconds or 5-8 minutes. Just editing the KEY also causes the estimated execution plan to include/exclude the index spool which seems to be the primary delay in the query. Currently pursuing a more holistic investigation of the bpool.

Best Answer

This problem has been resolved, although not quite how I anticipated.

Performance did not change after flushing the buffer pool, the procedure cache, rebooting the server, or rebuilding the indexes. The lengthy index spool continued to appear sometimes and not others in the query plan.

The fix was to change the reference in the where clause from ib_charge.patient_account_fk = ### to ib_header.patient_account_fk (the itemized bill (IB) header table is much smaller than the ib_charge table). This resulted in the server ceasing to use the Index Spool in all cases, which was the cause of the performance hit.