Sql-server – Limit the number of results but show how much would be returned

countperformanceperformance-tuningsql servert-sql

I have a big problem in T-SQL with large amount of data where I need to limit number of returned rows to 200K but also to show the number of rows that would be returned. It is needed to be done under 10 min.

At the end for the result I am joining 3 tables which has 10'sK, 100'sK, or millions of rows in various combinations, depending on given parameters. I am using just several rows from each table and need the exact number of distinct rows that are returned.

When I place data into a temp table, writing it to disk takes too long (hours), so I am now trying to run the query twice. First time just for counting and second time for returning the result set without writing it to a temp table. This is because if I have more than 200K rows, than it can take too long otherwise.

I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] ) is going to take too much resources. How can I make this run faster:

DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )
    FROM ( #tempDC dc WITH (NOLOCK)
            JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI] 
                AND dc.iId IN ( '', u.iId ) )
        LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]
            AND #rlog.iId IN ( '',  u.iId )
            AND #rLog.docId = dc.docId
    WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd 
        AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
)

I have indexes on all the hashUFI fields and also on all the pairs (rangeStart, rangeEnd).

This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.

Best Answer

Are you sure you even need the left join to get a good count?
Are you using values from #rLog in the distinct?
And I would go with , distinct not +

this line is different

ON dc.[hashUFI] = #rLog.[hashUFI]

it might help

SELECT COUNT( DISTINCT [field 1] + [field n] )
  FROM #tempDC dc WITH (NOLOCK)
  JOIN #tempUID u WITH (NOLOCK) 
        ON dc.[hashUFI] = u.[hashUFI] 
       AND dc.iId IN ( '', u.iId ) 
       AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd 
       AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
  -- try without the left join and see if the number is different
  LEFT JOIN #rLog 
        ON dc.[hashUFI] = #rLog.[hashUFI]
       AND dc.docId     = #rLog.docId 
       AND #rlog.iId IN ( '',  u.iId )