I have a problem in T-SQL with a large amount of data: I need to limit the number of returned rows to 200K, but also report the total number of rows that would have been returned. The whole thing has to finish in under 10 minutes.
To produce the result I join 3 tables that have tens of thousands, hundreds of thousands, or millions of rows in various combinations, depending on the given parameters. Only several rows from each table end up being used, and I need the exact number of distinct rows that are returned.
When I place the data into a temp table, writing it to disk takes too long (hours), so I am now running the query twice: first just to count, then again to return the result set without writing it to a temp table. Otherwise, if there are more than 200K rows, it can take too long.
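One alternative to running the query twice, sketched here with placeholder column names, is to attach an empty-`OVER()` windowed count to the limited result set: in SQL Server's logical processing order, `TOP` is applied after window functions, so the count reflects the full distinct set while only 200K rows are returned. It can still spool the whole set behind the scenes, so it needs testing against the two-scan approach:

```sql
SELECT TOP (200000)
       d.*,
       COUNT(*) OVER () AS TotalRows  -- counted over ALL distinct rows, before TOP trims
FROM ( SELECT DISTINCT [field 1], [field 2]  -- , ... your real field list
       FROM #tempDC dc
       JOIN #tempUID u ON dc.[hashUFI] = u.[hashUFI]
       -- ... rest of the joins and WHERE clause from the full query
     ) AS d;
```

Note that without an ORDER BY, which 200K rows come back is arbitrary.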
I am wondering whether COUNT( DISTINCT [field 1] + ... + [field n] ) is going to take too many resources. How can I make this run faster:
DECLARE @count INT =
( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )
  FROM ( #tempDC dc WITH (NOLOCK)
         JOIN #tempUID u WITH (NOLOCK)
           ON dc.[hashUFI] = u.[hashUFI]
          AND dc.iId IN ( '', u.iId ) )
  LEFT JOIN #rLog
         ON u.[hashUFI] = #rLog.[hashUFI]
        AND #rLog.iId IN ( '', u.iId )
        AND #rLog.docId = dc.docId
  WHERE dc.rangeStart <= dc.rangeEnd
    AND u.rangeStart  <= u.rangeEnd
    AND u.rangeStart  <= dc.rangeEnd
    AND u.rangeEnd    >= dc.rangeStart );
I have indexes on all the hashUFI fields and also on all the pairs (rangeStart, rangeEnd).
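If the separate indexes are not being used together, a single composite index that leads with the equality column and covers the range pair is sometimes a better fit for this join shape. A hypothetical example (index name and column order are assumptions to verify against the actual plan):

```sql
-- Hypothetical index; adjust to the actual temp-table definitions.
CREATE INDEX IX_tempDC_hashUFI_range
    ON #tempDC ([hashUFI], [rangeStart], [rangeEnd]);
```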
This is just the problematic snippet of a much larger procedure (around 2,000 lines of code) that pulls data from some audit tables.
Best Answer
Are you sure you even need the LEFT JOIN to get a good count?
Are you actually using any values from #rLog in the DISTINCT? If not, drop that join for the counting query.
And I would go with a multi-column DISTINCT, not concatenation with +.
Also, note that the iId predicate differs between the two joins (dc.iId on the inner join versus #rLog.iId on the LEFT JOIN); reviewing that line might help.
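The "DISTINCT, not +" suggestion can be sketched like this, with placeholder column names standing in for the real field list:

```sql
-- Sketch: [field 1], [field 2] stand in for the real column list.
-- Listing the columns separately avoids two problems with '+':
--   * distinct tuples colliding into one string ('ab'+'c' = 'a'+'bc' = 'abc')
--   * any NULL field turning the whole concatenation NULL,
--     which COUNT(DISTINCT ...) then silently skips
DECLARE @count INT =
( SELECT COUNT(*)
  FROM ( SELECT DISTINCT [field 1], [field 2]  -- , ... rest of the fields
         FROM #tempDC dc
         JOIN #tempUID u ON dc.[hashUFI] = u.[hashUFI]
                        AND dc.iId IN ( '', u.iId )
         -- same LEFT JOIN and WHERE clause as in the question
       ) AS d );
```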