I have a problem in T-SQL with a large amount of data: I need to limit the number of returned rows to 200K, but also report the total number of rows that would have been returned. The whole thing has to finish in under 10 minutes.
To produce the result I join 3 tables that have tens of thousands, hundreds of thousands, or millions of rows in various combinations, depending on the given parameters. Only several rows from each table end up being used, and I need the exact number of distinct rows that are returned.
When I place the data into a temp table, writing it to disk takes too long (hours), so I am now running the query twice: first just to count, then again to return the result set without writing it to a temp table. Otherwise, if there are more than 200K rows, it can take too long.
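One alternative to running the query twice, sketched here with placeholder column names, is to attach an empty-`OVER()` windowed count to the limited result set: in SQL Server's logical processing order, `TOP` is applied after window functions, so the count reflects the full distinct set while only 200K rows are returned. It can still spool the whole set behind the scenes, so it needs testing against the two-scan approach:

```sql
SELECT TOP (200000)
       d.*,
       COUNT(*) OVER () AS TotalRows  -- counted over ALL distinct rows, before TOP trims
FROM ( SELECT DISTINCT [field 1], [field 2]  -- , ... your real field list
       FROM #tempDC dc
       JOIN #tempUID u ON dc.[hashUFI] = u.[hashUFI]
       -- ... rest of the joins and WHERE clause from the full query
     ) AS d;
```

Note that without an ORDER BY, which 200K rows come back is arbitrary.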
I am wondering whether COUNT( DISTINCT [field 1] + ... + [field n] ) is going to take too many resources. How can I make this run faster:
DECLARE @count INT =
( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )
  FROM ( #tempDC dc WITH (NOLOCK)
         JOIN #tempUID u WITH (NOLOCK)
           ON dc.[hashUFI] = u.[hashUFI]
          AND dc.iId IN ( '', u.iId ) )
  LEFT JOIN #rLog
         ON u.[hashUFI] = #rLog.[hashUFI]
        AND #rLog.iId IN ( '', u.iId )
        AND #rLog.docId = dc.docId
  WHERE dc.rangeStart <= dc.rangeEnd
    AND u.rangeStart  <= u.rangeEnd
    AND u.rangeStart  <= dc.rangeEnd
    AND u.rangeEnd    >= dc.rangeStart );
I have indexes on all the hashUFI fields and also on all the pairs (rangeStart, rangeEnd).
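If the separate indexes are not being used together, a single composite index that leads with the equality column and covers the range pair is sometimes a better fit for this join shape. A hypothetical example (index name and column order are assumptions to verify against the actual plan):

```sql
-- Hypothetical index; adjust to the actual temp-table definitions.
CREATE INDEX IX_tempDC_hashUFI_range
    ON #tempDC ([hashUFI], [rangeStart], [rangeEnd]);
```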
This is just the problematic snippet of a much larger procedure (around 2,000 lines of code) that pulls data from some audit tables.
Best Answer
Are you sure you even need the LEFT JOIN to get a good count?
Are you actually using any values from #rLog in the DISTINCT? If not, drop that join for the counting query.
And I would go with a multi-column DISTINCT, not concatenation with +.
Also, note that the iId predicate differs between the two joins (dc.iId on the inner join versus #rLog.iId on the LEFT JOIN); reviewing that line might help.
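The "DISTINCT, not +" suggestion can be sketched like this, with placeholder column names standing in for the real field list:

```sql
-- Sketch: [field 1], [field 2] stand in for the real column list.
-- Listing the columns separately avoids two problems with '+':
--   * distinct tuples colliding into one string ('ab'+'c' = 'a'+'bc' = 'abc')
--   * any NULL field turning the whole concatenation NULL,
--     which COUNT(DISTINCT ...) then silently skips
DECLARE @count INT =
( SELECT COUNT(*)
  FROM ( SELECT DISTINCT [field 1], [field 2]  -- , ... rest of the fields
         FROM #tempDC dc
         JOIN #tempUID u ON dc.[hashUFI] = u.[hashUFI]
                        AND dc.iId IN ( '', u.iId )
         -- same LEFT JOIN and WHERE clause as in the question
       ) AS d );
```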