Sql-server – conversion of a non correlated sub query to a join statement

performancequery-performancesql serversql-server-2000subquery

Is it possible to get rid of the sub query and convert the entire query into a JOIN query?

SELECT student_id,
       COUNT(attendance)
FROM   student_attendance
WHERE  attendance = 'ABSENT'
GROUP  BY student_id
HAVING COUNT(attendance) = (SELECT max_allowed
                            FROM   configurations); 

I tried the following:

SELECT student_id,
       COUNT(attendance)
FROM   student_attendance,
       configurations c
WHERE  attendance = 'ABSENT'
GROUP  BY student_id
HAVING COUNT(attendance) = c.max_allowed; 

and got this error:

Column 'c.max_allowed' is invalid in the HAVING clause because
it is not contained in either an aggregate function or the GROUP BY
clause.

I want to do this for to a possible increase in query speed in case of thousands of records. Is my concern even valid?

Environment: SQL Server 2000

Best Answer

Your initial query shows that configurations must have no more than one row (or it would fail). You could rewrite this as a join using

SELECT student_id,
       COUNT(attendance)
FROM   student_attendance
       CROSS JOIN configurations c
WHERE  attendance = 'ABSENT'
GROUP  BY student_id
HAVING COUNT(attendance) = MAX(c.max_allowed); 

Or as

SELECT student_id,
       COUNT(attendance)
FROM   student_attendance CROSS JOIN
       configurations c
WHERE  attendance = 'ABSENT'
GROUP  BY student_id, c.max_allowed
HAVING COUNT(attendance) = c.max_allowed; 

Or as

SELECT t.*
FROM   (SELECT student_id,
               COUNT(attendance) AS count_attendance
        FROM   student_attendance
        WHERE  attendance = 'ABSENT'
        GROUP  BY student_id) AS t
       JOIN configuration AS c
         ON t.count_attendance = c.max_allowed; 

But I don't see any benefit of doing so. The original query is clearer and likely to be equally or more efficient.

The execution plan I see is

enter image description here

The select against configurations is executed once - then the rest of the query is run as an independent sub tree with the final elimination of aggregate rows not matching the HAVING happening in the INNER JOIN.

To optimise this query you should consider creating an index on attendance, student_id. This will allow rows matching the attendance = 'ABSENT' predicate to be efficiently identified. And they will be ordered so can be processed by the stream aggregate without needing a sort.

One other option would be to pre-calculate the aggregation using an indexed view.