Sql-server – conversion of a non correlated sub query to a join statement

performancequery-performancesql serversql-server-2000subquery

Is it possible to get rid of the sub query and convert the entire query into a JOIN query?

SELECT student_id,
       COUNT(attendance)
FROM   student_attendance
WHERE  attendance = 'ABSENT'
GROUP  BY student_id
HAVING COUNT(attendance) = (SELECT max_allowed
                            FROM   configurations);

I tried the following:

SELECT student_id,
       COUNT(attendance)
FROM   student_attendance,
       configurations c
WHERE  attendance = 'ABSENT'
GROUP  BY student_id
HAVING COUNT(attendance) = c.max_allowed;

and got this error:

Column 'c.max_allowed' is invalid in the HAVING clause because
it is not contained in either an aggregate function or the GROUP BY
clause.

I want to do this for to a possible increase in query speed in case of thousands of records. Is my concern even valid?

Environment: SQL Server 2000

Best Answer

Your initial query shows that configurations must have no more than one row (or it would fail). You could rewrite this as a join using

SELECT student_id,
       COUNT(attendance)
FROM   student_attendance
       CROSS JOIN configurations c
WHERE  attendance = 'ABSENT'
GROUP  BY student_id
HAVING COUNT(attendance) = MAX(c.max_allowed);

Or as

SELECT student_id,
       COUNT(attendance)
FROM   student_attendance CROSS JOIN
       configurations c
WHERE  attendance = 'ABSENT'
GROUP  BY student_id, c.max_allowed
HAVING COUNT(attendance) = c.max_allowed;

Or as

SELECT t.*
FROM   (SELECT student_id,
               COUNT(attendance) AS count_attendance
        FROM   student_attendance
        WHERE  attendance = 'ABSENT'
        GROUP  BY student_id) AS t
       JOIN configuration AS c
         ON t.count_attendance = c.max_allowed;

But I don't see any benefit of doing so. The original query is clearer and likely to be equally or more efficient.

The execution plan I see is

The select against configurations is executed once - then the rest of the query is run as an independent sub tree with the final elimination of aggregate rows not matching the HAVING happening in the INNER JOIN.

To optimise this query you should consider creating an index on attendance, student_id. This will allow rows matching the attendance = 'ABSENT' predicate to be efficiently identified. And they will be ordered so can be processed by the stream aggregate without needing a sort.

One other option would be to pre-calculate the aggregation using an indexed view.

Related Solutions

Sql-server – Error in code in SQL Server

In most cases HAVING should only be used in a query with a GROUP BY. From BOL:

Specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement. HAVING is typically used in a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause.

You don't have any aggregates so HAVING isn't the right choice. You probably want to continue to use your WHERE clause:

SELECT P.Data, Nm.Nome, epp.PosicaoFinal
FROM Prova P, Piloto Nm, EquipaParticipaProva epp
WHERE epp.PosicaoFinal = (SELECT PosicaoFinal FROM EquipaParticipaProva where PosicaoFinal = 1)
AND Nm.Nome ='AsdoVolante' AND P.Data BETWEEN '2017-01-01' AND '2017-06-30';

Below is a similar query that can use HAVING (although it doesn't really make sense). Note the presence of the GROUP BY:

SELECT P.Data, Nm.Nome, epp.PosicaoFinal
FROM Prova P, Piloto Nm, EquipaParticipaProva epp
WHERE epp.PosicaoFinal = (SELECT PosicaoFinal FROM EquipaParticipaProva where PosicaoFinal = 1)
GROUP BY P.Data, Nm.Nome, epp.PosicaoFinal
HAVING Nm.Nome ='AsdoVolante' AND P.Data BETWEEN '2017-01-01' AND '2017-06-30';

In terms of getting the right results, the query that you have doesn't have join clauses. You're doing a cross join between Prova, Piloto, and EquipaParticipaProva.

Perhaps this is closer to what you wanted:

SELECT P.Data, Nm.Nome, epp.PosicaoFinal
FROM EquipaParticipaProva epp
INNER JOIN Prova P ON p.IdProva = epp.IdProva
INNER JOIN Piloto Nm ON Nm.IdPiloto = epp.IdPiloto
WHERE epp.PosicaoFinal = (SELECT PosicaoFinal FROM EquipaParticipaProva where PosicaoFinal = 1)
AND Nm.Nome ='AsdoVolante' AND P.Data BETWEEN '2017-01-01' AND '2017-06-30';

Sql-server – Using GROUP BY and ORDER BY in the query works in MySQL but not in SQL Server

To make this run all of the columns in the select have to be in the group by portion of the query.

Select B.BBarcode , Name, C.Country, Manufacturer,W.id,
 Packtype, B.TovarId, sum(WCount) as count, Price, Series, ExpDate
  from Warehouse W, Tovar T, Barcodes B, 
  Countries C 
  where C.CId=T.Country and
   B.WarehouseId=W.id and WCount>0 and 
   T.id=W.TovarId +'=@param' 
   GROUP BY B.BBarcode , Name, C.Country, Manufacturer,W.id,
 Packtype, B.TovarId, Price, Series, ExpDate
   ORDER BY Name

However I’m not sure the query will run with the where syntax you have, specifically the T.id=W.TovarId +'=@param' portion.

That can be refactored to be either

T.id=W.TovarId and T.id = @param

T.id=W.TovarId and W.TovarId= @param

Version that should execute

Select B.BBarcode , Name, C.Country, Manufacturer,W.id,
 Packtype, B.TovarId, sum(WCount) as count, Price, Series, ExpDate
  from Warehouse W, Tovar T, Barcodes B, 
  Countries C 
  where C.CId=T.Country and
   B.WarehouseId=W.id and WCount>0 and 
   T.id=W.TovarId and T.Id = @param
   GROUP BY B.BBarcode , Name, C.Country, Manufacturer,W.id,
 Packtype, B.TovarId, Price, Series, ExpDate
   ORDER BY Name

I would also look at some documents on how sql server does joins and the group by. While I believe sql server will do the joins correctly for an inner join between the tables based on how you’ve written them, the syntax structure they have make it a little more clear what is joining to what as well as the options to do other types of joins

Best Answer

Related Solutions

Sql-server – Error in code in SQL Server

Sql-server – Using GROUP BY and ORDER BY in the query works in MySQL but not in SQL Server

Related Question