Sql-server – which one is more efficient query

performancequeryquery-performancesql serversql-server-2005

I have a table called STUDENT

+-----------+-----------+-----------+---------------+
| StudentID | FirstName | LastName  | EnrollmenDate |
+-----------+-----------+-----------+---------------+
|         1 | x         | x         | x             |
|         2 | x         | x         | x             |
|         3 | x         | x         | x             |
+-----------+-----------+-----------+---------------+

Write a query that pulls the student who registered the last?

a) select top 1 * from STUDENTorder by EnrollmentDate Desc

or

b) select * from STUDENT where EnrollmentDate = (select
Max(EnrollmentDate) from STUDENT)

I tend to ask this question in interview. One candidate answered b) I was expecting a). Which one is better query?

Best Answer

Write a query that pulls the student who registered the last?

The second query does not necessarily do this, depending on the data type of the EnrolmentDate column and how granular the data is. If this column does not contain a time component, the query will return all students registered on the last day a student registered, which does not satisfy the question. If there is a time component, it's possible (but much less likely) that there will be multiple rows returned.

(Edit: Alex Kuznetsov correctly pointed out in the comments that the first query doesn't necessarily return the last enrolled student either when there are ties. It is, however, guaranteed to return a record in that event, instead of all records, which is normally satisfactory. I think my point was more that comparing the two queries is comparing apples to oranges, so to speak.)

In any event, if we assume all enrollment dates/times are unique, from what's given, the answer to the question isn't necessarily clear cut either. You would need to qualify for me what you mean by more efficient.

The first query will only scan once, but could potentially incur an expensive sort (you didn't say which indexes exist on the table, so I assume none). The latter query will do a scan to find the maximum, then do another scan to find all matching rows, which would possibly use less CPU, but more logical I/Os. It's entirely possible the second query would be less expensive overall (again, with no indexes available).

Having said all that, if I was to start doing performance tuning on this business operation, I would most certainly start with query (a).