Sql-server – Which query is more efficient?

performance, query, query-performance, sql server, sql-server-2005

I have a table called STUDENT

+-----------+-----------+-----------+----------------+
| StudentID | FirstName | LastName  | EnrollmentDate |
+-----------+-----------+-----------+----------------+
|         1 | x         | x         | x              |
|         2 | x         | x         | x              |
|         3 | x         | x         | x              |
+-----------+-----------+-----------+----------------+

Write a query that pulls the student who registered last.

a) select top 1 * from STUDENT order by EnrollmentDate desc

b) select * from STUDENT where EnrollmentDate = (select max(EnrollmentDate) from STUDENT)

I tend to ask this question in interviews. One candidate answered (b); I was expecting (a). Which is the better query?

Best Answer

Write a query that pulls the student who registered last.

The second query does not necessarily do this, depending on the data type of the EnrollmentDate column and how granular the data is. If this column does not contain a time component, the query will return all students registered on the last day any student registered, which does not satisfy the question. If there is a time component, it's still possible (but much less likely) that multiple rows will be returned.

(Edit: Alex Kuznetsov correctly pointed out in the comments that the first query doesn't necessarily return the last enrolled student either when there are ties. It is, however, guaranteed to return a record in that event, instead of all records, which is normally satisfactory. I think my point was more that comparing the two queries is comparing apples to oranges, so to speak.)
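The tie behavior described above can be reproduced with a small script. This is only a sketch: the table variable, the names and the dates are made-up sample data, and the last two queries (WITH TIES and an extra ORDER BY column) are variations that are not part of the original question.

```sql
-- Hypothetical sample data: students 2 and 3 share the latest EnrollmentDate.
DECLARE @STUDENT TABLE (
    StudentID      INT PRIMARY KEY,
    FirstName      VARCHAR(50),
    LastName       VARCHAR(50),
    EnrollmentDate DATETIME
);

INSERT INTO @STUDENT (StudentID, FirstName, LastName, EnrollmentDate)
SELECT 1, 'Ann', 'Lee',  '20130301' UNION ALL
SELECT 2, 'Bob', 'Ray',  '20130304' UNION ALL
SELECT 3, 'Cy',  'Diaz', '20130304';

-- Query (a): exactly one row, but which of the tied students is not guaranteed.
SELECT TOP 1 *
FROM @STUDENT
ORDER BY EnrollmentDate DESC;

-- Query (b): every row that has the maximum EnrollmentDate (two rows here).
SELECT *
FROM @STUDENT
WHERE EnrollmentDate = (SELECT MAX(EnrollmentDate) FROM @STUDENT);

-- Variation of (a): WITH TIES also returns both tied rows for this data.
SELECT TOP 1 WITH TIES *
FROM @STUDENT
ORDER BY EnrollmentDate DESC;

-- Variation of (a): a tie-breaker column makes the single row deterministic.
SELECT TOP 1 *
FROM @STUDENT
ORDER BY EnrollmentDate DESC, StudentID DESC;
```

Whether one row or all tied rows is the "right" answer depends on what the business question actually needs, which is why the two original queries are hard to compare directly.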

In any event, if we assume all enrollment dates/times are unique, from what's given, the answer to the question isn't necessarily clear cut either. You would need to qualify for me what you mean by more efficient.

The first query will only scan once, but could potentially incur an expensive sort (you didn't say which indexes exist on the table, so I assume none). The latter query will do a scan to find the maximum, then do another scan to find all matching rows, which would possibly use less CPU, but more logical I/Os. It's entirely possible the second query would be less expensive overall (again, with no indexes available).
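One way to put numbers on "more efficient" is to run both queries with I/O and timing statistics switched on, first against the table as it is and then again after adding an index on EnrollmentDate. The sketch below assumes no such index exists yet; the index name is made up for illustration.

```sql
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Query (a)
SELECT TOP 1 *
FROM STUDENT
ORDER BY EnrollmentDate DESC;

-- Query (b)
SELECT *
FROM STUDENT
WHERE EnrollmentDate = (SELECT MAX(EnrollmentDate) FROM STUDENT);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- Assumption: no index on EnrollmentDate exists yet; the name is hypothetical.
-- After creating it, rerun the block above and compare the numbers.
CREATE NONCLUSTERED INDEX IX_STUDENT_EnrollmentDate
    ON STUDENT (EnrollmentDate);
```

With an index like this in place, the optimizer can typically read from the end of the index instead of sorting or scanning the whole table, but the actual execution plans are the only reliable guide.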

Having said all that, if I were to start performance tuning this business operation, I would most certainly start with query (a).

Related Solutions

Sql-server – Should a table have a clustered index even if it doesn’t have appropriate fields for it

1) If PlayerId is assigned with NEWSEQUENTIALID(), you could consider making it the clustered index.

2) Otherwise, you can add an IDENTITY and make that clustered (questionable benefit, since all access will be through the PK you have already established).

3) Or you can leave it as a heap - with appropriate non-clustered indexes.

My order of preference would be 1, 3, 2 assuming you can't change the uniqueidentifier to an IDENTITY instead.
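The three options could look roughly like this. The table names, the Name column, the RowId column and all constraint and index names are hypothetical, since the original table definition isn't shown; only PlayerId comes from the question.

```sql
-- Option 1: sequential GUIDs, with the primary key as the clustered index.
-- NEWSEQUENTIALID() can only be used in a DEFAULT constraint.
CREATE TABLE PlayerOption1 (
    PlayerId UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_PlayerOption1_PlayerId DEFAULT NEWSEQUENTIALID(),
    Name     NVARCHAR(100) NOT NULL,
    CONSTRAINT PK_PlayerOption1 PRIMARY KEY CLUSTERED (PlayerId)
);

-- Option 2: cluster on an added IDENTITY column, keep the PK nonclustered.
CREATE TABLE PlayerOption2 (
    RowId    INT IDENTITY(1, 1) NOT NULL,
    PlayerId UNIQUEIDENTIFIER NOT NULL,
    Name     NVARCHAR(100) NOT NULL,
    CONSTRAINT PK_PlayerOption2 PRIMARY KEY NONCLUSTERED (PlayerId)
);
CREATE UNIQUE CLUSTERED INDEX CIX_PlayerOption2_RowId
    ON PlayerOption2 (RowId);

-- Option 3: no clustered index at all (a heap) with a nonclustered PK.
CREATE TABLE PlayerOption3 (
    PlayerId UNIQUEIDENTIFIER NOT NULL,
    Name     NVARCHAR(100) NOT NULL,
    CONSTRAINT PK_PlayerOption3 PRIMARY KEY NONCLUSTERED (PlayerId)
);
```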

Can you explain why you are using uniqueidentifier in the first place? That may have some bearing on this.

Sql-server – Need a way to query a table, and JOIN it with the TOP 1 related record from another table

You can do this pretty easily with OUTER APPLY (if you're on 2005 or newer). Note that there may be better performing ways of achieving the result, such as using ROW_NUMBER() - check execution plans if in doubt. Also, SELECT * is lazy and inadvisable; I'm just doing it here for illustrative purposes, and because I don't know the real structure of the Heartbeats table.

SELECT
    dv.Name,
    hb.*
FROM [Devices] as dv
    OUTER APPLY (
        SELECT TOP 1 *
        FROM Heartbeats
        WHERE Heartbeats.DeviceID = dv.ID
        ORDER BY DateEntered DESC
    ) hb
WHERE ISNULL(hb.DateEntered, '1900-01-01T00:00') < '2013-03-04T00:00'

See Books Online for the finer points of OUTER APPLY vs. CROSS APPLY (it's much like OUTER JOIN vs. INNER JOIN). It was always such a pain doing queries like this in SQL Server 2000 where you didn't have OUTER/CROSS APPLY or the ROW_NUMBER() function.
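For comparison, the ROW_NUMBER() alternative mentioned above might be written as follows. It's a sketch under the same assumptions as the OUTER APPLY query: Heartbeats is assumed to have DeviceID and DateEntered columns, and only those two are selected because the rest of the table's structure is unknown.

```sql
SELECT
    dv.Name,
    hb.DeviceID,
    hb.DateEntered
FROM Devices AS dv
    LEFT JOIN (
        -- Number each device's heartbeats from newest (1) to oldest.
        SELECT
            DeviceID,
            DateEntered,
            ROW_NUMBER() OVER (
                PARTITION BY DeviceID
                ORDER BY DateEntered DESC
            ) AS rn
        FROM Heartbeats
    ) AS hb
        ON hb.DeviceID = dv.ID
       AND hb.rn = 1  -- keep only the newest heartbeat per device
WHERE ISNULL(hb.DateEntered, '1900-01-01T00:00') < '2013-03-04T00:00';
```

The LEFT JOIN keeps devices that have no heartbeats at all, matching what OUTER APPLY does; which form performs better depends on the data and the available indexes, so compare the execution plans.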
