Oracle 12c – Select Most Recent Inspection

dategreatest-n-per-grouporacleoracle-12c

I have a ROAD_INSPECTION table:

+----+------------------------+-----------+
| ID |          DATE          | CONDITION |
+----+------------------------+-----------+
|  1 | 01/01/2009             |        20 |
|  1 | 05/01/2013             |        16 |
|  1 | 04/29/2016 10:02:52 AM |        15 |
+----+------------------------+-----------+
|  2 | 01/01/2009             |         8 |
|  2 | 06/06/2012 9:55:13 AM  |         8 |
|  2 | 04/28/2015             |        11 |
+----+------------------------+-----------+
|  3 | 06/11/2012             |        10 |
|  3 | 04/21/2015             |        19 |
+----+------------------------+-----------+

What is the most efficient way to select the most recent inspection? The query would need to include the CONDITION column, despite the fact that it wouldn't group by cleanly:

+----+------------------------+-----------+
| ID |          DATE          | CONDITION |
+----+------------------------+-----------+
|  1 | 04/29/2016 10:02:52 AM |        15 |
+----+------------------------+-----------+
|  2 | 04/28/2015             |        11 |
+----+------------------------+-----------+
|  3 | 04/21/2015             |        19 |
+----+------------------------+-----------+

For bonus points: The DATE column has both DATE and DATE_TIME values in it. Should I be worried about this causing complications when querying the column?

Best Answer

SELECT ID,
       MAX( "DATE" ) AS "DATE",
       MAX( CONDITION ) KEEP ( DENSE_RANK LAST ORDER BY "DATE" ) AS condition
FROM   road_inspection
GROUP BY id

SELECT *
FROM   (
  SELECT r.*,
         ROW_NUMBER() OVER ( PARTITION BY id ORDER BY "DATE" DESC ) AS rn
  FROM   road_inspection r
)
WHERE  rn = 1;

The DATE column has both DATE and DATE_TIME values in it. Should I be worried about this causing complications when querying the column?

Oracle has no concept of a DATE_TIME data type. There is only DATE or TIMESTAMP and both of them have a HH24:MI:SS time component (TIMESTAMP also has fractional seconds). If a DATE value is set without a time component then oracle will default to midnight (00:00:00) of that day for the time component of the date.

Assuming your DATE column is actually stored as a DATE data type (and not as a VARCHAR2) then, no, there will not be any complications (with regards to selecting the rows with maximum values) from having time components as all your values will have a time component.

Related Solutions

SQL Server – How to Select All Most Recent Records

Your query includes a lot of tables and columns that aren't listed in your sample data. From this query you can join and pull in other things:

;WITH lastMessage AS
(
    SELECT Id, IdMain, IdReply, DatePosted, rn = ROW_NUMBER() OVER 
      (PARTITION BY COALESCE(IdMain, Id) ORDER BY DatePosted DESC, Id)
    FROM dbo.Message
)
SELECT t.Id, t.IdMessage, m.IdMain, m.IdReply, m.DatePosted
  FROM lastMessage AS m
  INNER JOIN dbo.Target AS t
  ON m.Id = t.IdMessage
  WHERE m.rn = 1;

SQLfiddle demo

MySQL/PostgreSQL – How to Select One Most Recent Post for Each Author

If you aim is to have queries with maximum efficiency, none of the above queries is really the best. Not always at least.

Efficiency depends on many different things, like the specific DBMS, the specific version (different versions have different improvements on the optimizer and the available syntax), the type of columns, the indexes available, the size of the tables and distribution of values, the hardware the server is running, the configuration settings etc.

You should always test various different ways of writing the queries, on your tables, with the sizes and distribution you expect to have on production, with your hardware and configuration settings, to decide which rewritings of the queries should be kept.

This specific kind of query is often called greatest-n-per-group (there is even a tag for it!) and under certain assumptions, one of the many ways to write them, is often quite efficient in both MySQL and PostgreSQL. It uses a LATERAL join in Postgres, which is available in 9.3+ versions (in SQL Server lingo CROSS/OUTER APPLY) and a simulation of this join in MySQL.

The assumptions are that the number of authors (the attribute we group by on) is small, compared to the number of posts (the table where we apply the group by). It's also best if there is an index or a table to find all the distinct author_id values and an additional index on the posts table for the group by.

This solution to the greatest-n-per-group problem matches also your request about ties, as it returns always one result per group. If you want to be precise about which one (of the tied) will be returned, the ORDER BY in the subquery can be modified (to ORDER BY pi.date DESC, pi.id DESC or ORDER BY pi.date DESC, a.name for example).

Query in PostgreSQL:

SELECT p.* 
FROM authors AS a
   , LATERAL 
       ( SELECT pi.*
         FROM posts AS pi
         WHERE pi.author_id = a.author_id
         ORDER BY pi.date DESC
         LIMIT 1
       ) AS p ;

Query in MySQL:

SELECT p.* 
FROM authors AS a
  JOIN posts AS p
    ON p.id =
       ( SELECT pi.id
         FROM posts AS pi
         WHERE pi.author_id = a.author_id
         ORDER BY pi.date DESC
         LIMIT 1
       ) ;

The useful index is on posts (author_id, date, id) for MySQL and or on posts (author_id, date DESC) for Postgres.

Needless to say again but before using any of the above, they should be tested in your environment and cross tested against all the many other versions/rewritings of the query. In Postgres for example, the DISTINCT ON syntax can be used in version older than 9.3. The resulting query is more compact than the LATERAL and might be more efficient, under different data distributions. Query:

SELECT DISTINCT ON (author_id) p.*
FROM posts AS p
ORDER BY p.author_id,
         p.date DESC ;

Best Answer

Related Solutions

SQL Server – How to Select All Most Recent Records

MySQL/PostgreSQL – How to Select One Most Recent Post for Each Author

Related Question