Oracle Query – Best Way to Get Latest Row Version Based on Time

greatest-n-per-grouporacleperformancequery-performance

I'm wondering what can be the most efficient query to get the latest rows from the following table.

CREATE TABLE AGE_DATA
(
  INSERTED_DAY TIMESTAMP(6),
  HOSTNAME VARCHAR2(255 CHAR),
  CONCEPT  VARCHAR2(255 CHAR),
  REPORTED_VALUE NUMBER(10)
);

If I want to know the latest data inserted by hostname, I have 2 approaches, and I want to know which would be the most efficient, depending the cases a) having ~100 rows, b) having more than ~1M rows.

with latest as (select hostname,max(inserted_day) inserted_day 
from AGE_DATA group by hostname)
select * 
from latest l
join age_data ad on (l.hostname = ad.hostname and l.inserted_day = ad.inserted_day)

or

select * from age_data 
where (hostname,inserted_day) in (select hostname,max(inserted_day) inserted_day 
    from AGE_DATA group by hostname)

Best Answer

There is a 3rd option, using window functions, either max():

with latest as 
(select *,
        max(inserted_day) over (partition by hostname)
          as max_inserted_day 
 from age_data
)
select *              
from latest l
where inserted_day = max_inserted_day ;

or rank():

with latest as 
(select *,
        rank() over (partition by hostname order by inserted_day desc)
          as rnk
 from age_data
)
select *              
from latest l
where rnk = 1 ;

If there are ties (rows with same maximum inserted_day for a hostname), all the above queries (and yours) will return all the tied rows. If you want to resolve ties (and return only 1 or only 2 rows), you can replace rank() with row_number() and change the order by clause inside the over according to the requirements.

Regarding efficiency, an index on (hostname, inserted_day) will help, in all the queries.