Select arbitrary single value for GROUP BY: What’s the fastest option?

group by, oracle, oracle-18c, query-performance

I have a query that I use to indicate locations on a map where there are overlapping points:

select
    min(objectid) as min_objectid,
    longitudex,
    latitudey,
    count(1) as count,
    min(shape) as min_shape
from
    workorders
group by
    longitudex,
    latitudey
having
    count(1) > 1

In the mapping software that I use, I need to include columns like objectid and shape. For those columns, it doesn't matter which of the grouped-by rows the values come from, just as long as there is a value.

Presently, I'm using min() to get an arbitrary value for those columns. However, I don't know if that's the fastest option since finding the minimum value would require calculation — and I wonder if that time spent is unnecessary.

What is the fastest option for getting an arbitrary/single value for GROUP BY in an Oracle query?

Best Answer

Your query does return the duplicate locations, but does not return the individual points (work orders) with the same location.

This returns those locations again (the same grouping as your query, without the min() columns and in a more compact notation):

select longitudex, latitudey, count(*)
from workorders
group by longitudex, latitudey
having count(*) > 1;

Then this returns the individual work orders that share the same location:

select objectid, longitudex, latitudey, shape
from workorders
where (longitudex, latitudey) in (
  select longitudex, latitudey
  from workorders
  group by longitudex, latitudey
  having count(*) > 1
);
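
As an alternative that reads the table only once, the same rows can be picked out with an analytic count. This is only a sketch, not a benchmarked recommendation: the inline sample rows and the alias cnt are made up for illustration, the shape column is left out of the sample for brevity, and whether this beats the IN subquery depends on the data and the available indexes.

```sql
-- Hypothetical sample rows standing in for the workorders table.
with workorders (objectid, longitudex, latitudey) as (
    select 1, 10.5, 20.5 from dual union all
    select 2, 10.5, 20.5 from dual union all
    select 3, 11.0, 21.0 from dual
)
select objectid, longitudex, latitudey
from (
    select w.*,
           -- number of rows sharing this exact coordinate pair
           count(*) over (partition by longitudex, latitudey) as cnt
    from workorders w
)
where cnt > 1;
```

With these sample rows, object IDs 1 and 2 are returned, since they share one coordinate pair.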

This approach obviously only works if the coordinates of work orders at the same location are exactly equal, down to the last decimal. If they are not, you need to use spatial operators to compare the locations. Those are available in databases such as Oracle (out of the box) or PostgreSQL (with the PostGIS extension).
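
If the coordinates are only nearly identical, one crude approximation that avoids spatial operators is to group on rounded coordinates. This is a swapped-in simplification, not the spatial comparison mentioned above. The precision of 5 decimal places is an arbitrary assumption, the sample rows are made up, and two close points on opposite sides of a rounding boundary still end up in different groups.

```sql
-- Hypothetical sample rows: points 1 and 2 differ only in the sixth decimal.
with workorders (objectid, longitudex, latitudey) as (
    select 1, 10.500001, 20.500002 from dual union all
    select 2, 10.500003, 20.500001 from dual union all
    select 3, 11.0, 21.0 from dual
)
select round(longitudex, 5) as longitudex_r,
       round(latitudey, 5)  as latitudey_r,
       count(*)             as point_count
from workorders
-- group on the rounded values so near-identical points fall together
group by round(longitudex, 5), round(latitudey, 5)
having count(*) > 1;
```

With these sample rows, points 1 and 2 both round to (10.5, 20.5) and are reported as one location with a count of 2.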

Related Solutions

SQL Server Query Performance – Optimizing Group By with Many Columns

The non-clustered index you have tested is not the best for this query. It can be used for the WHERE clause and for doing an index scan instead of a full table scan but it cannot be used for the GROUP BY.

The best possible index would be a filtered index (SQL Server's name for a partial index) that excludes the rows rejected by the WHERE clause, has all the columns used in the GROUP BY as key columns, and INCLUDEs all the other columns used in the SELECT:

CREATE INDEX special_ix 
  ON dbo.Commissions_Output
    ( company, location, account, 
      salesroute, employee, producttype, 
      item, loadjdate, commissionrate ) 
INCLUDE 
  ( [Extended Sales Price], [Delivered Qty] ) 
WHERE 
  ( [Extended Sales Price] <> 0 ) ;

MySQL – How to query group for increase or decrease in value

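Join positions to itself, pairing each row with the most recent earlier row for the same application, country and feed:

select
    o.position as old_position,
    n.position as new_position
from positions as o        -- old
inner join positions as n  -- new
    on o.application_id = n.application_id
    and o.country = n.country
    and o.feed_id = n.feed_id
    and o.created = (
        -- max created for this app, country and feed that's less than n.created
        select max(p.created)
        from positions as p
        where p.application_id = n.application_id
          and p.country = n.country
          and p.feed_id = n.feed_id
          and p.created < n.created
    );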

There's a discussion on this SO question for other options.
