Postgresql – SQL – Find Results with MAXimum for given colum

postgresqlpostgresql-12query

I have a question on how to select best match in PostgreSQL for a given result/condition:

I have a setup containing 3 tables.
1 is a result overview table with reliability and status|
2 is a detailed segments overview with details per segment
3 is a general table with information on the targets.

Goal is to map segments onto targets based on results from table 1

Setup:

table1: Results_table

DROP TABLE IF EXISTS result_fcd;
CREATE TABLE result_fcd
(
    source_geometry_id integer NOT NULL,
    reliability double precision NOT NULL,
    status text COLLATE pg_catalog."default" NOT NULL,
    geometry geometry(LineString,4326) NOT NULL,
    CONSTRAINT result_fcd_pkey PRIMARY KEY (source_geometry_id)
);

table2: segments (links to originals)

DROP TABLE IF EXISTS  segments;
CREATE TABLE segments
(
    source_geometry_id integer NOT NULL,
    index integer NOT NULL,
    target_link_id integer NOT NULL,
    reversed_target_link boolean NOT NULL,
    start_fraction double precision NOT NULL,
    end_fraction double precision NOT NULL,
    geometry geometry(LineString,4326) NOT NULL,
    CONSTRAINT segments_pkey PRIMARY KEY (source_geometry_id, index)
);

table 3: Originals

DROP TABLE IF EXISTS originals;
CREATE TABLE originals
(
    gid integer NOT NULL DEFAULT nextval('originals_gid_seq'::regclass),
    linknr double precision,
    name character varying(50) COLLATE pg_catalog."default",
    direction double precision,
    length numeric,
    anode double precision,
    bnode double precision,
    namenr double precision,
    geom geometry(LineString,28992),
    CONSTRAINT originals_pkey PRIMARY KEY (gid)
)

Example of output (simple left join)

SELECT target_link_id,reversed_target_link,segs.source_geometry_id,reliability
FROM segments segs
LEFT JOIN result_fcd res ON res.source_geometry_id = segs.source_geometry_id
WHERE reliability > 80
AND status = 'match'
LIMIT 10

results:

target_link_id  reversed_target_link    source_geometry_id  reliability
750 true    6156655 "96.90882243480968"
750 true    6156654 "96.3385845004076"
759 false   299963  "97.88820671302537"
759 false   299962  "90.81686447596023"
759 false   4339973 "83.83639328022115"
981 true    304625  "94.25089913251661"
981 true    304624  "96.521137788089"
981 true    304626  "94.21941014522639"
981 true    304621  "91.93785338486143"
981 true    304622  "95.27831743865612"

now the goal is to merge the results into:

750 true    6156655 "96.90882243480968"
-- 750  true    6156654 "96.3385845004076"
759 false   299963  "97.88820671302537"
-- 759  false   299962  "90.81686447596023"
-- 759  false   4339973 "83.83639328022115"
-- 981  true    304625  "94.25089913251661"
981 true    304624  "96.521137788089"
-- 981  true    304626  "94.21941014522639"
-- 981  true    304621  "91.93785338486143"
-- 981  true    304622  "95.27831743865612"

I tried using a MAX(reliability) and grouping by target_link_id,reversed_target_link,source_geometry_id but that did not really get the results.

Essentially i need the best performing source_geometry_id (max reliability) for each target_link_id

Anyone able to assist with this?

Best Answer

I've used this data as start point:

create table t 
(
    target_link_id int,
    reversed_target_link boolean, 
    source_geometry_id  int, 
    reliability double precision
);

insert into t values
(750, true,    6156655, 96.90882243480968),
(750, true,    6156654, 96.3385845004076),
(759, false,   299963,  97.88820671302537),
(759, false,   299962,  90.81686447596023),
(759, false,   4339973, 83.83639328022115),
(981, true,    304625,  94.25089913251661),
(981, true,    304624,  96.521137788089),
(981, true,    304626,  94.21941014522639),
(981, true,    304621,  91.93785338486143),
(981, true,    304622,  95.27831743865612);

If you add a row_number() to your previous query you can get the desired result in this way:

with ct as
(
  select
      target_link_id,
      reversed_target_link,
      source_geometry_id,
      reliability,
      row_number() over (partition by target_link_id order by reliability desc) rn
  from
      t
)
select
    target_link_id,
    reversed_target_link,
    source_geometry_id,
    reliability,
from
    ct
where 
    rn = 1;
target_link_id | reversed_target_link | source_geometry_id | reliability      | rn
-------------: | :------------------- | -----------------: | :--------------- | -:
           750 | t                    |            6156655 | 96.9088224348097 |  1
           759 | f                    |             299963 | 97.8882067130254 |  1
           981 | t                    |             304624 | 96.521137788089  |  1

db<>fiddle here