Postgresql – Error “column does not exist” in a SELECT with JOIN and GROUP BY query

group byjoin;postgresqlselect

I'm using PostgreSQL 9.1 with a Ruby on Rails application.

I'm trying to list the last version of each "charge" (in my history table : hist_version_charges) belonging to the same project id (proj_sous_projet_id = 2).

This makes me use the max() aggregate function and apply the result to a JOIN function on the same table as PostgreSQL does not authorize to use the columns in the SELECT clause if they do not appears in the GROUP BY clause, ALTHOUGH using a max() mean obviously I'm interested to the row containing the max values!

This is my query :

SELECT h_v_charges.*, 
       max(last_v.version) as lv 
FROM hist_versions_charges h_v_charges 
    JOIN hist_versions_charges last_v 
      ON h_v_charges.version = lv 
    AND h_v_charges.proj_charge_id = last_v.proj_charge_id 
GROUP BY last_v.proj_sous_projet_id, 
         last_v.proj_charge_id 
HAVING last_v.proj_sous_projet_id = 2 
ORDER BY h_v_charges.proj_charge_id ASC;

The error message I got :

ERROR:  column "lv" does not exist
LINE 1: ..._versions_charges last_v ON h_v_charges.version = lv AND h_v...
                                                             ^
********** Error **********

ERROR: column "lv" does not exist
SQL state: 42703
Character: 147

I also tried with "last_v.lv" but the error remains the same.

If anybody got an idea about what's wrong, she is more than welcome.

=== UPDATE ===

According to * a_horse_with_no_name * and Colin 't Hart answers, I finally ended up with the following query :

SELECT *
FROM (
    SELECT *, max(version) OVER (PARTITION BY proj_charge_id) AS lv
    FROM hist_versions_charges
    WHERE proj_sous_projet_id = 2) AS hv
WHERE hv.lv = hv.version
ORDER BY hv.proj_charge_id ASC;

It is slightly quicker with a single ORDER BY.

I tried as well the query with a WITH clause. Though "nicer", it creates additional processing charge. As I know I will not re-used in the future the sub-query twice or more in the same main query, I'm fine with using a simple sub-query.

Thanks anyway to *a_horse_with_no_name* and Colin 't Hart. I learned many things !

Best Answer

You probably want something like this:

SELECT h_v_charges.*, 
       last_v.last_version
FROM hist_versions_charges h_v_charges 
  JOIN (select proj_charge_id, 
               max(version) as last_version
        from hist_versions_charges 
        where proj_sous_projet_id = 2  
        group by proj_charge_id
  ) last_v  
  ON h_v_charges.version = last_v.last_version
 AND h_v_charges.proj_charge_id = last_v.proj_charge_id 
ORDER BY h_v_charges.proj_charge_id ASC;

A possibly (because no join is required) faster solution would be:

select *
from (
   select hvc.*, 
          row_number() over (partition by proj_charge_id order by version desc) as rn
   from hist_versions_charges as hvc
   where proj_sous_projet_id = 2  
) as hv
where rn = 1
order by hv.proj_charge_id ASC;

As Colin has pointed out, this can also be written as:

with hv as (
  select hvc.*, 
         row_number() over (partition by proj_charge_id order by version desc) as rn
  from hist_versions_charges as hvc
  where proj_sous_projet_id = 2  
) 
select *
from hv
where rn = 1
order by hv.proj_charge_id ASC;