Postgresql – receive the error message “’‘ must appear in the GROUP BY clause or be used in an aggregate function”

errorsgroup bypostgresql

Where am I going wrong? Why am I getting that message when I add the MAX function?

The query involved:

WITH
   references_cve AS (
      SELECT vulnerability_id, reference
      FROM dim_vulnerability_reference
      WHERE dim_vulnerability_reference.source = 'NVD'
   )
SELECT DISTINCT ON (
      da.ip_address,
      favi.port,
      dv.severity,dp.name, dv.title,
      dv.description,
      fix,
      dv.nexpose_id,
      cve.reference,
      da.host_name,
      dos.description,
      da.last_assessed_for_vulnerabilities,
      davbs.solution_id
   )
   da.ip_address AS IPAddress,
   CASE WHEN favi.port = -1 THEN NULL ELSE favi.port END AS Port,
   dv.severity AS Severity,
   dp.name AS Protocol,
   dv.title AS VulnerabilityName,
   htmlToText(dv.description) AS Summary,
   htmlToText(dv.description) AS Description,
   htmlToText(fix) as Solution,
   'https://www.rapid7.com/db/vulnerabilities/' || dv.nexpose_id as SeeAlso,
   dv.nexpose_id AS NexposeID,
   ''||null as PluginOutput,
   cve.reference as CVE,
   ''||null as APPorOS,
   ''||null as BU,
   da.host_name as DEVICENAME,
   date(da.last_assessed_for_vulnerabilities) as LastScanDate,
   dos.description as DeviceOperatingSystem,
   MAX(sr.solution_id),
   davbs.vulnerability_id,
   dv.vulnerability_id


FROM fact_asset_vulnerability_instance favi
   LEFT JOIN dim_asset da USING (asset_id)
   LEFT JOIN dim_vulnerability dv USING (vulnerability_id)
   LEFT JOIN dim_site_asset dsa USING (asset_id)
   LEFT JOIN dim_site ds USING (site_id)
   LEFT JOIN dim_vulnerability_status dvs USING (status_id)
   LEFT JOIN dim_protocol dp USING (protocol_id)
   LEFT JOIN dim_service dsvc USING (service_id)
   LEFT JOIN dim_operating_system dos USING (operating_system_id)
   LEFT JOIN dim_asset_vulnerability_best_solution davbs ON da.asset_id = davbs.asset_id AND dv.vulnerability_id = davbs.vulnerability_id
   LEFT JOIN dim_solution sr on davbs.solution_id = sr.solution_id
   LEFT JOIN references_cve cve on favi.vulnerability_id = cve.vulnerability_id

ORDER BY da.ip_address

The resulting message:

Error: column "da.ip_address" must appear in the GROUP BY clause or be used in an aggregate function
Character:456

Best Answer

I assume you are using PostgreSQL.

I suppose -I can't reproduce your whole query-, you are getting this error because you added an aggregate to it:

MAX(sr.solution_id)

I've set up a minimal example to reproduce this error:

CREATE TABLE t(id INT, foo int, bar int);
INSERT INTO t VALUES
(1, 10, 20),
(1, 11, 21),
(1, 12, 22),
(2, 30, 40),
(2, 31, 41),
(2, 32, 42);

Using a similar syntax:

SELECT
    DISTINCT ON (id) id,
    foo,
    MAX(bar) bar
FROM
    t
ORDER BY 
    id, foo;

It returns the next error:

ERROR: column "t.id" must appear in the GROUP BY clause or be used in an aggregate function LINE 2: DISTINCT ON (id) id,

One idea to solve it is by using a sub-query to calculate the aggregated value (but this doesn't work as one would expect):

SELECT
    DISTINCT ON (id) id,
    foo,
    (SELECT 
        MAX(bar)
     FROM
       t t1
     WHERE
       t1.id = t.id) max_bar
FROM
    t
ORDER BY 
    id, max_bar;
id | foo | max_bar
-: | --: | ------:
 1 |  10 |      22
 2 |  30 |      42

Assuming that you want one row per id, th eone that has the maximum bar, then this is how to use DISTINCT ON:

SELECT
    DISTINCT ON (id) id,
    foo,
    bar
FROM
    t
ORDER BY 
    id, bar DESC;
id | foo | max_bar
-: | --: | ------:
 1 |  12 |      22
 2 |  32 |      42

db<>fiddle here


Using DISTINCT ON and after the comments that clarified a bit the requirements, that you want a row for each vulnerability - identified by dv.nexpose_id:

The query has a row per vulnerability, however, some vulnerabilities have more than 1 solution which causes that row to be duplicated but with a different solution ID. I just need 1 solution ID per row.

Nexpose ID is what identifies the vulnerability.

your query should be rewritten:

WITH
   references_cve AS (
      SELECT vulnerability_id, reference
      FROM dim_vulnerability_reference
      WHERE dim_vulnerability_reference.source = 'NVD'
   )
SELECT DISTINCT ON (
      dv.nexpose_id    -- one result for each vulnearibility
   )
   da.ip_address AS IPAddress,
      -- ...
   dos.description as DeviceOperatingSystem,

   sr.solution_id,               --<-- This is changed

   davbs.vulnerability_id,
   dv.vulnerability_id


FROM fact_asset_vulnerability_instance favi
   LEFT JOIN dim_asset da USING (asset_id)

      -- ...

ORDER BY 
      dv.nexpose_id,        -- same as the DISTNCT ON   
      sr.solution_id DESC   -- plus the ordering we want
  ;