Postgresql – Distinct array aggregation

aggregatearraypostgresqlpostgresql-9.6

This is my current query

SELECT
project_id, 
release_id, 
array_agg(DISTINCT version_bom_id) AS version_bom_ids,
sum(case when review_status='REVIEWED' then 1 else 0 end) AS reviewed_count,
sum(case when review_status='NOT_REVIEWED' then 1 else 0 end) AS not_reviewed_count,
sum(case when severity='BLOCKER' then 1 else 0 end) AS policy_blocker_count,
sum(case when severity='CRITICAL' then 1 else 0 end) AS policy_critical_count,
sum(case when severity='UNSPECIFIED' then 1 else 0 end) AS policy_unspecified_count,
array_agg(DISTINCT ignored) as ignored_array
--problem aggregating the policy_ids
--array_agg( DISTINCT policy_ids) FILTER (where policy_ids is not null) AS policy_ids,

FROM (

   SELECT vbc.review_status, 
   vbc.version_bom_id AS version_bom_id, 
   vbc.project_id AS project_id, 
   vbc.release_id AS release_id, 
   vbc.ignored as ignored,
   min(prrd.severity)  FILTER (where vbap.status = 'IN_VIOLATION') AS severity,
   array_agg(DISTINCT vbap.policy_id::UUID) FILTER (where vbap.status = 'IN_VIOLATION') AS policy_ids
   FROM
    version_bom_component vbc
    LEFT JOIN version_bom_entry vbe ON vbc.id = vbe.version_bom_component_id
    LEFT JOIN version_bom_active_policies vbap ON vbap.version_bom_entry_id = vbe.id
    LEFT JOIN policy_rule_rule_definition prrd ON prrd.id = vbap.policy_id
    GROUP BY version_bom_id, project_id, release_id, review_status, vbc.ignored
 ) t GROUP by project_id, release_id

The sub-query is giving me this result:

-[ RECORD 1 ]--+----------------------------------------------------------------------------
review_status  | NOT_REVIEWED
version_bom_id | 27
project_id     | 5468e2d4-e43b-4d83-b832-b654280584d0
release_id     | c29d7004-5678-49a1-8e4e-56ad3831a5a3
ignored        | f
severity       | BLOCKER
policy_ids     | {2074e6d1-3507-4fec-8a9e-f8b3c8c81caa,a9893b83-6444-400e-9ce0-f987156bbee2}
-[ RECORD 2 ]--+----------------------------------------------------------------------------
review_status  | NOT_REVIEWED
version_bom_id | 28
project_id     | 5468e2d4-e43b-4d83-b832-b654280584d0
release_id     | c29d7004-5678-49a1-8e4e-56ad3831a5a3
ignored        | f
severity       | CRITICAL
policy_ids     | {a9893b83-6444-400e-9ce0-f987156bbee2}
-[ RECORD 3 ]--+----------------------------------------------------------------------------
review_status  | NOT_REVIEWED
version_bom_id | 28
project_id     | 4859fdaf-5de6-42b7-9a6f-70354f24676c
release_id     | 1eb1ee65-49fc-44a9-b673-8b85b67eaf6d
ignored        | f
severity       | 
policy_ids     | 

I want my final output this way:

-[ RECORD 1 ]------------+----------------------------------------------------------------------------
project_id               | 5468e2d4-e43b-4d83-b832-b654280584d0
release_id               | c29d7004-5678-49a1-8e4e-56ad3831a5a3
version_bom_ids          | {27,28}
reviewed_count           | 0
not_reviewed_count       | 2
policy_blocker_count     | 1
policy_critical_count    | 1
policy_unspecified_count | 0
policy_ids               | {2074e6d1-3507-4fec-8a9e-f8b3c8c81caa,a9893b83-6444-400e-9ce0-f987156bbee2}
ignored_array            | {f}
-[ RECORD 2 ]------------+----------------------------------------------------------------------------
project_id               | 4859fdaf-5de6-42b7-9a6f-70354f24676c
release_id               | 1eb1ee65-49fc-44a9-b673-8b85b67eaf6d
version_bom_ids          | {28}
reviewed_count           | 0
not_reviewed_count       | 1
policy_blocker_count     | 0
policy_critical_count    | 0
policy_unspecified_count | 0
policy_ids               | 
ignored_array            | {f}

I'm having trouble aggregating the policy_ids from the sub-query.
Notice that the policy_ids in the final result should be distinct. Can you anyone please help me out!

Thanks in advance

Best Answer

You could create your own aggregate to append arrays:

CREATE AGGREGATE array_append_agg(anyarray) (
   SFUNC = pg_catalog.array_cat,
   STYPE = pg_catalog.anyarray
);

Then use that aggregate function in your outer query.