Postgresql – Join tables and summarize rows based on condition

countjoin;postgresqlsum

I have two tables.

  • One containing countries with an ID and the geometry

  • The other containing customers with a name, geometry and a boolean value showing if they are active or not.

    Countries
    iD     name     geometry
    1634   UK       xxx
    2357   USA      xxx
    3345   Mexico   xxx
    4694   Italy    xxx
    
    
    Customers
    name         geometry    statusActive
    Hernandez    xxx         TRUE
    Taylor       xxx         FALSE
    Monte        xxx         TRUE
    Winter       xxx         TRUE
    Best         xxx         TRUE
    Twist        xxx         TRUE
    

I want to find out, how many active customers there are in each country.

Step 1 – Join data: I could join the two tables based on the geometry. With ST_Contains I can test, which of the customers is located in which country. This works out well. I get as result a table from Customers which has additionally the column of the country ID. In addition, only customers with an active status are returned.

This is my SQL statement:

SELECT customers.*, countries.id
FROM customers
JOIN countries ON ST_Contains(countries.geometry, customers.geometry)
WHERE customers.statusActive IS TRUE

This is the resulting table:

Customers
name        geometry    statusActive    id
Hernandez   xxx         TRUE            1634
Monte       xxx         TRUE            4694
Winter      xxx         TRUE            2357
Best        xxx         TRUE            2357
Twist       xxx         TRUE            4694

Step 2- Sum up the customers in each country.
I do not know how I can achieve this. I already tried to use GROUP BY based on the country ID, but this results in an error, that I have to include all columns from the SELECT statement. When I remove the customers.* (all fields from customers) from the SELECT, the query is running forever.
This is how the desired result should look like:

NumberActiveUsers   ID
1                   1634
2                   2357
0                   3345    
2                   4694

Do I need another function to sum up the entries, or is GROUP BY the correct function?

Best Answer

I had to edit some parts to get a query, as i didn#t have geommetry

CREATE TABLE countries (
  "id" INTEGER,
  "name" VARCHAR(6),
  "geometry" VARCHAR(3)
);

INSERT INTO countries
  ("id", "name", "geometry")
VALUES
  ('1634', 'UK', 'xxx'),
  ('2357', 'USA', 'yyy'),
  ('3345', 'Mexico', 'zzz'),
  ('4694', 'Italy', 'aaa');
CREATE TABLE customers (
  "name" VARCHAR(9),
  "geometry" VARCHAR(3),
  "statusActive" boolean
);

INSERT INTO customers
  ("name", "geometry", "statusActive")
VALUES
  ('Hernandez', 'xxx', 'TRUE'),
  ('Taylor', 'xxx', 'FALSE'),
  ('Monte', 'aaa', 'TRUE'),
  ('Winter', 'yyy', 'TRUE'),
  ('Best', 'yyy', 'TRUE'),
  ('Twist', 'aaa', 'TRUE');
SELECT  COALESCE(countr,0) NumberActiveUsers ,co.id
FROM
(SELECT countries.id, Count(*) countr
FROM customers
JOIN countries ON countries.geometry =  customers.geometry
WHERE "customers"."statusActive" IS TRUE
GROUP BY countries.id) t1 RIGHT JOIN countries co ON t1.id = co.id
numberactiveusers |   id
----------------: | ---:
                1 | 1634
                2 | 2357
                0 | 3345
                2 | 4694

db<>fiddle here