Postgresql – Query for cross matching multiple joined tables

postgresql

I have four tables, such that:

CREATE TEMP TABLE country (
  countryid serial PRIMARY KEY,
  name text
);
CREATE TEMP TABLE city (
  cityid serial PRIMARY KEY,
  name text,
  countryid int REFERENCES country
);
CREATE TEMP TABLE user_country (
  userid int,
  countryid int REFERENCES country
);
CREATE TEMP TABLE user_city (
  userid int,
  cityid int REFERENCES city
);

I'm trying to write a query which will match users up based on their user_country and user_city entries, so that any one hit will produce a match if,

  1. a user has any of the same user_country or user_city entries as another user.
  2. given the relationship between city and country; if one user has an entry in the user_country table and another user has an entry in the user_city table with a city that is in the first users_country, this will also produce a match.

For example, given the following example data and the user id 1, the query will return the user ids 2, 3, 4: 2 is matched in the user_country table (both users have an entry containing the country_id 1) and user ids 3 and 4 are matched because their entries in the user_city table relate to London which has a country_id which matches user id 1's user_country entry. (In laymans terms, user 1 wants to go anywhere in England, so does user 2, producing a match; users 3 and 4 are more specific but because both London and Leeds are in England they both produce matches)

Given the same data and the user id 3, the query will return the user id 1 and 2, as the country_id in the city table relates to the country that both users 1 and 2 have selected, but not user 4. (in laymans terms, user 3 wants to go to London; users 1 and 2 want to go anywhere in England, so they match. However user 4 only wants to go to Leeds, not London, so doesn't match)

INSERT INTO country (name) VALUES ('England'),('Canada');
INSERT INTO city (name,countryid) VALUES ('London',1),('Leeds',1),('Vancouver',2);

INSERT INTO user_country (userid, countryid) VALUES (1,1),(2,1);
INSERT INTO user_city (userid, cityid) VALUES (3,1),(4,2);

Can anyone help me with a single performant SQL query, that given a user id, returns a list of user ids which match the above scenarios?

Note: I'm using a recent version of postgres

EDIT
To clarify a few points: a user can have an entry in the user_city table without a corresponding entry in the user_country table for the country matching the city. This is because a user may have an interest in visiting a city, but not want to see the rest of the country. Being able to select a country is a shorthand way of selecting every single city in that country.
In layman's terms: just because a user wants to visit London, they don't necessarily want to visit all of the UK. So a user can have a user_city entry for London, but not have a user_country entry for the UK; also worth noting that if a user has a user_country entry for the UK, they won't have any user_city entries for any cities in that country (user wants to visit all of the UK, so doesn't make sense to list London too).

Separately a city will always be in a country, so will always have a country_id

EDIT 2
A bit more example data to replace the data above (not in addition to!)

INSERT INTO country (name) VALUES ('Germany'),('Spain'),('Italy');
INSERT INTO city (name,countryid) VALUES ('Berlin',1),('Madrid',2),('Rome',3),('Venice',3);

INSERT INTO user_country (userid, countryid) VALUES (1,1),(2,1),(5,3)
INSERT INTO user_city (userid, cityid) VALUES (3,2),(4,2),(6,3),(7,4)

Testing a match only on matching user_country table entries:

  • Entering user id 1 returns user id 2
  • Entering user id 2 returns user id 1

Testing a match only on matching user_city table entries:

  • Entering user id 3 returns user id 4
  • Entering user id 4 returns user id 3

Testing that a user_country matches all entries with related cities:

  • Entering user id 5 returns user ids 6 & 7

Testing that a user_city entry picks up a user_country entry:

  • Entering user id 6 returns user id 5
  • Entering user id 7 returns user id 5

Best Answer

Sample Data

This is actually very easy, first let's create your schema. Next time around, it's better and easier for everyone if you provide data like this.

CREATE TEMP TABLE country (
  countryid serial PRIMARY KEY,
  name text
);
CREATE TEMP TABLE city (
  cityid serial PRIMARY KEY,
  name text,
  countryid int REFERENCES country
);
CREATE TEMP TABLE user_country (
  userid int,
  countryid int REFERENCES country
);
CREATE TEMP TABLE user_city (
  userid int,
  cityid int REFERENCES city
);
INSERT INTO country (name) VALUES ('England'),('Canada');
INSERT INTO city (name,countryid) VALUES ('London',1),('Leeds',1),('Vancouver',1);
INSERT INTO user_country (userid, countryid) VALUES (1,1),(2,1),(4,2);
INSERT INTO user_city (userid, cityid) VALUES (3,1),(4,2),(2,3);

Solution

SELECT DISTINCT
  user_city_country.userid AS user1,
  user_country.userid AS user2
FROM user_country
JOIN country USING (countryid)
JOIN (
  SELECT *
  FROM user_city
  JOIN city USING (cityid)
  JOIN country USING (countryid)
) AS user_city_country
  USING (countryid);

This breaks down easy... Path one is to resolve the user to the country..

SELECT DISTINCT user_country.userid
FROM user_country
JOIN country USING (countryid)

Path 2 is to resolve the user_city to the country.

SELECT *
FROM user_city
JOIN city USING (cityid)
JOIN country USING (countryid)

Then we just JOIN them. Note, we're not actually using the country at all. You have it in your schema. So I preserved it here.

This returns...

 user1 | user2 
-------+-------
     4 |     2
     3 |     1
     2 |     1
     3 |     2
     4 |     1
     2 |     2

Now, if you want to know all matches, just add WHERE user_country.userid = '1' OR user_city_country.userid = '1';

 user1 | user2 
-------+-------
     4 |     1
     3 |     1
     2 |     1