Postgresql – How to write “and” queries for a lookup table when you can’t use the AND keyword

join;postgresqlqueryrelational-division

I'm building a practice website that allows users to search hotels based off of the amenities that the hotel provides. From the FE, the user will select a checkbox for however many amenities they want, and the amenity keys will be sent to the backend. On the backend, I've got these three tables:

          hotels
| id | name         | vacancies
| ---| ------------ | -------- |
|   1| Marriott     |        0 |
|   2| Best Western |       10 |
|   3| Sheraton     |        3 |
 ------------------------------

          amenities
| id | name                  | key                   |
| ---| --------------------- | --------------------- |
|   1| Cafe                  | cafe                  |
|   2| Wheelchair Accessible | wheelchair_accessible |
|   3| Wifi                  | wifi                  |
 ----------------------------------------------------


    hotels_amenities_lookup
| id | amenity_id | hotel_id |
| ---| ---------- | -------- |
|   1|          1 |        3 |
|   2|          2 |        1 |
|   3|          2 |        2 |
|   4|          2 |        3 |
|   5|          3 |        2 |
|   6|          3 |        1 |
 ----------------------------

To search for one amenity, such as wheelchair_accessible, I would do something like this:

WITH hotels_with_amenity as (
    SELECT ha.hotel_id
      FROM hotels_amenities_lookup ha
      JOIN (
            SELECT id from amenities a 
            WHERE  a.key = 'wheelchair_accessible'
        ) amenity ON ha.amenity_id = amenity.id
)

SELECT  h.name, 
        h.vacancies 
   FROM hotels h, hotels_with_amenity hwa
  WHERE h.id = hwa.hotel_id;

Returns all three hotels.

The question is: if the user selects multiple amenities, wheelchair_accessible and wifi for example, how would I query for hotels that have both? With this current set up, I couldn't do
WHERE a.key = 'wheelchair_accessible AND a.key = 'wifi'

Is there a better way of setting up these tables to make this query easier?

I'm new to relational databases and it's likely I'm missing something obvious here.

Best Answer

A classical case of relational-division!

One of the most efficient queries techniques for what you ask:

SELECT h.name, 
       h.vacancies 
FROM   hotels_amenities_lookup ha1
JOIN   hotels_amenities_lookup ha2 USING (hotel_id)
-- more?
JOIN   hotels h ON h.id = ha1.hotel_id
WHERE  ha1.amenity_id = (SELECT id FROM amenities WHERE key = 'wheelchair_accessible')
AND    ha2.amenity_id = (SELECT id FROM amenities WHERE key = 'wifi')
-- more?

db<>fiddle here (reusing McNets fiddle, thanks!)

These indices help:

amenity(key, id) -- ①
hotel_amenity(amenity_id, hotel_id)
hotel_amenity(hotel_id, amenity_id) -- ②
hotel(id) -- PK

① Maybe in the form of UNIQUE (key) INCLUDE (id). Related:

Do covering indexes in PostgreSQL help JOIN columns?

② Yes, another multi-column index with reversed columns. See:

Is a composite index also good for queries on the first field?

But there is a plethora of other ways:

How to filter SQL results in a has-many-through relation

Aside 1: The above is using AND, but it's not really about avoiding the keyword, is it?
Aside 2: Just hotels_amenities instead of hotels_amenities_lookup?

Database design

But, as Laurenz commented, the first question is about the right database design. And that heavily depends on exact (undisclosed) specifications and requirements.

Your fully normalized many-to-many implementation is a "catch-all" solution. You can combine as many hotels and as many amenities as you like, rename, add, drop hotels and amenities easily, add any amount of additional information in each table, and indexing with plain btree indexes is simple. Robust against heavy concurrent write load with many updates in any or all tables. But the disk footprint is high. And optimizing your query with many amenities needs some work.

For combining just a hand full of amenities the presented query is fast and optimized by the query planner automatically (using MCV lists in the column statistics). For more than a few amenities (more than join_collapse_limit) you'll have to switch to a smarter solution like outlined in the related answer you found:

Using same column multiple times in WHERE clause

The linchpin of the solution is to filter on rare amenities first.
About join_collapse_limit:

OTOH, for a small, stable set of amenities a simple design with boolean flags like Laurenz suggested has a much smaller disk footprint. Maybe add one partial index per flag.
Or an array of amenities for more than a few options (optimized with "char"[]?) with a GIN index. May even be combined with your current design in the form of a MATERIALIZED VIEW.
Each with a matching query style.

If storage space is important (it always is in some way):

Should I use the PostgreSQL bit string?

As always: it depends.

Related Solutions

Mysql – How to get non-deleted messages for one user_id if the same user_id appears in two tables at same time

The problem is query #4 needs a small adjustment. Now the query is:

SELECT p.*
    , rcp.*
    , msg.*
    , msg.id as message_id 
FROM default_messages msg 
    LEFT JOIN default_recipient rcp ON (msg.id = rcp.message_id) 
    LEFT JOIN default_profiles p ON (p.user_id = msg.sender_user_id) 
WHERE (msg.sender_user_id = 1 AND msg.deleted = 0) 
       OR (rcp.user_id = 1 AND rcp.deleted = 0) 
ORDER BY msg.date DESC;

Mysql – Slow performance of MySQL Join Query

Please provide SHOW CREATE TABLE; the explain is useless without it.

OR is a performance killer in many contexts.

( p.pricelist = "name_abc" AND p.iln = "sellerID_123" ) OR ( p.pricelist = "name_def" AND p.iln = "sellerID_456" ) OR ...

Turn that into

JOIN ( SELECT id FROM p WHERE 
( p.pricelist = "name_abc" AND p.iln = "sellerID_123" ) OR 
( p.pricelist = "name_def" AND p.iln = "sellerID_456" ) OR ... ) x ON x.id = foo.id

Also needed (on p):

INDEX(pricelist, iln, id)

(With the CREATEs, I could be more specific.)

The idea behind this "trick" is to move the costly work of the OR into a subquery that returns the necessary ids. Plus the INDEX makes it so that it can do all that work in the INDEX.

Best Answer

Database design

Related Solutions

Mysql – How to get non-deleted messages for one user_id if the same user_id appears in two tables at same time

Mysql – Slow performance of MySQL Join Query

Related Question