Postgresql – How to write “and” queries for a lookup table when you can’t use the AND keyword

join;postgresqlqueryrelational-division

I'm building a practice website that allows users to search hotels based off of the amenities that the hotel provides. From the FE, the user will select a checkbox for however many amenities they want, and the amenity keys will be sent to the backend. On the backend, I've got these three tables:

          hotels
| id | name         | vacancies
| ---| ------------ | -------- |
|   1| Marriott     |        0 |
|   2| Best Western |       10 |
|   3| Sheraton     |        3 |
 ------------------------------

          amenities
| id | name                  | key                   |
| ---| --------------------- | --------------------- |
|   1| Cafe                  | cafe                  |
|   2| Wheelchair Accessible | wheelchair_accessible |
|   3| Wifi                  | wifi                  |
 ----------------------------------------------------


    hotels_amenities_lookup
| id | amenity_id | hotel_id |
| ---| ---------- | -------- |
|   1|          1 |        3 |
|   2|          2 |        1 |
|   3|          2 |        2 |
|   4|          2 |        3 |
|   5|          3 |        2 |
|   6|          3 |        1 |
 ----------------------------

To search for one amenity, such as wheelchair_accessible, I would do something like this:

WITH hotels_with_amenity as (
    SELECT ha.hotel_id
      FROM hotels_amenities_lookup ha
      JOIN (
            SELECT id from amenities a 
            WHERE  a.key = 'wheelchair_accessible'
        ) amenity ON ha.amenity_id = amenity.id
)

SELECT  h.name, 
        h.vacancies 
   FROM hotels h, hotels_with_amenity hwa
  WHERE h.id = hwa.hotel_id;

Returns all three hotels.

The question is: if the user selects multiple amenities, wheelchair_accessible and wifi for example, how would I query for hotels that have both? With this current set up, I couldn't do
WHERE a.key = 'wheelchair_accessible AND a.key = 'wifi'

Is there a better way of setting up these tables to make this query easier?

I'm new to relational databases and it's likely I'm missing something obvious here.

Best Answer

A classical case of !

One of the most efficient queries techniques for what you ask:

SELECT h.name, 
       h.vacancies 
FROM   hotels_amenities_lookup ha1
JOIN   hotels_amenities_lookup ha2 USING (hotel_id)
-- more?
JOIN   hotels h ON h.id = ha1.hotel_id
WHERE  ha1.amenity_id = (SELECT id FROM amenities WHERE key = 'wheelchair_accessible')
AND    ha2.amenity_id = (SELECT id FROM amenities WHERE key = 'wifi')
-- more?

db<>fiddle here (reusing McNets fiddle, thanks!)

These indices help:

  • amenity(key, id) -- ①
  • hotel_amenity(amenity_id, hotel_id)
  • hotel_amenity(hotel_id, amenity_id) -- ②
  • hotel(id) -- PK

① Maybe in the form of UNIQUE (key) INCLUDE (id). Related:

② Yes, another multi-column index with reversed columns. See:

But there is a plethora of other ways:

Aside 1: The above is using AND, but it's not really about avoiding the keyword, is it?
Aside 2: Just hotels_amenities instead of hotels_amenities_lookup?

Database design

But, as Laurenz commented, the first question is about the right database design. And that heavily depends on exact (undisclosed) specifications and requirements.

Your fully normalized many-to-many implementation is a "catch-all" solution. You can combine as many hotels and as many amenities as you like, rename, add, drop hotels and amenities easily, add any amount of additional information in each table, and indexing with plain btree indexes is simple. Robust against heavy concurrent write load with many updates in any or all tables. But the disk footprint is high. And optimizing your query with many amenities needs some work.

For combining just a hand full of amenities the presented query is fast and optimized by the query planner automatically (using MCV lists in the column statistics). For more than a few amenities (more than join_collapse_limit) you'll have to switch to a smarter solution like outlined in the related answer you found:

The linchpin of the solution is to filter on rare amenities first.
About join_collapse_limit:

OTOH, for a small, stable set of amenities a simple design with boolean flags like Laurenz suggested has a much smaller disk footprint. Maybe add one partial index per flag.
Or an array of amenities for more than a few options (optimized with "char"[]?) with a GIN index. May even be combined with your current design in the form of a MATERIALIZED VIEW.
Each with a matching query style.

If storage space is important (it always is in some way):

As always: it depends.