I'm building a practice website that allows users to search hotels based off of the amenities that the hotel provides. From the FE, the user will select a checkbox for however many amenities they want, and the amenity key
s will be sent to the backend. On the backend, I've got these three tables:
hotels
| id | name | vacancies
| ---| ------------ | -------- |
| 1| Marriott | 0 |
| 2| Best Western | 10 |
| 3| Sheraton | 3 |
------------------------------
amenities
| id | name | key |
| ---| --------------------- | --------------------- |
| 1| Cafe | cafe |
| 2| Wheelchair Accessible | wheelchair_accessible |
| 3| Wifi | wifi |
----------------------------------------------------
hotels_amenities_lookup
| id | amenity_id | hotel_id |
| ---| ---------- | -------- |
| 1| 1 | 3 |
| 2| 2 | 1 |
| 3| 2 | 2 |
| 4| 2 | 3 |
| 5| 3 | 2 |
| 6| 3 | 1 |
----------------------------
To search for one amenity, such as wheelchair_accessible
, I would do something like this:
WITH hotels_with_amenity as (
SELECT ha.hotel_id
FROM hotels_amenities_lookup ha
JOIN (
SELECT id from amenities a
WHERE a.key = 'wheelchair_accessible'
) amenity ON ha.amenity_id = amenity.id
)
SELECT h.name,
h.vacancies
FROM hotels h, hotels_with_amenity hwa
WHERE h.id = hwa.hotel_id;
Returns all three hotels.
The question is: if the user selects multiple amenities, wheelchair_accessible
and wifi
for example, how would I query for hotels that have both? With this current set up, I couldn't do
WHERE a.key = 'wheelchair_accessible AND a.key = 'wifi'
Is there a better way of setting up these tables to make this query easier?
I'm new to relational databases and it's likely I'm missing something obvious here.
Best Answer
A classical case of relational-division!
One of the most efficient queries techniques for what you ask:
db<>fiddle here (reusing McNets fiddle, thanks!)
These indices help:
amenity(key, id)
-- ①hotel_amenity(amenity_id, hotel_id)
hotel_amenity(hotel_id, amenity_id)
-- ②hotel(id)
-- PK① Maybe in the form of
UNIQUE (key) INCLUDE (id)
. Related:② Yes, another multi-column index with reversed columns. See:
But there is a plethora of other ways:
Aside 1: The above is using
AND
, but it's not really about avoiding the keyword, is it?Aside 2: Just
hotels_amenities
instead ofhotels_amenities_lookup
?Database design
But, as Laurenz commented, the first question is about the right database design. And that heavily depends on exact (undisclosed) specifications and requirements.
Your fully normalized many-to-many implementation is a "catch-all" solution. You can combine as many hotels and as many amenities as you like, rename, add, drop hotels and amenities easily, add any amount of additional information in each table, and indexing with plain btree indexes is simple. Robust against heavy concurrent write load with many updates in any or all tables. But the disk footprint is high. And optimizing your query with many amenities needs some work.
For combining just a hand full of amenities the presented query is fast and optimized by the query planner automatically (using MCV lists in the column statistics). For more than a few amenities (more than
join_collapse_limit
) you'll have to switch to a smarter solution like outlined in the related answer you found:The linchpin of the solution is to filter on rare amenities first.
About
join_collapse_limit
:OTOH, for a small, stable set of amenities a simple design with boolean flags like Laurenz suggested has a much smaller disk footprint. Maybe add one partial index per flag.
Or an array of amenities for more than a few options (optimized with
"char"[]
?) with a GIN index. May even be combined with your current design in the form of aMATERIALIZED VIEW
.Each with a matching query style.
If storage space is important (it always is in some way):
As always: it depends.