I have a table of people, each person has a color and they seek various other colors and require various skills.
So I can query this I've created a table like so:
person
id, color_is_red, color_is_green, color_is_blue, color_is_yellow,
seeks_red, seeks_green, seeks_blue, seeks_yellow, seeks_technical, seeks_creative, seeks_leadership
I can then query the table like so:
-- Searching for green & yellow people that looking for red people with skills creative or technical
select id
from person WHERE
color_is_red = false
AND color_is_blue = false
AND seeks_red = true
AND
(
seeks_creative = true
OR seeks_technical = true
)
limit 200
I'm unclear if this is the best way to structure this data. I also don't know how I can index this. I had hoped to use a BLOOM index but bloom doesn't support booleans. I think maybe I need a GIN index as I'll be querying different combinations of the colors depending on whats required however I've tried creating a gin_btree index but the query planner isn't using it. Another option would be a to use a bitset but I'm not sure how I would do this or index it.
Any help would be greatly appreciated!
Thanks very much
update:
I have created a single btree index for each column but the query planner doesn't seem to be choosing to use them. Here is the explain output:
https://explain.depesz.com/s/TZLG
update 2
I've tried to use intarray
to index int[]
but I've not had much look with basic queries taking over a minute to execute: https://explain.depesz.com/s/RyBn
Best Answer
I've found a solution by casting the bools to ints and indexing using a
bloom
index. The query now executes in 27ms using a bitmap index scan compared to the previous 100ms of the sequential scan.The index:
The query:
The explain output:
https://explain.depesz.com/s/piUf
(Querying a table of 1 million randomly generated rows)