PostgreSQL – Creating an Index for Querying Variable Combinations of Booleans

indexpostgresqlquery

I have a table of people, each person has a color and they seek various other colors and require various skills.

So I can query this I've created a table like so:

person
id, color_is_red, color_is_green, color_is_blue, color_is_yellow,
seeks_red, seeks_green, seeks_blue, seeks_yellow, seeks_technical, seeks_creative, seeks_leadership

I can then query the table like so:

-- Searching for green & yellow people that looking for red people with skills creative or technical

select id
from person WHERE

    color_is_red = false
    AND color_is_blue = false
    AND seeks_red = true
    AND 
    (
        seeks_creative = true
        OR seeks_technical = true
    )
limit 200

I'm unclear if this is the best way to structure this data. I also don't know how I can index this. I had hoped to use a BLOOM index but bloom doesn't support booleans. I think maybe I need a GIN index as I'll be querying different combinations of the colors depending on whats required however I've tried creating a gin_btree index but the query planner isn't using it. Another option would be a to use a bitset but I'm not sure how I would do this or index it.

Any help would be greatly appreciated!
Thanks very much

update:
I have created a single btree index for each column but the query planner doesn't seem to be choosing to use them. Here is the explain output:
https://explain.depesz.com/s/TZLG

update 2
I've tried to use intarray to index int[] but I've not had much look with basic queries taking over a minute to execute: https://explain.depesz.com/s/RyBn

Best Answer

I've found a solution by casting the bools to ints and indexing using a bloom index. The query now executes in 27ms using a bitmap index scan compared to the previous 100ms of the sequential scan.

The index:

CREATE INDEX bloom_idx ON person USING bloom (
    CAST(color_is_red as int), CAST(color_is_green as int), CAST(color_is_blue as int), CAST(color_is_yellow as int), 
    CAST(seeks_red as int), CAST(seeks_green as int), CAST(seeks_blue as int), CAST(seeks_yellow as int)

);

The query:

explain (analyze, buffers, format text) 
SELECT id
FROM person 
WHERE CAST(color_is_red as int) = 0
  AND CAST(color_is_blue as int) = 0
  AND CAST(seeks_red as int) = 1
  AND 
    (
        CAST(seeks_creative as int) = 1
        OR CAST(seeks_technical as int) = 1
    )
limit 200;

The explain output:

https://explain.depesz.com/s/piUf

"Limit  (cost=20348.00..20352.04 rows=1 width=4) (actual time=27.660..27.824 rows=200 loops=1)"
"  Buffers: shared hit=1998"
"  ->  Bitmap Heap Scan on person  (cost=20348.00..20352.04 rows=1 width=4) (actual time=27.658..27.803 rows=200 loops=1)"
"        Recheck Cond: (((color_is_red)::integer = 0) AND ((color_is_blue)::integer = 0) AND ((seeks_red)::integer = 1))"
"        Rows Removed by Index Recheck: 58"
"        Filter: (((seeks_creative)::integer = 1) OR ((seeks_technical)::integer = 1))"
"        Rows Removed by Filter: 72"
"        Heap Blocks: exact=37"
"        Buffers: shared hit=1998"
"        ->  Bitmap Index Scan on bloom_idx_no_skills  (cost=0.00..20348.00 rows=1 width=0) (actual time=24.281..24.281 rows=156588 loops=1)"
"              Index Cond: (((color_is_red)::integer = 0) AND ((color_is_blue)::integer = 0) AND ((seeks_red)::integer = 1))"
"              Buffers: shared hit=1961"
"Planning Time: 0.458 ms"
"Execution Time: 27.905 ms"

(Querying a table of 1 million randomly generated rows)