Mysql – How to design a table that each rows have 5K boolean attributes

MySQL

I have about 2M rows and each row looks like the following.

244 true false ... true

-> One integer column(V) and about 5K boolean columns(B1, B2, …, B5K) associated to the integer.

Due to the limitation of the maximum number of columns that I can have for a row, I have separated the boolean columns(attributes) in a separate table.

Table V:

idx_V value_V
--------------
1     244  

...

Table B:

idx_V idx_B value_B
--------------------
1     1     true
1     2     false
...
1     5K    true
...

This design works alright when I try to find V's that match one boolean column. For example, finding V's where the 2nd boolean attribute is true:

select value_V 
where VT.idx_A = BT.idx_A 
    and idx_B = 2 
    and value_B = true 
from V_Table as VT 
    and B_Table as BT

But the query becomes awful when I have to find V's that match a multiple boolean columns, sometimes even for all 5K columns, like finding V's with B1=true, B2=false, B3=true, … and B5K=false.

My primary use of the tables would be the following 2:

  1. Find V's that x1th, x2th and xnth boolean columns are false/true (n can be anything between 1 and 5K)
  2. Sublists:
    • Find the sequence of the boolean columns for a specific V: T F T T F F ...
    • Find other V's that match the sequence found in 2-A

I'm thinking about constructing a varchar[5K] field to store the boolean sequence to do 2 but it seems like there's just too much waste in space since each boolean only requires just 1 bit but I'm allocating a byte.

What would be the best way to go about this?

Best Answer

Combine the binary values into one or more integers and use binary arithmetic when computing the actual values. This will generate some decent cpu overhead, but the complexity of the data is massively reduced. i.e.

True True True True = 8+4+2+1 = 15
True False False True = 8+0+0+1 = 9
False True True False = 0+4+2+0 = 6