Performance of bitwise operators on byte column vs individual bit columns

azure-sql-database, index, operator, performance

I'm designing a database schema at the moment and have a question about performance.

Our table needs to have (at least) two flags, or boolean values. We can call these IsComplete and IsExportable. The obvious way to put these into a table is as two separate bit columns, but it occurred to me that we could pack them into a single byte column and use bitwise operators to query against it.

Which method would yield the best performance? Both flags will be queried often, and so filtering on them should be as fast as possible. Would an index on the byte column improve its performance?

One reason I like the byte column is that if another flag is required in the future, it can easily be added, whereas with separate bit columns an additional column would need to be added to the table.

For clarification, the database will be stored in SQL Azure. I suppose this might make some difference!

Best Answer

The answer to any performance question is "it depends." Discrete columns can be fast, and byte-column flags can be fast too. In absolute terms, discrete columns probably save a bitwise operation here and there, so they are theoretically faster.

A theoretical bump shouldn't be the main reason to choose a strategy. To paraphrase Jeff Atwood, "storage is cheap and DBAs are expensive." I would avoid compressing the flags into a byte column for the simple reason that it makes your life more complicated by making your data more cryptic. Discrete columns will be straightforward to query, filter properly, debug, and pass on to future teammates.
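As a rough illustration of the two layouts being compared, here is a minimal T-SQL sketch. The table names dbo.JobDiscrete and dbo.JobPacked, the JobId column, and the bit assignments (value 1 for IsComplete, value 2 for IsExportable) are assumptions made for the example; only the two flag names come from the question.

```sql
-- Hypothetical tables; only IsComplete and IsExportable come from the question.

-- Option 1: one bit column per flag.
CREATE TABLE dbo.JobDiscrete (
    JobId        int IDENTITY(1,1) PRIMARY KEY,
    IsComplete   bit NOT NULL DEFAULT 0,
    IsExportable bit NOT NULL DEFAULT 0
);

-- Option 2: flags packed into a single byte.
-- Assumed layout: value 1 = IsComplete, value 2 = IsExportable.
CREATE TABLE dbo.JobPacked (
    JobId int IDENTITY(1,1) PRIMARY KEY,
    Flags tinyint NOT NULL DEFAULT 0
);

-- Complete but not exportable, discrete columns: plain equality predicates.
SELECT JobId
FROM dbo.JobDiscrete
WHERE IsComplete = 1
  AND IsExportable = 0;

-- The same filter against the packed column: each flag needs a bitwise AND.
SELECT JobId
FROM dbo.JobPacked
WHERE (Flags & 1) = 1
  AND (Flags & 2) = 0;
```

One thing to keep in mind with the packed version: a predicate like (Flags & 1) = 1 wraps the column in an expression, so an ordinary index on Flags generally can't be used to seek on a single flag. The equality predicates on the discrete columns don't have that limitation.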

For completeness, there are options other than bit flags or discrete columns. One is the EAV (entity-attribute-value) model, although based on your description it doesn't feel like your best option.

Entity–attribute–value model (EAV) is a data model to describe entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest.
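For comparison, an EAV layout for the same two flags might look roughly like the sketch below. The table dbo.JobAttribute and its columns are hypothetical names invented for the example, and the values are stored as strings purely for simplicity.

```sql
-- Hypothetical EAV table: one row per (entity, attribute) pair.
CREATE TABLE dbo.JobAttribute (
    JobId          int          NOT NULL,
    AttributeName  varchar(50)  NOT NULL,
    AttributeValue varchar(100) NOT NULL,
    CONSTRAINT PK_JobAttribute PRIMARY KEY (JobId, AttributeName)
);

-- Finding jobs that are complete but not exportable now needs
-- one self-join per attribute being tested.
SELECT c.JobId
FROM dbo.JobAttribute AS c
JOIN dbo.JobAttribute AS e
    ON e.JobId = c.JobId
WHERE c.AttributeName = 'IsComplete'   AND c.AttributeValue = '1'
  AND e.AttributeName = 'IsExportable' AND e.AttributeValue = '0';
```

The extra join per flag is a large part of why EAV tends to suit sparse, open-ended attribute sets rather than a couple of frequently filtered booleans.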

Related Solutions

Sql-server – Are there differences in performance of getting data between FullText Search and indexed columns

In large part it's going to depend on whether @filter is a word or a group of words. Full-text indexes essentially break down the contents of the column into individual words and let you search for a word (or group of words), a synonym, etc. 150k rows really isn't all that much to search as these things go, but you may very well see some performance increase, again depending on what @filter is.

I would also double-check the rest of your indexes: for example, one on art.id and one on lager.art_id and lager.skl_id. Also, given how few columns you are returning (assuming that is the case in the real query and not just this example), you might consider making them covering indexes by "including" the extra fields in your indexes (look at the INCLUDE keyword in CREATE INDEX). That lets the query read just the index without going back to the original table.
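A covering index along those lines could be sketched as follows. The lager table and its art_id and skl_id columns are taken from the discussion above; the dbo schema and the included column antal are hypothetical stand-ins for whatever the real query returns.

```sql
-- Key columns support the join/filter; INCLUDE adds the returned column
-- to the index leaf level so the base table need not be read.
-- "antal" is a hypothetical column name used only for illustration.
CREATE NONCLUSTERED INDEX IX_lager_art_id_skl_id
    ON dbo.lager (art_id, skl_id)
    INCLUDE (antal);
```

Included columns are not part of the index key, so they don't affect the sort order of the index; they only ride along so the query can be answered from the index alone.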

Sql-server – Bitmask Flags with Lookup Tables Clarification

I partially agree with Aaron's comment - in the most general case for storing 21 unrelated pieces of information, you'd probably use 21 bit columns. As a general solution, it may well be your best solution. If you had multiple bitmask-ed varchar columns, that would translate to a row with possibly over a hundred bit flags. FYI, 21 bits get stored as 3 bytes when you don't define them as NULLable, removing the necessity for space in the NULL bitmap. Since you have multiple bitmask columns, you'd end up with every 8 bits mashed into a byte.
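The 3-byte figure follows from SQL Server packing up to eight bit columns into each byte. A quick way to check the arithmetic (the column aliases are just for illustration):

```sql
-- Bytes needed to store n bit columns: ceiling(n / 8), written with integer division.
SELECT n.bit_columns,
       (n.bit_columns + 7) / 8 AS storage_bytes
FROM (VALUES (1), (8), (9), (16), (21)) AS n(bit_columns);
-- 1 -> 1, 8 -> 1, 9 -> 2, 16 -> 2, 21 -> 3
```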

What SQL Server ends up doing with your multi-column queries is eventually a bunch of bitmasking routines (yes! SQL Server uses bitmasks, so the concept per se can't be all bad!), but for average use cases, separate columns make life easier for you.

If we had more information about what types of queries you run, we may be able to better advise, because ultimately the use cases dictate the design.

If you persist with the COMPUTED column, I would persist and index it if you haven't already. It helps some queries, such as

exact matches

WHERE computedInt = POWER(2, 6) -- bit position 7

AND matching on the 15th bit, combined with OR matching on 2 other bits (10th and 7th)

WHERE computedInt >= POWER(2,14) AND computedInt < POWER(2,15) AND computedInt & (POWER(2,9) + POWER(2,6)) > 0 -- the range test assumes the 15th bit is the highest bit set
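To see how these two predicates behave, here is a small self-contained check against a handful of hand-picked values. The sample values and the column aliases are made up for the illustration; only computedInt and the predicates themselves come from the answer.

```sql
-- Evaluate both predicates against sample bit patterns.
SELECT v.computedInt,
       CASE WHEN v.computedInt = POWER(2, 6)
            THEN 1 ELSE 0 END AS exact_bit7,
       CASE WHEN v.computedInt >= POWER(2, 14)
             AND v.computedInt <  POWER(2, 15)
             AND v.computedInt & (POWER(2, 9) + POWER(2, 6)) > 0
            THEN 1 ELSE 0 END AS bit15_and_10_or_7
FROM (VALUES
        (64),                  -- only bit 7            -> 1, 0
        (16384),               -- only bit 15           -> 0, 0
        (16384 + 64),          -- bits 15 and 7         -> 0, 1
        (16384 + 512),         -- bits 15 and 10        -> 0, 1
        (32768 + 16384 + 64)   -- bits 16, 15 and 7     -> 0, 0
     ) AS v(computedInt);
```

The last row shows the catch with the range form: it can use an index on computedInt, but it only matches when no bit above the 15th is set. If higher bits can be set, the 15th bit would have to be tested with a bitwise AND instead, at the cost of the range seek.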

These are probably exotic samples, yet they do come up in real life. It's certainly not much worse than 21 individual bit columns, for which your statements could admittedly be easier to write; but remember that SQL Server has mashed them into 3 bytes for storage and will be doing the bit-unmasking anyway! You would have thought that if bit-masking were all bad (without exception), SQL Server wouldn't be doing it, right?

EDIT

Re the scenario of

Four flags, HasHouse,HasCar,HasCat,HasDog, 0000 is has none, 1111 is has all.

it is more efficient and logically expedient to test all 4 bits at once and do a single integer based operation, e.g.

WHERE computedInt & (POWER(2,10)+POWER(2,5)+POWER(2,3)+POWER(2,1)) = 0 -- has none
WHERE computedInt & (POWER(2,10)+POWER(2,5)+POWER(2,3)+POWER(2,1)) > 0 -- has one or more

Hypothetically, if this were your most exercised query on the table, you might even group the four columns into another computed column and index it separately, making the bitmask unnecessary (just test the resultant int with =0 and >0). You might even go further and just precompute the answer... horses for courses.
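A minimal sketch of that idea, assuming the four flags live as bit columns on a table called dbo.Person (the table name, PersonId, the computed column name OwnershipMask, and the index name are all hypothetical):

```sql
-- Hypothetical table holding the four flags from the scenario,
-- plus a persisted computed column that groups just those four.
CREATE TABLE dbo.Person (
    PersonId int IDENTITY(1,1) PRIMARY KEY,
    HasHouse bit NOT NULL DEFAULT 0,
    HasCar   bit NOT NULL DEFAULT 0,
    HasCat   bit NOT NULL DEFAULT 0,
    HasDog   bit NOT NULL DEFAULT 0,
    -- 0 (binary 0000) = has none, 15 (binary 1111) = has all.
    OwnershipMask AS (CAST(HasHouse AS int) * 8
                    + CAST(HasCar   AS int) * 4
                    + CAST(HasCat   AS int) * 2
                    + CAST(HasDog   AS int)) PERSISTED
);

CREATE NONCLUSTERED INDEX IX_Person_OwnershipMask
    ON dbo.Person (OwnershipMask);

-- No bitmask needed: plain comparisons on the grouped value.
SELECT PersonId FROM dbo.Person WHERE OwnershipMask = 0;  -- has none
SELECT PersonId FROM dbo.Person WHERE OwnershipMask > 0;  -- has one or more
```

Because the predicates compare the column directly rather than an expression over it, the index on OwnershipMask can be used for both queries.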
