Best Data Type for Storing an Array of Flags (Bitmap/Bit Array)

datatypessql serversql-server-2005sql-server-2008

I need to store a bit array for each record of a table, supporting the following operations:

  • Testing if a bit is set, and setting a bit (using SQL)

  • Querying and setting the value using ADO 2.8 (not ADO.NET)

  • Indexing (in order to benefit from the "covering index" feature)

The maximum number of bits to be stored in this array is fixed, but may exceed 32. That is, a simple int column doesn't always work.

From what I've seen so far, my options are:

  1. Use several int columns
  2. Use bigint (works as long as the number of bits is <= 64)
  3. Use binary
  4. ?

The first option would work, but require quite a bit of refactoring in the code that accesses the data. The second option is a temporary relief only, and from my searches so far I'm not too sure if ADO works that well with bigint. I have no experience with binary, and I'm not aware of any other options.

Which data type would you choose, given the requirements?

Best Answer

I can't advocate strongly enough not to use a single field for this.

I'm currently dealing with maintaining a very large dataset with a bigint bitmask field and it's a bit of a performance nightmare.

If you check a single bit it's fine. If you check more than one bit performance degrades very quickly.

Due to the nature of bitmask integers, the data distribution will be very imbalanced and you'll get suboptimal plans.

Multiple bit checks result in range or index scans with a function running against every row. It's a mess.

My workaround was simple - I made a table to store the PK for each of the conditions to be checked. This is counter-intuitive initially but the space needed is low (you only store the PK) and lookups are lightning fast, especially if you use a UNIQUE CLUSTERED INDEX.

You can add as many conditions as you want without affecting your main table, and updates also don't affect your main table.

Indexing is simple since you just index all the lookup tables individually, and since your clustered key is the same on your main table and the lookups all your evaluations are merge joins which are very efficient.