Best Data Type for Storing an Array of Flags (Bitmap/Bit Array)

datatypessql serversql-server-2005sql-server-2008

I need to store a bit array for each record of a table, supporting the following operations:

Testing if a bit is set, and setting a bit (using SQL)
Querying and setting the value using ADO 2.8 (not ADO.NET)
Indexing (in order to benefit from the "covering index" feature)

The maximum number of bits to be stored in this array is fixed, but may exceed 32. That is, a simple int column doesn't always work.

From what I've seen so far, my options are:

Use several int columns
Use bigint (works as long as the number of bits is <= 64)
Use binary
?

The first option would work, but require quite a bit of refactoring in the code that accesses the data. The second option is a temporary relief only, and from my searches so far I'm not too sure if ADO works that well with bigint. I have no experience with binary, and I'm not aware of any other options.

Which data type would you choose, given the requirements?

Best Answer

I can't advocate strongly enough not to use a single field for this.

I'm currently dealing with maintaining a very large dataset with a bigint bitmask field and it's a bit of a performance nightmare.

If you check a single bit it's fine. If you check more than one bit performance degrades very quickly.

Due to the nature of bitmask integers, the data distribution will be very imbalanced and you'll get suboptimal plans.

Multiple bit checks result in range or index scans with a function running against every row. It's a mess.

My workaround was simple - I made a table to store the PK for each of the conditions to be checked. This is counter-intuitive initially but the space needed is low (you only store the PK) and lookups are lightning fast, especially if you use a UNIQUE CLUSTERED INDEX.

You can add as many conditions as you want without affecting your main table, and updates also don't affect your main table.

Indexing is simple since you just index all the lookup tables individually, and since your clustered key is the same on your main table and the lookups all your evaluations are merge joins which are very efficient.

Related Solutions

MySQL data type for 128 bit integers

I don't know what the best way necessarily is to store it -- but there's at least a better option than using a varchar(39) (or varchar(40) if you needed it signed) ; instead use a decimal(39,0). From the mysql docs:

Fixed-Point (Exact-Value) Types

The DECIMAL and NUMERIC types store exact numeric data values. These types are used when it is important to preserve exact precision, for example with monetary data. In MySQL, NUMERIC is implemented as DECIMAL, so the following remarks about DECIMAL apply equally to NUMERIC.

MySQL 5.1 stores DECIMAL values in binary format. Before MySQL 5.0.3, they were stored as strings. See Section 11.18, “Precision Math”.

In a DECIMAL column declaration, the precision and scale can be (and usually is) specified; for example:
salary DECIMAL(5,2)
In this example, 5 is the precision and 2 is the scale. The precision represents the number of significant digits that are stored for values, and the scale represents the number of digits that can be stored following the decimal point.

Standard SQL requires that DECIMAL(5,2) be able to store any value with five digits and two decimals, so values that can be stored in the salary column range from -999.99 to 999.99.

In standard SQL, the syntax DECIMAL(M) is equivalent to DECIMAL(M,0). Similarly, the syntax DECIMAL is equivalent to DECIMAL(M,0), where the implementation is permitted to decide the value of M. MySQL supports both of these variant forms of DECIMAL syntax. The default value of M is 10.

If the scale is 0, DECIMAL values contain no decimal point or fractional part.

The maximum number of digits for DECIMAL is 65, but the actual range for a given DECIMAL column can be constrained by the precision or scale for a given column. When such a column is assigned a value with more digits following the decimal point than are permitted by the specified scale, the value is converted to that scale. (The precise behavior is operating system-specific, but generally the effect is truncation to the permissible number of digits.)

It's stored packed, so it'll take up less space than the varchar (18 bytes, if I'm doing my math right), and I'd hope you'd be able to do math on it directly, but I've never tried with that large of a number to see what happens.

Sql-server – Rounding of SMALLMONEY Data Type in SQL Server 2005

I am fairly certain that it just looks at the third decimal digit... and that's 4 so it rounds down to 1.28.

Clarify - it looks to the digit one greater than what you are rounding to... so it looks to the third because you are rounding to the second.

Best Answer

Related Solutions

MySQL data type for 128 bit integers

Sql-server – Rounding of SMALLMONEY Data Type in SQL Server 2005

Related Question