Sql-server – Design concerns for using 3x bit or one char(1) or one integer in table for holding status of item

application-designdatabase-designsql server

I need to extend my existing table with status column.
I am in doubt how to do that in most appropriate way regarding of future use of that status and performances of existing table.

My table is around 3,7 MIL records, and it is heavy read and write table.
Table represent stock (warehouse) of items (goods). It holds floats for many kind of prices and amounts, primary key is composite by item_id and warehouse_id.

Here is how I see ways to implement adding status of an item.

To add three more columns for statuses, each one column will represent some of statues,
Something like 'allow_sell','allow_orders','allow_returns'
I like this way because it uses BITs, (is it faster than varchar, or integer ?) and easy to understand what each status represents and which combination of statuses are seted on an item. In this way

To add one integer column called status_id and to add one more table statuses_of_item where I can hold description of each status and add statues as many as I wish.

To add one varchar(10) column where status will be written as dresption ('denay_all','allow_all', 'allow_orders_wo_sels' etc)

Have in mind that status is part of business logic and it uses only internal, Representation of statues are not displayed in SIMPLE WAY, statues not shown to users, it is only way how system should handle some situation in different places.

I think that 70% of items inside table will have default (same) status.
My database system is MS Sql Server 2008R2.

How I should design my table to include stautes ?

Best Answer

In my experience, trying to encode multiple data points into a single column always ends up being more trouble than it's worth. Sure, it seems cool and clever to use BITWISE operators, but there are many things that go wrong and it won't always be efficient to test those bits without cumbersome and unintuitive workarounds. It's the same reason we stay away from storing comma-separated lists, JSON strings etc. in a single column - eventually you care about viewing or filtering on those distinct bits which you now have to extract, sometimes expensively.

With the information I have, my vote is for three separate BIT columns. They will still collapse to similar storage patterns as a single column with the three bits on/off, and can be made more efficient individually and across the board in several ways, including:

  • data compression
  • sparse columns
  • filtered indexes (e.g. WHERE allow_returns = 1)

Someone else advocated for three CHAR(1) columns. These do not benefit from storage collapse and also require a check constraint, making them less than ideal in my mind.

Now, my answer might change if you say, "well, what if I might add 15 other attributes in the future?" I certainly don't think it's wise to build the columns this way if they're not relatively static - changing the schema (and therefore all of the code and interfaces to it) for every new or changed attribute is going to be a royal pain. So in that case you might want to consider EAV - where the attributes are not part of the metadata but part of the data. There are a lot of objections to EAV, mostly around performance and the difficulty in enforcing constraints (in this case unlikely to be an issue if all of these attributes are either on or off), but it worked quite well for us at my previous job. You might model it like this:

CREATE TABLE dbo.Attributes
(
  AttributeID TINYINT PRIMARY KEY,
  Name VARCHAR(32) NOT NULL UNIQUE
);

CREATE TABLE dbo.ItemAttributes
(
  ItemID INT NOT NULL 
    FOREIGN KEY REFERENCES dbo.Items(ItemID),
  AttributeID TINYINT NOT NULL 
    FOREIGN KEY REFERENCES dbo.Attributes(AttributeID),
  Status BIT NOT NULL,
  PRIMARY KEY(ItemID, AttributeID)
);

And again, you can have filtered indexes here to make certain queries much more efficient, such as (imagine the AttributeID for "allow returns" is 10):

CREATE INDEX optAllowReturns ON dbo.ItemAttributes(ItemID)
  WHERE AttributeID = 10 AND Status = 1;

If you have certain attributes that are not on/off (for example, three states of manufacture or shipping), you can change the Status column to:

Value TINYINT NOT NULL

This can double as an on/off value for attributes that are represented that way, and as tri- or more-state value for attributes that require more than simple on/off. You can also reflect which type is which in the metadata of the dbo.Attributes table.