SQL Server Flag vs Table Split – Database Design Considerations

database-designsql server

I am designing a table of items which will (potentially) contain tens of millions of records. Some items will not be available for use until they are "approved" by administrator. By "use" I mean that such items will not be referenced in any other table until they are "approved". Up to 50% of items may be "unapproved" at any given time. Records may become "approved", but not vice versa.

I consider two design options:

  • a bit flag
  • a separate table of "unapproved" items – when item is approved it is moved to "regular" table (renewal of item's ID is not an issue)

I think the second option is much better. Bit flag takes only a byte per row, so it is not an issue. But if we have a million of approved and a million of unapproved records in same table – scan time increases for operations with approved records.

Question is: should I consider first (bit flag) option instead? Does it have any benefits in described situation?

Best Answer

You can have it both ways with partitioned views.

You create an underlying table for each status, enforced by constraints, with mutually exclusive values. Then a view which UNIONs together the underlying tables. The view or each base table can be referenced explicitly. If a row's status is UPDATEd through the view the DBMS will DELETE it from one base table and insert it into the one corresponding to the new status. Each base table can be indexed independently according to its usage pattern. The optimiser will resolve index references to a single corresponding base table if it can.

The benefits are
a) shallower indexes. Do the math on the index fan-out, however. At that scale and split between your status values it is possible the indexes will be the same depth on the split tables as they would be on the combined table.
b) no application code has to change. The data continues to appear as a continuous whole.
c) future new status values can be included by adding a new base table, with constraint, and re-creating the view.

The cost is all that data movement; two pages and associated indexes are written for each status update. Lots of IO to deal with. That much movement will cause fragmentation, too.