Adding a bool column; tracking untouched vs. false

best practicesdatabase-designdatatypes

[backstory] I have a simple database of our current Widget inventory. It averages just dozen columns across five or six tables, but has a huge number of records already.

Some Widgets came with caps, some did not, but we've actually never kept track before. Now management wants users to be able to keep track of whether each individual Widget arrives with a cap or not. We will not be doing an inventory to check our current stock, but when I add the new column for "Arrived Capped," there will be no data on everything received before today. What is the usual, best-practices approach to scenarios like this?

When adding a string column, the meaning of "" is obvious; no data has been entered. But I'm adding a bool, thus existing records will default to a value that does indicate something: FALSE.

My first thought was to make the user portal have, rather than a checkbox, a radio button pair for this question. Upon creating a new record, or even returning to old records with false-falses, if they click Yes or No, that value is recorded, plus another bool indicates this question was actually answered manually. That is; if the first bool is F, but the second bool is T, then the first bool is not a 'default false'.

Their hand-waving solution was to have records dated before today displayed as "unknown" (this was nixed as existing records could not then manually be given a user-verified value of false). Should I use a faux-bool int and think of it in trinary? Or, back to using a separate tracking bit/bool, should I make a new table for that sort of data? Is this a common concept? Any naming conventions?

Best Answer

In most DBMSs, or at least the ones I have dealt with, a Bool is really a trinary. You have 1, 0 and NULL. NULL is specifically there for "I haven't entered any data yet." Depending on how you code it your checkbox should default to unchecked for 0 or NULL. Then 1 for checked of course. However in your reporting you can actually differentiate between rows where someone has looked at it and entered a value (1 or 0) and rows where no one has updated it yet (rows where the value is NULL).