I like providing direct answers to questions; however, this topic goes deeper than a single answer can cover, so I am adding a few articles at the bottom that expand on these details for all to learn from.
In summary ...
A few things can affect whether an index is used (mind you, there are many more reasons, as you'll see in the articles posted below). The engine's main priority is to predict, using statistics, how to get the data off the disk as fast and as efficiently as possible. Its primary choice comes down to performing a table scan or an index seek/scan. (There can be more depth to this conversation, but I will keep it to the context of your question.)
- Use fewer fields in the SELECT, and keep them cohesive with the index key and the index INCLUDE columns.
- The more rows a table has, the more likely the engine is to use an index instead of scanning the whole table.
- The more selective (unique) the column values are, the more likely the index will be used; conversely, many NULL values will push the engine to avoid the index.
In detail ...
So what causes a table scan versus an index scan/seek to be used? The points below also serve as general guidelines for creating and using indexes in most situations. Follow these few steps and you will see big benefits right from the get-go.
One reason, as mentioned in the comments above, is the use of SELECT * FROM <table> WHERE <predicate>. SELECT * is a sure-fire way to make the engine avoid the indexes: it is faster to get all fields from the table itself (the clustered index or heap) than to seek an index and then look up every remaining column. Your first choice, as the human, is to minimize your select fields to those in the index key and the index INCLUDEs.
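Here is a minimal sketch of that idea (the table, column, and index names are hypothetical):

```sql
-- Hypothetical covering index: key column for the WHERE clause,
-- INCLUDE columns for the SELECT list.
CREATE INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, TotalDue);

-- Likely a clustered index/table scan: every column has to come
-- from the table itself, so the nonclustered index is skipped.
SELECT * FROM dbo.Orders WHERE CustomerId = 42;

-- Likely an index seek: everything the query needs lives in
-- IX_Orders_CustomerId, so the table is never touched.
SELECT CustomerId, OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerId = 42;
```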
The second reason is how many records are in the table. The fewer the records, the easier it is for the engine to simply scan the table. Statistics play a part here: the engine uses them to predict where the data sits on the pages/disk, and with too few rows or too little distribution it can decide the index is not worth the trip.
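If you want to see the statistics the optimizer is working from, SQL Server exposes them directly (the syntax below is SQL Server specific, and the table/index names are the hypothetical ones from above):

```sql
-- Show the histogram and density information the optimizer uses
-- when it estimates how many rows a predicate will return.
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerId');

-- Refresh the statistics after large data changes so the row-count
-- and distribution estimates stay accurate.
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerId;
```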
And thirdly, the less selective (less unique) the values in your table, like a column holding only Male and Female, the less likely the index will be used. Queries will lean on an index when the field is highly selective (more unique), like zip codes in an address table (so long as your list of addresses is not all of your neighbors with the same zip code).
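A quick way to put a number on selectivity (a sketch; the table and column names are placeholders):

```sql
-- Ratio of distinct values to total rows: close to 1.0 is highly
-- selective (a good index candidate); near 0 is not.
-- Note COUNT(DISTINCT ...) ignores NULLs, so a column full of NULLs
-- scores low here, matching the point above.
SELECT COUNT(DISTINCT ZipCode) * 1.0 / COUNT(*) AS Selectivity
FROM dbo.Address;
```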
There are many more techniques and strategies, but one piece of advice: become knowledgeable and experienced with indexes and statistics, and with how and where to place them on disks/LUNs, and you will go far as a DBA while delivering huge performance gains.
Querying a column that can contain NULLs is more complex than querying a column that cannot. So too is querying multiple tables more complex than querying one table. I wouldn't let the avoidance of NULL drive the normalization.
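To make that extra complexity concrete, here is a minimal sketch (assuming a Device table with a nullable DisposalDate, per the example below):

```sql
-- NULL comparisons evaluate to UNKNOWN, not TRUE or FALSE, so a row
-- with DisposalDate = NULL matches NEITHER of these predicates:
SELECT COUNT(*) FROM dbo.Device WHERE DisposalDate <  '2020-01-01';
SELECT COUNT(*) FROM dbo.Device WHERE DisposalDate >= '2020-01-01';

-- The two counts above do not add up to the table's row count;
-- rows still awaiting disposal need an explicit third test.
SELECT COUNT(*) FROM dbo.Device WHERE DisposalDate IS NULL;
```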
For example, you mentioned that all devices will eventually be disposed of and get a disposal date. If there are no other columns in the Disposal table, then in my mind it makes more sense to put DisposalDate in the Device table. Other tables like Sanitize might make more sense as separate tables because they carry multiple data points that will not apply to some Devices.
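A minimal sketch of that layout (the exact columns beyond DisposalDate, and the Sanitize fields, are assumptions):

```sql
-- DisposalDate lives on Device as a nullable column: NULL simply
-- means "not disposed yet", and no join is needed to ask about it.
CREATE TABLE dbo.Device (
    DeviceId     INT IDENTITY PRIMARY KEY,
    SerialNumber VARCHAR(50) NOT NULL,
    DisposalDate DATE NULL  -- set once the device is disposed of
);

-- Sanitize stays its own table: it carries several data points that
-- simply do not apply to some devices (columns here are assumed).
CREATE TABLE dbo.Sanitize (
    DeviceId       INT NOT NULL REFERENCES dbo.Device (DeviceId),
    SanitizeDate   DATE NOT NULL,
    SanitizeMethod VARCHAR(50) NOT NULL
);
```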
Check constraints are great and should be used when possible, but there will always be times when a procedure is necessary.
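For instance, a self-contained rule is a natural fit for a check constraint (the rule itself is hypothetical), while anything that must consult other rows or tables usually ends up in a procedure:

```sql
-- Declarative rule enforced by the engine on every INSERT/UPDATE:
-- a disposal date, when present, cannot predate the year 2000.
ALTER TABLE dbo.Device
    ADD CONSTRAINT CK_Device_DisposalDate
    CHECK (DisposalDate IS NULL OR DisposalDate >= '2000-01-01');
```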
Best Answer
In most DBs a `NOT NULL` column will be more efficient in terms of stored data, for the reason you state, and also more efficient to query and index, so unless you want to allow NULLs in a column you should explicitly disallow them.

There will be a slight performance implication, as the extra `NOT NULL` constraints will potentially need to be checked for each row you affect with any INSERT or UPDATE. But as most databases are relatively write-light and read-heavy, this is probably not a concern: the little extra time taken is unlikely to be noticeable at all, as it is a CPU-bound operation while the rest of the insert/update operation will be IO-bound and therefore a much more significant bottleneck. And it gives you some "free" data checking, so your code (or other people's code) cannot accidentally put NULLs where other code is not expecting them and might give incorrect results in their presence.

Edit: As Peter points out in his comment, the above is a generalisation and may not hold true for all DBMSs, though I'm pretty sure it does for MySQL and MSSQL. Other complications in the area might include features like sparse tables (as implemented in MSSQL 2008, for instance), which will alter the performance dynamics of (non-)nullable columns.
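A minimal sketch of being explicit either way (the table and column names are assumptions):

```sql
-- Disallow NULLs explicitly unless the column genuinely needs them.
CREATE TABLE dbo.Customer (
    CustomerId INT          NOT NULL PRIMARY KEY,
    Name       VARCHAR(100) NOT NULL,  -- the engine now rejects NULLs for you
    Nickname   VARCHAR(100) NULL       -- NULL allowed deliberately, not by accident
);

-- Tightening an existing column (it must already contain no NULLs):
ALTER TABLE dbo.Customer
    ALTER COLUMN Name VARCHAR(100) NOT NULL;
```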