Sql-server – use SPARSE somehow on a non-nullable bit column with mostly false values

performancequery-performancesparse-columnsql server

I have a table that stores the results of queries that are run at least once a day. There's a bit column that represents whether the row is from the most recent run of a particular query with a particular set of arguments. (Yes, it's a functional dependency, but a necessary denormalization for performance, since most queries on this table are only interested in the most recent result.)

Since this bit column is almost always a false value, I'm looking for the best way to tune queries returning only the true values. Partitioning isn't an option (Standard Edition). It seems like making the column SPARSE would be an interesting solution, but I believe that would require me to change the column to nullable and use NULL rather than 0 for false values. Seems a little kludgy.

Is there an option similar to SPARSE that would optimize space/performance for a non-null bit column with mostly (well over 99%) false values?

Pinal Dave's article indicates that both zero and null values are optimized, but this doesn't seem right to me, since these are different values — unless MSSQL is using the same mechanism for non-null columns to indicate the default value. This would be great if it were true, but the BOL doesn't mention this.

Best Answer

A filtered index (WHERE IsMostRecentRun = 1) sounds like a better idea to me than using sparse. If you can make it so that false is instead represented by null, you may be able to do both, but while that will potentially save some space in the base table, I suspect the bigger gain would be in query performance from the filtered index - as long as it's covering. If you need too many columns to cover and/or need to also filter or join on other columns, then you may find a balancing act between the improvements you get from the seek or range scan on the filtered index and the costs of things like lookups to get at the rest of the columns.

That all said, it seems suspicious you would need an additional column to serve as a flag for this. Isn't this information redundant? Can't it be determined by other data? In which case I would focus on index tuning to optimize the use of the existing columns instead of storing and maintaining redundant data.

(Also I don't think Pinal is correct.)