Sql-server – use SPARSE somehow on a non-nullable bit column with mostly false values

Tags: performance, query-performance, sparse-column, sql-server

I have a table that stores the results of queries that are run at least once a day. There's a bit column that represents whether the row is from the most recent run of a particular query with a particular set of arguments. (Yes, it's a functional dependency, but a necessary denormalization for performance, since most queries on this table are only interested in the most recent result.)

Since this bit column is almost always a false value, I'm looking for the best way to tune queries returning only the true values. Partitioning isn't an option (Standard Edition). It seems like making the column SPARSE would be an interesting solution, but I believe that would require me to change the column to nullable and use NULL rather than 0 for false values. Seems a little kludgy.

Is there an option similar to SPARSE that would optimize space/performance for a non-null bit column with mostly (well over 99%) false values?

Pinal Dave's article indicates that both zero and null values are optimized, but this doesn't seem right to me, since these are different values — unless MSSQL is using the same mechanism for non-null columns to indicate the default value. This would be great if it were true, but the BOL doesn't mention this.

Best Answer

A filtered index (WHERE IsMostRecentRun = 1) sounds like a better idea to me than using sparse. If you can make it so that false is instead represented by null, you may be able to do both, but while that will potentially save some space in the base table, I suspect the bigger gain would be in query performance from the filtered index - as long as it's covering. If you need too many columns to cover and/or need to also filter or join on other columns, then you may find a balancing act between the improvements you get from the seek or range scan on the filtered index and the costs of things like lookups to get at the rest of the columns.
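As a rough sketch of the filtered-index idea, assuming a hypothetical table dbo.QueryResults (only the IsMostRecentRun column comes from the answer; the table name, the other columns, and the index name are invented for illustration):

```sql
-- Hypothetical schema: everything except IsMostRecentRun is an assumption.
IF OBJECT_ID(N'dbo.QueryResults', N'U') IS NULL
    CREATE TABLE dbo.QueryResults
    (
        QueryResultId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        QueryId         INT            NOT NULL,
        ArgumentsHash   BINARY(32)     NOT NULL,
        RunDate         DATETIME2(0)   NOT NULL,
        ResultValue     NVARCHAR(400)  NULL,
        IsMostRecentRun BIT            NOT NULL DEFAULT (0)
    );
GO

-- The index stores only the rows where the flag is 1,
-- so it stays small even though the base table is large.
CREATE NONCLUSTERED INDEX IX_QueryResults_MostRecent
    ON dbo.QueryResults (QueryId, ArgumentsHash)
    INCLUDE (RunDate, ResultValue)          -- included so the query below is covered
    WHERE IsMostRecentRun = 1;
GO

-- The predicate on the flag is written as a literal so that it
-- matches the index filter.
DECLARE @QueryId INT = 42;

SELECT QueryId, ArgumentsHash, RunDate, ResultValue
FROM dbo.QueryResults
WHERE IsMostRecentRun = 1
  AND QueryId = @QueryId;
```

Whether the INCLUDE list is worth its maintenance cost depends on which columns your real queries return, so check the actual execution plans before settling on a definition.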

That all said, it seems suspicious you would need an additional column to serve as a flag for this. Isn't this information redundant? Can't it be determined by other data? In which case I would focus on index tuning to optimize the use of the existing columns instead of storing and maintaining redundant data.
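A minimal sketch of the flag-free alternative, assuming the same hypothetical dbo.QueryResults table and assuming that a RunDate column identifies the latest run (both are assumptions, not details from the question):

```sql
-- Hypothetical schema, created only if it does not already exist.
IF OBJECT_ID(N'dbo.QueryResults', N'U') IS NULL
    CREATE TABLE dbo.QueryResults
    (
        QueryResultId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        QueryId       INT            NOT NULL,
        ArgumentsHash BINARY(32)     NOT NULL,
        RunDate       DATETIME2(0)   NOT NULL,
        ResultValue   NVARCHAR(400)  NULL
    );
GO

-- Newest run first within each (QueryId, ArgumentsHash) group.
CREATE NONCLUSTERED INDEX IX_QueryResults_Latest
    ON dbo.QueryResults (QueryId, ArgumentsHash, RunDate DESC)
    INCLUDE (ResultValue);
GO

-- Latest result for every query/argument combination, with no flag column.
WITH ranked AS
(
    SELECT QueryId, ArgumentsHash, RunDate, ResultValue,
           ROW_NUMBER() OVER (PARTITION BY QueryId, ArgumentsHash
                              ORDER BY RunDate DESC) AS rn
    FROM dbo.QueryResults
)
SELECT QueryId, ArgumentsHash, RunDate, ResultValue
FROM ranked
WHERE rn = 1;
```

This still reads every row of the index to find each group's newest entry, so it may or may not beat a maintained flag on a very large table; compare the two on your own data.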

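(Also, I don't think Pinal is correct: SPARSE optimizes storage for NULL values only, so a 0 in a sparse column is stored like any other non-null value.)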

Related Solutions

Sql-server – Column definition for a SQL Server equivalent to Access BOOLEAN type

To address your concerns about BIT:

You can set your BIT column to NOT NULL.
You can use -1 when setting a BIT column to "true".
You can create a view that translates 1 to -1, but +1 should be fine anyway unless your application explicitly checks for the numeric value -1 (anything but zero should evaluate to true in your client language).

CREATE TABLE dbo.foo(bar BIT NOT NULL, blat BIT NOT NULL);

INSERT dbo.foo SELECT -1, 0;

SELECT bar, blat, -CONVERT(SMALLINT, bar), -CONVERT(SMALLINT, blat) FROM dbo.foo;

Results:

bar   blat  (No column name)  (No column name)
----  ----  ----------------  ----------------
1     0     -1                0
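If the application really does need -1, the same conversion can be wrapped in a view. This is only a sketch; the view name dbo.foo_access is made up, and dbo.foo is the table created above:

```sql
-- CREATE VIEW must be the first statement in its batch.
GO
CREATE VIEW dbo.foo_access   -- hypothetical view name
AS
SELECT
    bar  = -CONVERT(SMALLINT, bar),   -- 1 becomes -1, 0 stays 0
    blat = -CONVERT(SMALLINT, blat)
FROM dbo.foo;
GO

SELECT bar, blat FROM dbo.foo_access;
```

Because the view's columns are computed expressions, inserts and updates against those columns have to go to the base table instead.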

The nice thing about BIT over TINYINT/SMALLINT is that up to 8 BIT columns in the same table are stored together in a single byte.

In all of these cases, you still aren't going to be able to say

WHERE NOT BooleanColumn
-- or
WHERE !BooleanColumn

You will still have to say

WHERE BooleanColumn = 0

Sql-server – Determining whether column in view can only contain unique values

Short answer: you'll have to do your own dirty work. Like @mrdenny says, there is no way of automating this task.

Very long answer: SQL Server doesn't have an easy way of determining unique columns in a view, in part because of how complex (and dynamic) views can be. There are two methods to make a first approximation, however, and these methods can work in concert. The first is what you already have: performing queries against the view to see if you can find anything which already violates uniqueness constraints. If it does, you can throw that column (or set of columns) out immediately. If it doesn't, that combination might be a valid unique key, but there is no guarantee.
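The first method can be sketched as a simple duplicate check. The view name dbo.MYVIEW and the columns CandidateCol1 and CandidateCol2 are placeholders for whatever candidate key you want to test:

```sql
-- Any row returned here disproves uniqueness of the candidate column set.
-- An empty result is NOT proof: it only means no duplicates exist right now.
SELECT
    CandidateCol1,
    CandidateCol2,
    COUNT(*) AS DuplicateCount
FROM dbo.MYVIEW
GROUP BY
    CandidateCol1,
    CandidateCol2
HAVING COUNT(*) > 1;
```

GROUP BY treats NULLs as equal to each other, so several rows with NULL in every candidate column will be reported as duplicates.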

The second method is to reason from the data model, starting with unique constraints (assuming you have those on your tables!). There are three nice ways to enforce uniqueness on a table: a primary key, a unique key constraint, or a unique index. All three show up in the sys.indexes catalog view with the is_unique column set to 1. There are also some not-so-nice ways, like using triggers to enforce uniqueness, which will not show up there.
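To see what the data model already enforces, a query along these lines lists every unique index on user tables and how it was declared. The output column aliases are my own choice:

```sql
SELECT
    OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName,
    OBJECT_NAME(i.object_id)        AS TableName,
    i.name                          AS IndexName,
    CASE
        WHEN i.is_primary_key = 1       THEN 'primary key'
        WHEN i.is_unique_constraint = 1 THEN 'unique constraint'
        ELSE 'unique index'
    END                             AS EnforcedBy
FROM sys.indexes AS i
WHERE i.is_unique = 1
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1   -- skip system objects and indexed views
ORDER BY SchemaName, TableName, IndexName;
```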

You could try experimenting with sys.dm_sql_referenced_entities; a query like this might give you a starting point:

declare @ViewName nvarchar(517) = N'dbo.MYVIEW'; /* the name must be schema-qualified for the OBJECT class */

with viewcolumns as
(
    select * from sys.dm_sql_referenced_entities(@ViewName, 'OBJECT') dsre /* where is_selected = 1   --uncomment if using SQL 2012 */
),
uniquereferences as
(
    select
        i.object_id,
        object_schema_name(i.object_id) as SchemaName,
        object_name(i.object_id) as TableName,
        i.name as IndexName,
        c.name as ColumnName,
        case when vc.referenced_entity_name IS NOT NULL then 1 else 0 end as HasReference
    from
        sys.indexes i
        inner join sys.index_columns ic 
            on i.index_id = ic.index_id
            and i.object_id = ic.object_id
        inner join sys.columns c
            on c.column_id = ic.column_id
            and c.object_id = ic.object_id
        left outer join viewcolumns vc
            on vc.referenced_id = c.object_id
            and vc.referenced_minor_id = c.column_id
    where
        i.is_unique = 1
),
sufficientreferences as
(
    select
        object_id,
        IndexName,
        min(HasReference) as HasReference
    from
        uniquereferences
    group by
        object_id,
        IndexName
    having
        min(HasReference) = 1
)
select
    ur.object_id,
    ur.SchemaName,
    ur.TableName,
    ur.IndexName,
    ur.ColumnName
from
    uniquereferences ur
    inner join sufficientreferences sr 
        on ur.IndexName = sr.IndexName
        and ur.object_id = sr.object_id
where
    ur.HasReference = 1
    /* Optional:  remove referenced tables; if you have a 1:1 reference, leave this bit out */
    and not exists
    (
        select
            *
        from
            sys.foreign_keys fk
            inner join viewcolumns vc 
                on fk.parent_object_id = vc.referenced_id
        where
            vc.referenced_minor_id = 0
            and fk.referenced_object_id = ur.object_id

    );

For SQL 2008, the is_selected flag is not available, so there's no way to tell whether a column returned by that function is actually part of the SELECT clause or is only used in a join or filter. With SQL 2012, you can at least limit your query to the columns that are actually part of the SELECT clause.

What you get from this is not a set of unique keys for the view. What you get is a set of columns which make up unique keys on their underlying tables. The difference is that you could have a reference table with a unique key constraint on the Name column, and that Name column would show up in the above query even if the view joins the reference table to the base table (thereby causing repeated use of the reference table's Name column). To help alleviate that, I have a NOT EXISTS clause which removes cases in which the object is the referenced table in a foreign key relationship with another table in the view, so our unique index for the reference data table should not show up.

What this does allow you to do is reduce your possible answer space. But even then, you'll be doing a lot of spadework. The more complex your views get, the less valuable this is. For example, if you have a UNION ALL in your query, the statement above might show you a candidate column set which is wrong, because those columns might be duplicated in the other half of the UNION ALL. Or if you have cross-server queries, sys.dm_sql_referenced_entities might not even show you any column names. In other words, the query above is a semi-functional aid and certainly not a method of automating the process.
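The UNION ALL caveat is easy to reproduce. In this sketch every object name is hypothetical: each base table has a primary key on OrderId, yet the view built on them can return the same OrderId twice:

```sql
CREATE TABLE dbo.OrdersCurrent (OrderId INT NOT NULL PRIMARY KEY);
CREATE TABLE dbo.OrdersArchive (OrderId INT NOT NULL PRIMARY KEY);
GO

CREATE VIEW dbo.AllOrders
AS
SELECT OrderId FROM dbo.OrdersCurrent
UNION ALL
SELECT OrderId FROM dbo.OrdersArchive;
GO

-- The same key value is legal in both tables...
INSERT dbo.OrdersCurrent (OrderId) VALUES (1);
INSERT dbo.OrdersArchive (OrderId) VALUES (1);

-- ...so the view is not unique on OrderId, even though a metadata
-- query would report OrderId as a unique key of each base table.
SELECT OrderId, COUNT(*) AS DuplicateCount
FROM dbo.AllOrders
GROUP BY OrderId
HAVING COUNT(*) > 1;
```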

Note that this does depend upon having unique constraints specified. If your only unique keys are surrogate primary keys, it might not be quite as easy to find a candidate column set because not even SQL Server knows that the column combination is supposed to be unique.
