Sql-server – SQL datatype choice for flagging a read/ unread message

database-designperformancesql server

I have a table where I want to track Boolean type operations. in addition, a business requirement is that I know the date of the action and not just the Boolean value.

for example if a message is read or deleted.

I would like to use a date time field so that I can just quantify all unread messages as having null or a date time value for read. I get Boolean operation and a timestamps.

My only qualm about this is how efficient will selecting all unread messages be?

Lets say in a table with a few million records and a non clustered index on a field such as read would a select perform way worse than if the field were a Boolean?

Best Answer

Performance of a particular design depends on the distribution of data to a large extent, and the access path. Are you expecting most messages to have the flag, or most messages not to? From the docs:

My tests show that a table scan often starts to perform better than a nonclustered index access when at least 10 percent of the rows are selected. I also found that the optimizer switches from nonclustered index access to table scan prematurely (i.e., when nonclustered index access still shows better response time than the corresponding table scan). In many cases, the optimizer forces a table scan for queries with result sets of approximately 5 percent, although the table scan becomes more efficient than index access at selectivities of 8 to 10 percent.

And of course, if there are any other predicates on the query, and the clustered index if any. For example, are you likely to want to access all unread messages within a certain timeframe? Or sent to/from a particular user? How big are the message bodies and are they stored inline? These are just rhetorical questions, mind.

So which is better, a BIT and a DATETIME or just a DATETIME for performance? You will have to benchmark with some representative data and access patterns to find out. I'm afraid this one can't be answered with just theory. But you can of course create the table with the boolean in and just not use it if you find the latter is better - it only adds an overhead of 1 byte per row.

If you were asking the opposite question, how to efficiently query the not-NULL rows, I would have said use a filtered index.

Related Solutions

Sql-server – Optimising join on large table

Your ix_hugetable looks quite useless because:

it is the clustered index (PK)
the INCLUDE makes no difference because a clustered index INCLUDEs all non-key columns (non-key values at lowest leaf = INCLUDEd = what a clustered index is)

In addition: - added or fk should be first - ID is first = not much use

Try changing the clustered key to (added, fk, id) and drop ix_hugetable. You've already tried (fk, added, id). If nothing else, you'll save a lot of disk space and index maintenance

Another option might be to try the FORCE ORDER hint with table order boh ways and no JOIN/INDEX hints. I try not to use JOIN/INDEX hints personally because you remove options for the optimiser. Many years ago I was told (seminar with a SQL Guru) that FORCE ORDER hint can help when you have huge table JOIN small table: YMMV 7 years later...

Oh, and let us know where the DBA lives so we can arrange for some percussion adjustment

Edit, after 02 Jun update

The 4th column is not part of the non-clustered index so it uses the clustered index.

Try changing the NC index to INCLUDE the value column so it doesn't have to access the value column for the clustered index

create nonclustered index ix_hugetable on dbo.hugetable (
    fk asc, added asc
) include(value)

Note: If value is not nullable then it is the same as COUNT(*) semantically. But for SUM it need the actual value, not existence.

As an example, if you change COUNT(value) to COUNT(DISTINCT value) without changing the index it should break the query again because it has to process value as a value, not as existence.

The query needs 3 columns: added, fk, value. The first 2 are filtered/joined so are key columns. value is just used so can be included. Classic use of a covering index.

Sql-server – How to manage 3.1 billion rows of data

I found this analysis very useful when I was researching on building an analytics solution which would have billions of rows in one table.

http://leiliweb.wordpress.com/2012/12/11/partitioned-table-and-index-strategies-using-sql-server-2008/

Best Answer

Related Solutions

Sql-server – Optimising join on large table

Sql-server – How to manage 3.1 billion rows of data

Related Question