SQL Server – Understanding and Benefiting from Database Statistics

sql-server-2005statistics

I've heard mention of statistics that SQL Server keeps by default. What are they tracking, and how can I use this information to improve my database?

Best Answer

Statistics simply are a form of dynamic metadata that assists the query optimizer in making better decisions. For example, if there are only a dozen rows in a table, then there's no point going to an index to do a lookup; you will always be better off doing a full table scan. But if that same table grows to a million rows, then you will probably be better off using the index. But if you query that table on a column that only has very few unique values (e.g. it might be a "sex" column containing only "M" or "F"), well actually a FTS might be better that that case because you would need to retrieve the block anyway to build the result set. Now say your table is 99% "M" and only 1% "F", we should FTS in one case or use an index in the other. Same table, same query, potentially four different query plans depending on the contents of the table. These kinds of things are the "statistics" and they are individual to each database - even two databases with identical table and index structure will have different statistics.

In summary, in a modern database engine there are two kinds of query optimization: rewriting the SQL (rule-based optimization, like a compiler rewrites your C to make it more efficient) and choosing the right path the the data (cost-based optimization, like a JIT compiler identifying hotspots at runtime). You only need to worry about this if you spot the query optimizer doing something obviously wrong (e.g. choosing FTS when you know an index would be better).