It's really a situational thing that you want to look at per-table or per-index, and you really need to find out what's in production before taking any action. When in doubt, use what's in production in the other environments, too, even if it means using a bunch of crazy settings. You just can't get a good feel for how production will behave if things are different in test or dev.
Anyway, the general recommendation to leave auto-update stats turned on (STATISTICS_NORECOMPUTE = OFF, which is the default) is for safety reasons: if this is turned off and nothing is manually updating the stats, the result could be really horrendous execution plans that never change after they're first created (and don't get invalidated for other reasons later on).
You said auto-update stats is turned off for most indexes (I think I originally misread that as all, not most). For the indexes with auto-update stats still enabled, does this setting make sense given the activity on those tables? I would expect that these are higher-activity tables. It's possible a lot of work went into figuring that out, and it may be worth keeping (or strongly considering) those settings. At the very least, make a note of which stats these are, because that information could come in handy down the road.
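If it helps to take that inventory, a sketch along these lines (using the standard sys.stats catalog view) lists every statistic that has auto-update disabled:

```sql
-- List every statistic with auto-update disabled (STATISTICS_NORECOMPUTE = ON),
-- so you can make a note of which tables/stats were deliberately configured this way.
SELECT  OBJECT_NAME(s.object_id) AS table_name,
        s.name                   AS stats_name,
        s.no_recompute           -- 1 = auto-update stats disabled for this statistic
FROM    sys.stats AS s
JOIN    sys.objects AS o
        ON o.object_id = s.object_id
WHERE   o.is_ms_shipped = 0      -- skip system objects
        AND s.no_recompute = 1
ORDER BY table_name, stats_name;
```

Run it in each environment and diff the results; that's the quickest way to see whether dev/test actually match production.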
Thinking about it more, I will say that the current strategy does make sense. Is it better than leaving auto-update stats on for everything? It seems someone thought so, to the point that it was worth the ease-of-management tradeoff of having an associated SQL Agent job.
If the idea was to have fresh stats available without blocking queries (like this), you could consider turning auto-update back on for everything, and then also turning on AUTO_UPDATE_STATISTICS_ASYNC. Then probably change the job schedule to run once a week instead of daily, as you still do want stats updated WITH FULLSCAN periodically.
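A sketch of what that alternative setup would look like in T-SQL (the database and table names are hypothetical placeholders):

```sql
-- 1. Let statistics update automatically, but asynchronously, so queries
--    aren't blocked waiting for the stats update to finish first.
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- 2. Weekly SQL Agent job step: periodically refresh with a full scan,
--    which reads every row and produces the most accurate histogram.
UPDATE STATISTICS dbo.YourTable WITH FULLSCAN;
```

The async option trades a chance of one query running on slightly stale stats for never stalling a query behind a synchronous stats update.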
I might just leave it, though, as you probably have bigger fish to fry if the indexes themselves are different between environments, and the stats rebuilds aren't too painful. What's there now does make sense; you just need to make things consistent across environments. It's probably marginally better than the simpler settings I suggested, at the expense of more work being done. But find out what's in production, tend towards using that, and move on to more important things; revisit this when you're at the point of needing to more finely tune performance -- the best stats in the world won't save a query that's missing a critical index.
Your question revolves around when it is a good idea to just create statistics vs. create an index (which also creates statistics).
From my SQL Server internals notes (SQLSkills classes IE1 and IE2) and the SQL Server internals book, below is my limited understanding:
SQL Server statistics are nothing but system objects that contain vital information about the index key values and regular column values.
SQL Server uses a cost-based model to choose a "good enough" execution plan as fast as possible. Cardinality estimation (estimating the number of rows to be processed at each step of query execution) is the most important factor in query optimization, which in turn affects the join strategy, memory grant requirement, and worker thread selection, as well as the choice of indexes when accessing data.
SQL Server won't use a nonclustered index when it estimates that a large number of KEY or RID lookup operations would be required, so it maintains statistics on indexes (and on columns) to help with such estimations.
There are two important things to know about stats:
The histogram stores information about the data distribution for the leftmost statistics (index) column ONLY. The density vector additionally stores information about the multi-column density of the key values.
SQL Server will retain at most 200 steps in the histogram irrespective of table size. The intervals covered by each histogram step grow as the table grows, which leads to "less accurate" stats for large tables.
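You can see that 200-step cap for yourself; a quick sketch (object and statistics names are hypothetical):

```sql
-- The WITH HISTOGRAM option returns just the histogram result set of
-- DBCC SHOW_STATISTICS: one row per step, never more than 200 rows,
-- no matter how many rows the underlying table has.
DBCC SHOW_STATISTICS ('dbo.YourTable', 'IX_YourTable_YourColumn') WITH HISTOGRAM;
```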
Remember that index selectivity is a metric that is inversely proportional to density, i.e., the more unique values a column has, the higher its selectivity (and the lower its density).
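Since density is defined as 1 / (number of distinct values), you can sanity-check it yourself; a sketch against a hypothetical column:

```sql
-- Compute density by hand and compare it with the "All density" value
-- reported by DBCC SHOW_STATISTICS for the same leading column.
SELECT 1.0 / COUNT(DISTINCT YourColumn) AS density
FROM   dbo.YourTable;
```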
When particular queries do not run very often, you can choose to create column-level statistics rather than an index. Column-level statistics help the Query Optimizer find better execution plans, even though those execution plans are still suboptimal due to the index scans involved. At the same time, statistics do not add overhead during data modification operations, and they avoid the cost of index maintenance. This approach works only for rarely executed queries.
Note: Someone like Paul White or Aaron Bertrand can chime in to provide more color to your good question.
Best Answer
Indexes store actual data (data pages or index pages, depending on the type of index we are talking about), and statistics store data distribution. Therefore, CREATE INDEX is the DDL to create an index (clustered, nonclustered, etc.) and CREATE STATISTICS is the DDL to create statistics on columns within the table.
I recommend you read about these aspects of relational data. Below are a couple of beginner, introductory articles. These are very broad topics, and therefore the information on them can go very wide and very deep. Read up on the general idea of them below, and ask more specific questions when they arise.
BOL reference on Table and Index Organization
BOL reference on Clustered Index Structure
BOL reference on Nonclustered Index Structures
SQL Server Central on the Introduction to Indexes
BOL reference on Statistics
Here is a working example to see these two parts in action (commented to explain):
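(The original example wasn't preserved; a minimal sketch along the same lines, with hypothetical table and index names, would be:)

```sql
-- Hypothetical demo table.
CREATE TABLE dbo.SalesOrder
(
    OrderID    INT IDENTITY PRIMARY KEY,  -- clustered index created implicitly
    CustomerID INT NOT NULL,
    OrderDate  DATE NOT NULL,
    Amount     DECIMAL(10, 2) NOT NULL
);

-- CREATE INDEX: builds actual index pages (CustomerID keys plus row locators),
-- and SQL Server automatically creates a statistics object on it as well.
CREATE NONCLUSTERED INDEX IX_SalesOrder_CustomerID
    ON dbo.SalesOrder (CustomerID);

-- CREATE STATISTICS: stores only the data distribution of OrderDate --
-- no index pages, so no extra write cost during data modifications.
CREATE STATISTICS st_SalesOrder_OrderDate
    ON dbo.SalesOrder (OrderDate);
```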
Here is what a test sample of statistics can look like:
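(The sample output isn't reproduced here, but you can generate your own with DBCC SHOW_STATISTICS; the table and statistics names below are hypothetical:)

```sql
-- Returns three result sets:
--   1. the header (when the stats were last updated, rows, rows sampled),
--   2. the density vector ("All density" per column prefix),
--   3. the histogram (up to 200 steps: RANGE_HI_KEY, EQ_ROWS, RANGE_ROWS, ...).
DBCC SHOW_STATISTICS ('dbo.YourTable', 'st_YourStatistic');
```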
Notice that statistics are the containment of the data distribution. They help SQL Server determine an optimal plan. A good analogy: imagine you are going to lift a heavy object. If you knew how much it weighed because there was a weight marking on it, you'd determine the best way to lift it and with what muscles. That's sort of what SQL Server does with statistics.
We can see from the example above that the index actually contains data (depending on the type of index, leaf pages will be different).
This post has only shown a very, very brief overview of these two large aspects of SQL Server. Both of these could take up chapters, even books. Read some of the references, and then you will have a better grasp.