Defragmentation strategies help improve scan speed to/from disk.
The wide variety of opinions exists because an environment's ideal defragmentation strategy depends on many different factors. There are also multiple potential layers of fragmentation in play.
Saying that your databases are stored on a SAN isn't enough information. For example:
- Are database files stored on separate physical RAID groups or the same RAID group? What other processes are active on that same device? Are your backup files ending up there, too? You may have to ask your SAN admin for this information, because it's not always transparent.
- What are the access patterns for the databases? OLTP is generally random access, but sometimes an application is table-scan-happy and you can't change its behaviour (e.g., an ISV app). Are the applications read-mostly, write-mostly, or somewhere in between?
- Are there performance SLAs in play during a recovery/failover period?
Brent's post assumes there is one giant pool of storage and everything shares it. This means the physical disks are rarely idle, and hence most access is random. If that is your situation, then the advice applies, and I agree with it for the most part. While this type of strategy is much easier to manage, it isn't necessarily (a) what you have in your environment, or (b) what is the best solution for your environment.
If index maintenance is burdensome, consider doing it less aggressively, and/or amortize the cost over the week (i.e., run light maintenance once/day, instead of heavy maintenance once/week).
You can also turn on the SORT_IN_TEMPDB option to potentially reduce the amount of logging that takes place in the user databases.
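As a sketch of what "lighter, more frequent" maintenance might look like (the table and index names are placeholders for illustration), you can check fragmentation first and prefer the cheap operation, saving the rebuild for when it is really needed:

```sql
-- Check fragmentation before deciding what to do
SELECT i.name,
       s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.tab1'), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id;

-- Light, run often: reorganize is online and incremental
ALTER INDEX ix_tab1_col1 ON dbo.tab1 REORGANIZE;

-- Heavy, run rarely: rebuild, sorting in tempdb to move the sort work
-- and some of the logging pressure out of the user database
ALTER INDEX ix_tab1_col1 ON dbo.tab1 REBUILD WITH (SORT_IN_TEMPDB = ON);
```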
A perfect query optimizer would produce the best plan if provided with fully sampled statistics. But even a perfect optimizer does not need fully sampled statistics to produce the best plan. A simple example:
select count(*)
from tab1
where col1 = 3
  and col2 = 4
Assume that for this table tab1 there exists a nonclustered index on col1. The optimizer has two possible plans:
1) scan the whole table and check, for each row, whether col1=3 and col2=4
2) consult the index to find the positions of all rows in the table with col1=3, then look up each of these rows in the table and check whether col2=4
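To compare the two alternatives by hand, you can force each plan shape with table hints (the index name ix_tab1_col1 is assumed for illustration):

```sql
-- Plan 1: full table scan
SELECT COUNT(*)
FROM tab1 WITH (FORCESCAN)
WHERE col1 = 3 AND col2 = 4;

-- Plan 2: seek on the nonclustered index, then a lookup per matching row
SELECT COUNT(*)
FROM tab1 WITH (FORCESEEK, INDEX(ix_tab1_col1))
WHERE col1 = 3 AND col2 = 4;
```

Comparing the actual execution statistics of the two forced plans shows which choice the optimizer should make.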
Now assume the table has a size of 123.456 GB and 1,123,456,789 rows, and 567,890,123 of them (about half) have col1=3. What should the optimizer propose? It should propose a full table scan. The other possibility, looking up 567,890,123 rows one by one, is too expensive.
Now assume the optimizer only has a rough estimate of the table data: the table size is about 100 GB, there are about 1,000,000,000 rows, and about 600,000,000 of them have col1=3. Based on this information, the optimizer will choose the same good plan as in the first case.
The decision is based on the ratio 100 * (# of rows with col1=3) / (# of all rows): if this value is larger than some threshold, say 10% (I don't know which value is really used), the full table is scanned. The decision does not depend on the fifth decimal place of the ratio.
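To make the arithmetic concrete, using the figures from the example above (and the illustrative 10% threshold, which is only a guess at the real value), both the exact and the estimated selectivity land far on the same side of the cutoff:

```sql
-- Exact statistics:  100 * 567,890,123 / 1,123,456,789  (about 50.5%)
-- Rough estimate:    100 * 600,000,000 / 1,000,000,000  (60.0%)
SELECT 100.0 * 567890123 / 1123456789 AS exact_pct,
       100.0 * 600000000 / 1000000000 AS estimated_pct;
-- Both are well above a ~10% threshold, so both lead to the same
-- full-scan plan; the estimate's imprecision changes nothing.
```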
In a special case where the exact ratio is 10.00001% and the ratio based on the rough estimate is 9.99999%, the optimizer will choose different plans. But even then, the execution time may be the same for both plans, or may even be better for the plan based on the rough estimate.
Administrative jobs like calculating statistics should consume as few resources as possible. Therefore, only the precision that is actually necessary should be aimed for.
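In T-SQL terms, keeping statistics maintenance cheap means sampling instead of scanning (the table name and sample size here are illustrative):

```sql
-- Cheap: read roughly 10 percent of the pages to build the histogram
UPDATE STATISTICS dbo.tab1 WITH SAMPLE 10 PERCENT;

-- Expensive: read every row; only justified when sampled statistics
-- demonstrably lead to bad plans
UPDATE STATISTICS dbo.tab1 WITH FULLSCAN;
```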
Best Answer
For
WITH SAMPLE 50 PERCENT
it works as though, for each data page in the table, SQL Server flips a coin. If it lands heads, it reads all the rows on the page; if it lands tails, it reads none.

Tracing the
UPDATE STATISTICS T WITH SAMPLE 50 PERCENT
call in Profiler shows the query that is emitted. In that query, the
TABLESAMPLE SYSTEM (5.000000e+001 PERCENT)
clause is responsible for the sampling and is documented here. The documentation also states:
The
STATMAN
call is to an internal aggregate function briefly described here.
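One way to observe the page-level behaviour directly is to run TABLESAMPLE SYSTEM yourself (the table name T follows the example above). Because whole pages are included or excluded, the returned count fluctuates around half the table from run to run:

```sql
-- TABLESAMPLE SYSTEM includes or excludes whole pages, not rows,
-- so repeated executions return noticeably different counts.
SELECT COUNT(*) AS sampled_rows
FROM dbo.T TABLESAMPLE SYSTEM (50 PERCENT);
```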