SQL Server – Does the number of rows affect UPDATE STATISTICS performance when sampling n rows?

sql-server, sql-server-2014, statistics

For example, I have table A with 80 million rows and table B with 200 million rows.

Assume both tables have 30+ statistics objects (all automatically generated by SQL Server).
I take the first 10 statistics from each table and run the two tasks separately:

UPDATE STATISTICS A (stat1, stat2, ...) WITH SAMPLE 5000000 ROWS;
UPDATE STATISTICS B (stat1, stat2, ...) WITH SAMPLE 5000000 ROWS;

Even though both sample only 5,000,000 rows, the second task takes much longer to complete. Is this difference in performance solely due to the contents of the tables, or does the total number of rows in the table also play a part even when the number of rows to sample is fixed?

Best Answer

UPDATE STATISTICS is a single-threaded task (though you can run multiple instances of it in parallel), so the duration of a statistics update correlates with the number of rows the task has to scan in order to select the sample rows and build the statistics.
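Since each UPDATE STATISTICS statement runs single-threaded, one way to reduce wall-clock time is to split the work into per-statistic statements and issue them from separate connections. A sketch, reusing the placeholder statistic names from the question:

```sql
-- Session 1: update one statistic
UPDATE STATISTICS dbo.A (stat1) WITH SAMPLE 5000000 ROWS;

-- Session 2 (a separate connection, run concurrently): update another
UPDATE STATISTICS dbo.A (stat2) WITH SAMPLE 5000000 ROWS;
```

Each statement is still single-threaded internally, but the two sessions overlap, so total elapsed time can drop at the cost of extra concurrent I/O.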

Even if you are only sampling a fixed number of rows, the population you are sampling from is larger. SQL Server spreads the sample across the table (it samples at the page level rather than reading one contiguous block), so picking the same number of sample rows from a 200-million-row table touches more of the table than picking them from an 80-million-row table, making the scan more time-consuming.
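You can see the effect of table size on the sample by comparing rows versus rows_sampled for each statistic after the update, via the `sys.dm_db_stats_properties` DMF (available in SQL Server 2014). A sketch, assuming the tables live in `dbo` and are named `A` and `B` as in the question:

```sql
-- For each statistic on tables A and B, show the table's row count
-- at the time of the last update and how many rows were actually read.
SELECT  OBJECT_NAME(st.object_id) AS table_name,
        st.name                   AS stats_name,
        sp.last_updated,
        sp.rows,                  -- total rows in the table at update time
        sp.rows_sampled           -- rows read to build the statistic
FROM    sys.stats AS st
CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) AS sp
WHERE   st.object_id IN (OBJECT_ID(N'dbo.A'), OBJECT_ID(N'dbo.B'));
```

With SAMPLE 5000000 ROWS, rows_sampled will be similar for both tables, while rows shows the much larger population the sample for table B had to be drawn from.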

There are some interesting details about the effects and use of tempdb during statistics scans in Erin Stellato's post about statistics, but almost everything you want to know is covered in Robert Sheldon's article.