Sql-server – MSSQL: High number of logical reads when inserting into fragmented heap

fragmentationheapprofilersql server

I'm inserting a batch of rows – using INSERT INTO tblUploadBanking SELECT FROM ... – into a table, which is a heap and have 3 narrow NC indexes.

When I execute the query from SSMS with SET STATISTICS IO ON, I can see output like this:

Table 'tblUploadBanking'. Scan count 0, logical reads 709738, physical reads 493, read-ahead reads 429, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 3, logical reads 136234, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'tblTrans_Cash_HO'. Scan count 1, logical reads 96944, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'tblPaymentType'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'tblWorkstation_Group'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

When I sum the number of logical reads, I'm getting 942922 reads. But, when I look into the output from profiler, I can see for this query 1475067 logical reads, which is 532k more than the sum from statistics output. BTW, from Extended Events I'm getting same number like from profiler.

What are these hidden reads? The only thing I have found out so far is that there is a connection with heap and indexes fragmentation. When I rebuild the heap, the execution plan remains exactly the same – including cost distribution, rows estimations etc. The number of reads in statistics is about the same as well, but in the profiler the number of logical reads has dropped to 1007183, which is almost matching with the output from statistics. Also the CPU time has reduced by 25%.

With other queries I had even bigger discrepancies – for few sessions I saw in trace 6-10M reads, but according to statistics there were just about 900k of logical reads. These queries were also inserts into fragmented heap and rebuilding the heap "solved it" as well like in this case.

From my knowledge about the heaps, there shouldn't be any reason for slower inserts when the heap is fragmented. It could be because of fragmented NC indexes, but I would expect that logical reads needed for index insert are included in statistics for the given table.

PS: Please do not suggest me to turn table from heap to clustered table, it's a plan for the future which is not possible now.

Best Answer

When inserting into a table without specifying TABLOCK hint the operation is fully logged (every row insert is logged) and the number of logical reads reported by query statistics is roughly equal to the number of inserted rows; so it's the same in case of fragmented/not fragmented heap (1 insert = 1 destination table page access), but what is not counted here is PFS-pages access, and the Profiler does count it as logical reads.

When you insert into rebuilt heap, there is no need to try to insert in any existing page because they are all full, but when inserting to a fragmented heap there is free space in the heap and it should be checked for every row inserted (where it's possible to insert it).

IMHO, the fact that there is no more need to search in PFS pages is revealed at the first attempt to find a free space in the existing pages when all pages allocated for this heap are checked without success for a rebuilt heap, and the number of possible accesses the PFS pages grows as the fragmentation grows. As soon as the searching for free space failes when trying to insert the Nth row, for the remaining rows it allocates new pages directly

Related Solutions

Sql-server – Does MERGE use tempdb

^{(Expanding on my comment on the question.)}

Without a unique constraint on the combination of AreaId and ParameterTypeId, the given code is broken because @UpdatedId = target.Id will only ever record a single row Id.

Unless you tell it so, SQL Server can't implicitly know the possible states of the data. Either the constraint should be enforced, or if multiple rows are valid, the code will need to be changed to use a different mechanism to output the Id values.

Because of the possibility that the scan operator will come across multiple matching rows, the query must eager spool all the matches for Halloween protection. As indicated in the comments, the constraint is valid, so adding it will not only change the plan from a scan to a seek, but also eliminate the need for the table spool, as SQL Server will know there is going to be either 0 or 1 rows returned from the seek operator.

Sql-server – Procedure showing different duration / reads in Profiler vs. SSMS

The difference in logical reads leads me to believe that the two executions are getting different plans. This could be because:

there are different set settings between the app that is calling the procedure that profiler is picking up and your instance of SSMS.
you have a different default schema (you really should say exec dbo.SM_SP...).
see this post for more details

You can see if there are two or more different plans for the procedure using something like this, which will also identify if this is caused by either (or both) of the two most likely factors for different plan choice:

SELECT p.plan_handle, p.usecounts, p.size_in_bytes, 
  set_options = MAX(CASE a.attribute WHEN N'set_options' THEN a.value END),
  dflt_schema = MAX(CASE a.attribute WHEN N'user_id'     THEN a.value END)
FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
CROSS APPLY sys.dm_exec_plan_attributes(p.plan_handle) AS a
WHERE t.objectid = OBJECT_ID(N'dbo.SM_SP_MAILEVENTNAMES')
AND a.attribute IN (N'user_id', N'set_options')
GROUP BY p.plan_handle, p.usecounts, p.size_in_bytes;

Best Answer

Related Solutions

Sql-server – Does MERGE use tempdb

Sql-server – Procedure showing different duration / reads in Profiler vs. SSMS

Related Question