DB2 – Definition of ROWS_READ – Unix Server Solutions

This may seems like a stupid question, but I cant seem to find a good answer for it. I have a pretty good intuitive understanding of what ROWS_READ in for example sysibmadm.snapdyn_sql means, and what conclusions can be drawn from this metric. However, I'm not sure exactly what this means, and I can't find any information in the documentation. Looking at:

https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.admin.mon.doc/doc/r0001317.html

it says: The number of rows read from the table.

My question is due to observations that sometimes ROWS_READ does not increase eventhough NUM_EXECUTION does. At the same time there are situations where ROWS_READ increases along with NUM_EXECUTIONS eventhough data definitely fits in bufferpool.

Are ROWS_READ physical reads (from disk to bufferpool) or logical reads (in bufferpool) or orthogonal to physical/logical? If the metric is measured in the bufferpool, how can the first phenomena be explained? Invalidated package (something wrong with counters in package cache), a sudden change to index only access or something else?

Best Answer

Firstly, rows_read are logical. The metric shows the number of rows the engine had to retrieve (from the buffer pool if available) and evaluate against SARGable predicates, if any. It is in contrast with the rows_selected metric that shows how many rows matched all query predicates and were therefore returned to the application.

If your query does not use any indexes, rows_read will in general be equal to the total number of rows in all tables that are involved. If an index is used, rows_read will be reduced to the number of rows that can be found matching whatever predicates by the index scan.

Subsequently, if a unique index scan does not find any matching key values, rows_read for that particular table will be 0. Other optimization techniques, such as the use of referential integrity and check constraints or partition elimination, might also allow the engine to satisfy the query without actually evaluating any rows. For example, if you have an int_column integer check (int_column > 0) and a query with where int_column < 0, the engine doesn't need to read any rows to satisfy that query.

The ratio of rows_read to rows_selected is one of the general measures of efficiency of your SQL queries; the lower it is, the better: it means the server does not waste time reading rows that do not match query predicates.