SQL Server Internals – Difference Between Physical Reads and Read-Ahead Reads

Tags: buffer-pool, database-internals, sql-server, sql-server-2012

I am trying to understand read-ahead reads, but they seem a bit complicated to me. I searched the web and found the following:

From Reading Pages (Microsoft documentation):

Read-ahead anticipates the data and index pages needed to fulfill a query execution plan and brings the pages into the buffer cache before they are actually used by the query.

From an answer to Why is 'physical reads' less than 'read-ahead reads' & 'logical reads' in SQL Server for first time execution of query? by huntharo on Stack Overflow:

Physical Read – The query is blocked waiting for the page to be read from disk into the cache for immediate use.

Read-Ahead Read – The page is being read before it blocks the query and is read into the cache as are all reads. Read-Aheads are possible when you are scanning an index, in which case the next leaf pages in the index can assume to be needed and the read can be initiated for them before the query actually says it needs them. This allows the disk to be busy while the db engine is examining the contents of previously fetched pages.

Could someone clarify the above in their own words? I can't find a detailed explanation of read-ahead reads.

As an example, here is the STATISTICS IO output for a test query:

Table 'TestLarge'. Scan count 1, logical reads 159185, physical reads 348, read-ahead reads 159209

Best Answer

A query always reads data from memory (a logical read). Your example query scanning the TestLarge table touched 159,185 8KB memory pages during its execution.
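To put the counters from the question in perspective, here is a minimal Python sketch that parses that STATISTICS IO line and converts the page counts to megabytes, assuming the standard 8 KB page size. The helper name `parse_statistics_io` is made up for illustration and is not part of any SQL Server tooling.

```python
import re

PAGE_SIZE_KB = 8  # SQL Server data pages are 8 KB


def parse_statistics_io(line):
    """Return the read counters from one STATISTICS IO line as a dict of ints."""
    pairs = re.findall(r"(logical|physical|read-ahead) reads (\d+)", line)
    return {name: int(count) for name, count in pairs}


line = ("Table 'TestLarge'. Scan count 1, logical reads 159185, "
        "physical reads 348, read-ahead reads 159209")
stats = parse_statistics_io(line)

# Convert page counts to megabytes (pages * 8 KB / 1024).
megabytes = {name: pages * PAGE_SIZE_KB / 1024 for name, pages in stats.items()}

for name, pages in stats.items():
    print(f"{name:>10}: {pages:>7} pages = {megabytes[name]:8.1f} MB")
```

Seen this way, the scan touched roughly 1.2 GB of pages in memory, while only 348 pages (under 3 MB) were fetched by reads that made the query wait.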

During execution, SQL Server does two things.

1. It reads data from the pages that belong to the table.

If the required page is already in memory, a logical read is recorded.

If the required page is not in memory, a physical read is recorded.

The page is brought into memory from persistent storage.
The query is blocked until this read completes.
This happened 348 times to your test query.
A logical read is also counted when SQL Server processes the page (that is now in memory) to satisfy your query.

2. It issues read-ahead reads.

Every so often during the scanning operation, SQL Server spends a moment managing read-ahead:

SQL Server gathers a list of 8KB pages that may well be encountered by the current operation in the near future. You can think of this as the engine "looking ahead" of the current scan position for the pages that come next. It achieves this using IAM (Index Allocation Map) pages or the b-tree levels above the leaf, depending on the type of scan.
Any pages on this "look-ahead" list that are not already in memory are passed to the operating system in one or more asynchronous read requests. These are counted as read-ahead reads.
The operating system handles reading the pages into SQL Server memory, and notifies SQL Server when the reads are complete.
The SQL Server thread that issued the asynchronous reads is not blocked. It can continue scanning pages that are in memory while the operating system fetches read-ahead pages in the background, on a separate thread.
Your test query read 159,209 pages into memory via the read-ahead mechanism.
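The two activities described above can be sketched as a toy model in Python. This is not SQL Server's actual algorithm: the function `simulate_scan`, the fixed `lookahead` window, and the assumption that every asynchronous read finishes before the scan reaches the page are all simplifications chosen for illustration.

```python
def simulate_scan(total_pages, already_cached=frozenset(),
                  lookahead=8, enable_read_ahead=True):
    """Toy model of a cold scan over pages 0..total_pages-1.

    Assumption: an asynchronous read-ahead request always completes
    before the scan reaches that page, so the scan never waits on it.
    """
    buffer_pool = set(already_cached)
    counters = {"logical": 0, "physical": 0, "read_ahead": 0}

    for page in range(total_pages):
        # "Every so often", look ahead of the current scan position.
        if enable_read_ahead and page % lookahead == 0:
            window_end = min(page + 1 + lookahead, total_pages)
            for upcoming in range(page + 1, window_end):
                if upcoming not in buffer_pool:
                    # Asynchronous request: counted, but the scan does not wait.
                    buffer_pool.add(upcoming)
                    counters["read_ahead"] += 1

        if page not in buffer_pool:
            # The scan needs this page right now and must wait for it.
            buffer_pool.add(page)
            counters["physical"] += 1

        # Processing a page in memory always counts as a logical read.
        counters["logical"] += 1

    return counters


print(simulate_scan(100))
print(simulate_scan(100, enable_read_ahead=False))
```

In this model only the very first page is a blocking physical read; every other page arrives through read-ahead. With read-ahead switched off, every page becomes a physical read, which is the situation the mechanism exists to avoid.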

An analogy

Imagine there is a book. You are given the index only. The rest of the book is in the local library. The library has a rule that the whole book cannot be checked out, and only 50 pages at most can be taken from the library at each visit.

Your task is to assemble the book at home in the order pages are referenced in the index (a to z). You are not allowed to leave home, but you have a friend who can go to the library on your behalf.

The first entry in the index is for "aardvark", which appears on page 392 of the book.

You realize it will be very inefficient to do this task a page at a time, so instead of sending your friend to the library for page 392, you read 50 entries in index order, and give your friend that list of pages to take to the library. You count 50 read-ahead reads at this point.

Now you turn back to processing "aardvark". You don't have page 392 in front of you, so you have to wait, doing nothing, until your friend gets back. This is a physical read.

When your friend arrives you count a logical read when you process page 392.

You could start on the other 49 pages your friend brought back (counting a logical read for each one), but you realize it will be more efficient if you give your friend another list of pages to fetch from the library while you are busy with the work in front of you.

Each time you send your friend to the library with a list of pages to fetch, you count read-ahead reads. Each time you process a page in front of you, you count a logical read. If you find yourself without the next page you need (because your friend is too slow), you count a physical read.

The overall task completes quicker when you and your friend can overlap your activities effectively. They can be busy fetching pages you will need soon, while you are busy processing the pages you have in front of you. When this works well, you never have to wait for the next page you need, though you do spend a little time telling your friend what to do.

Related Solutions

SQL Server – Increased Scan Count After Creating Non-clustered Index

In the first scenario, it's scanning the whole table and reading 14,553 pages of data. In the second scenario, it's doing 2,266 seeks (counted as range scans), each of which looks at just 2 pages. So the second one is way better. Plus, many of those seeks will probably be looking at pages that have just been looked at, so on a cold cache the performance benefit will be even larger. The second is also likely to parallelise better, being lots of small operations rather than one large one (which could still be parallelised, but it's more effort).
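The page counts in that comparison can be checked with a few lines of Python. This is plain arithmetic on the figures quoted above; the variable names are only illustrative.

```python
full_scan_pages = 14553  # pages read by the table scan
seek_count = 2266        # seeks reported as the scan count
pages_per_seek = 2       # pages touched by each seek

seek_pages = seek_count * pages_per_seek
ratio = full_scan_pages / seek_pages

print(f"pages read by seeks: {seek_pages}")
print(f"the scan reads {ratio:.2f}x as many pages")
```

So the seek plan performs 4,532 logical reads in total, a little under a third of what the full scan needs.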

Multi-Statement TVF vs Inline TVF Performance in SQL Server

Your numbers table is a heap and is potentially being fully scanned each time.

Add a clustered primary key on Number and try the following with a FORCESEEK hint to get the desired seek.

As far as I can tell, this hint is needed because SQL Server estimates that just 27% of the table will match the predicate (30% for the <=, reduced to 27% by the <>). It therefore expects to read only 3-4 rows before finding one that matches, at which point it can exit the semi join, so the scan option is costed very cheaply. But in fact, if the word is a palindrome, no row matches and the whole table has to be read, so this is not a good plan.
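The estimate above can be reproduced with a short calculation. The 30% and 27% figures come from the text; the 90% factor for the <> predicate is inferred from them (0.27 / 0.30) rather than taken from documentation, and the expected row count assumes matching rows are spread evenly through the table.

```python
le_selectivity = 0.30  # guess for the <= predicate, per the text
ne_selectivity = 0.90  # inferred: the factor that reduces 30% to 27%

combined = le_selectivity * ne_selectivity

# If each row matches independently with probability `combined`,
# the expected number of rows read up to the first match is 1 / combined.
expected_rows_to_first_match = 1 / combined

print(f"combined selectivity: {combined:.2f}")
print(f"expected rows read before a match: {expected_rows_to_first_match:.1f}")
```

That works out to about 3.7 rows, which matches the "3-4 rows" figure and explains why the scan looks so cheap to the optimizer.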

CREATE FUNCTION dbo.InlineIsPalindrome
(
    @Word NVARCHAR(500)
)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN (
    WITH Nums AS
    (
      SELECT
        N = number
      FROM
        dbo.Numbers WITH(FORCESEEK)
    )
    SELECT
      IsPalindrome =
        CASE
          WHEN EXISTS
          (
            SELECT N
            FROM Nums
            WHERE N <= L / 2
              AND SUBSTRING(S, N, 1) <> SUBSTRING(S, 1 + L - N, 1)
          )
          THEN 0
          ELSE 1
        END
    FROM
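      -- Take the length of the trimmed word: LEN ignores trailing spaces but counts leading ones
      (SELECT LTRIM(RTRIM(@Word)), LEN(LTRIM(RTRIM(@Word)))) AS v (S, L)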
);
GO

With those changes in place it flies for me (takes 228 ms).