Sql-server – Confusion about CPU time and Elapsed Time Calculation

optimizationperformancequery-performancesql server

I have already searched on different online forums already, but I haven't got any clear answer on this.

The problem is I have a query which has CPU time greater 4 times greater than Elapsed time. I also have a query in which Elapsed time is greater than CPU time. I am not able to understand why is it so. Can you please shed some light, what actually happens in both cases?

An example is my query has CPU Time 4000ms while Elapsed time is just 1000ms and one query has elapsed time 2ms while CPU time is 0ms.

According to following post:

https://stackoverflow.com/questions/6419312/sql-server-query-tuning-why-cpu-time-is-higher-than-elapsed-time-are-they-rel

higher CPU time and lower elapsed time is due to hyper-threading or multi cores utilization by SQL server but I am not able to find any authentic information regarding that. Another confusion is the reverse case. How is it possible that elapsed time is for example 4ms while CPU time is 0ms?

Any detailed answer regarding this would be highly appreciated.

Best Answer

If a query has more CPU time than elapsed time that means that it contained a zone that executed in parallel. The ratio of CPU Time to elapsed time depends on many factors including the number of CPUs available to the query, what the query waited on and how long those waits were (partially depends on server workload), and how efficiently the query is able to take advantage of parallelism. For an example of the last point reference Amdahl's Law:

For example, if a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour.

Many queries will have zones that run in parallel and zones that run in serial. If most of the work is done in the parallel zones and the work is evenly distributed among schedulers then you might see a high CPU time to elapsed time ratio.

Below is a bit of a contrived example of a query that can parallelize extremely efficiently in SQL Server. First note that the machine I'm testing against has 4 cores and all of them are available to the query. From Task Manager:

task monitor

From SQL Server:

SELECT cpu_count -- result of 4
FROM sys.dm_os_sys_info;

Now I'll put 100 rows into one temp table and 100k rows into the other:

CREATE TABLE #OUTER_TABLE (ID INT NOT NULL);

-- insert 100 integers from 1 - 100
INSERT INTO #OUTER_TABLE
SELECT N
FROM dbo.getNums(100);
    
CREATE TABLE #INNER_TABLE (COL VARCHAR(8000));

-- insert 100000 rows
INSERT INTO #INNER_TABLE
SELECT REPLICATE('Z', 800)
FROM dbo.getNums(100000);

SET STATISTICS TIME ON;

Here is the query that I'm going to test with:

SELECT ot.ID, it.COL
FROM 
(
    SELECT TOP 100 ID
    FROM #OUTER_TABLE
) ot
CROSS APPLY (
    SELECT MAX(COL) COL
    FROM #INNER_TABLE
    WHERE COL <> CAST(ot.ID AS VARCHAR(8000))
) it
 -- TF 8649 is only used for demonstration purposes and should not be used in production
OPTION (MAXDOP 4, QUERYTRACEON 8649);

It's a bit of a mess so let's walk through it. Based on the data in the tables that query will return all 100 rows from #OUTER_TABLE along with any value from #INNER_TABLE. However, I have not made this information available to the query optimizer so it will do a lot of work to get the result that we're expecting. Overall the query is silly but what's important is how the query executes. Here's a screenshot:

actual query plan

I'm going to skip over why I wrote the query that way because it isn't relevant to the question. However, one thing to note is that the 100 rows from #OUTER_TABLE are split into four threads with 25 rows per thread. Each thread does a full scan of #INNER_TABLE per row, so 4 threads do 25 scans each. The part of the query on the inner part of the loop does almost all of the work as you might expect (the outer part scans 100 rows but the inner part scans 100 * 100000 = 10000000 rows).

When I run the query with MAXDOP 4 (using all four cores) I get the following results from SET STATISTICS TIME ON:

SQL Server Execution Times:

CPU time = 24500 ms, elapsed time = 6357 ms.

Running with MAXDOP 2 (using just two cores) gives the following results:

SQL Server Execution Times:

CPU time = 23422 ms, elapsed time = 11864 ms.

Running with MAXDOP 1 (not in parallel) gives the following results:

SQL Server Execution Times:

CPU time = 23313 ms, elapsed time = 23361 ms.

As you can see, the query can very efficiently take advantage of parallelism. Most queries in the wild will not have such a relatively high CPU time to elapsed time ratio (after factoring in DOP).

For an example of when the elapsed time can be greater than CPU time consider the time that it takes to send the results to a client, such as to a SQL Server Management Studio results grid. The time to send those results will be included in the elapsed time but won't factor that heavily into CPU time. The work that the server does to read the results from memory or disk will contribute to both CPU time and elapsed time. The time spent sending results to the client is one type of wait event (usually ASYNC_NETWORK_IO). There are many different reasons why a SQL Server query could be waiting and not all of them affect CPU utilization in the same way.

Suppose I run this query:

SELECT * FROM INNER_TABLE;

Getting the results to the client results in a CPU time of 94 ms and an elapsed time of 749 ms. I can use sys.dm_exec_session_wait_stats to see the wait events for the session:

╔═══════════════════════╦═════════════════════╦══════════════╦══════════════════╦═════════════════════╗
║       wait_type       ║ waiting_tasks_count ║ wait_time_ms ║ max_wait_time_ms ║ signal_wait_time_ms ║
╠═══════════════════════╬═════════════════════╬══════════════╬══════════════════╬═════════════════════╣
║ ASYNC_NETWORK_IO      ║               19444 ║          662 ║               12 ║                  67 ║
║ MEMORY_ALLOCATION_EXT ║                  37 ║            0 ║                0 ║                   0 ║
╚═══════════════════════╩═════════════════════╩══════════════╩══════════════════╩═════════════════════╝

I can also run the query in a different session without returning the result set to the client. In one test this took 47 ms of CPU time and 60 ms of elapsed time. Here are the wait sessions from that query which are quite different from the other one:

╔═══════════════════════╦═════════════════════╦══════════════╦══════════════════╦═════════════════════╗
║       wait_type       ║ waiting_tasks_count ║ wait_time_ms ║ max_wait_time_ms ║ signal_wait_time_ms ║
╠═══════════════════════╬═════════════════════╬══════════════╬══════════════════╬═════════════════════╣
║ ASYNC_NETWORK_IO      ║                 185 ║            2 ║                0 ║                   1 ║
║ SOS_SCHEDULER_YIELD   ║                   7 ║            0 ║                0 ║                   0 ║
║ MEMORY_ALLOCATION_EXT ║                  40 ║            0 ║                0 ║                   0 ║
╚═══════════════════════╩═════════════════════╩══════════════╩══════════════════╩═════════════════════╝