Sql-server – Confusion about CPU time and Elapsed Time Calculation

optimizationperformancequery-performancesql server

I have already searched on different online forums already, but I haven't got any clear answer on this.

The problem is I have a query which has CPU time greater 4 times greater than Elapsed time. I also have a query in which Elapsed time is greater than CPU time. I am not able to understand why is it so. Can you please shed some light, what actually happens in both cases?

An example is my query has CPU Time 4000ms while Elapsed time is just 1000ms and one query has elapsed time 2ms while CPU time is 0ms.

According to following post:

https://stackoverflow.com/questions/6419312/sql-server-query-tuning-why-cpu-time-is-higher-than-elapsed-time-are-they-rel

higher CPU time and lower elapsed time is due to hyper-threading or multi cores utilization by SQL server but I am not able to find any authentic information regarding that. Another confusion is the reverse case. How is it possible that elapsed time is for example 4ms while CPU time is 0ms?

Any detailed answer regarding this would be highly appreciated.

Best Answer

If a query has more CPU time than elapsed time that means that it contained a zone that executed in parallel. The ratio of CPU Time to elapsed time depends on many factors including the number of CPUs available to the query, what the query waited on and how long those waits were (partially depends on server workload), and how efficiently the query is able to take advantage of parallelism. For an example of the last point reference Amdahl's Law:

For example, if a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour.

Many queries will have zones that run in parallel and zones that run in serial. If most of the work is done in the parallel zones and the work is evenly distributed among schedulers then you might see a high CPU time to elapsed time ratio.

Below is a bit of a contrived example of a query that can parallelize extremely efficiently in SQL Server. First note that the machine I'm testing against has 4 cores and all of them are available to the query. From Task Manager:

From SQL Server:

SELECT cpu_count -- result of 4
FROM sys.dm_os_sys_info;

Now I'll put 100 rows into one temp table and 100k rows into the other:

CREATE TABLE #OUTER_TABLE (ID INT NOT NULL);

-- insert 100 integers from 1 - 100
INSERT INTO #OUTER_TABLE
SELECT N
FROM dbo.getNums(100);
    
CREATE TABLE #INNER_TABLE (COL VARCHAR(8000));

-- insert 100000 rows
INSERT INTO #INNER_TABLE
SELECT REPLICATE('Z', 800)
FROM dbo.getNums(100000);

SET STATISTICS TIME ON;

Here is the query that I'm going to test with:

SELECT ot.ID, it.COL
FROM 
(
    SELECT TOP 100 ID
    FROM #OUTER_TABLE
) ot
CROSS APPLY (
    SELECT MAX(COL) COL
    FROM #INNER_TABLE
    WHERE COL <> CAST(ot.ID AS VARCHAR(8000))
) it
 -- TF 8649 is only used for demonstration purposes and should not be used in production
OPTION (MAXDOP 4, QUERYTRACEON 8649);

It's a bit of a mess so let's walk through it. Based on the data in the tables that query will return all 100 rows from #OUTER_TABLE along with any value from #INNER_TABLE. However, I have not made this information available to the query optimizer so it will do a lot of work to get the result that we're expecting. Overall the query is silly but what's important is how the query executes. Here's a screenshot:

I'm going to skip over why I wrote the query that way because it isn't relevant to the question. However, one thing to note is that the 100 rows from #OUTER_TABLE are split into four threads with 25 rows per thread. Each thread does a full scan of #INNER_TABLE per row, so 4 threads do 25 scans each. The part of the query on the inner part of the loop does almost all of the work as you might expect (the outer part scans 100 rows but the inner part scans 100 * 100000 = 10000000 rows).

When I run the query with MAXDOP 4 (using all four cores) I get the following results from SET STATISTICS TIME ON:

SQL Server Execution Times:

CPU time = 24500 ms, elapsed time = 6357 ms.

Running with MAXDOP 2 (using just two cores) gives the following results:

SQL Server Execution Times:

CPU time = 23422 ms, elapsed time = 11864 ms.

Running with MAXDOP 1 (not in parallel) gives the following results:

SQL Server Execution Times:

CPU time = 23313 ms, elapsed time = 23361 ms.

As you can see, the query can very efficiently take advantage of parallelism. Most queries in the wild will not have such a relatively high CPU time to elapsed time ratio (after factoring in DOP).

For an example of when the elapsed time can be greater than CPU time consider the time that it takes to send the results to a client, such as to a SQL Server Management Studio results grid. The time to send those results will be included in the elapsed time but won't factor that heavily into CPU time. The work that the server does to read the results from memory or disk will contribute to both CPU time and elapsed time. The time spent sending results to the client is one type of wait event (usually ASYNC_NETWORK_IO). There are many different reasons why a SQL Server query could be waiting and not all of them affect CPU utilization in the same way.

Suppose I run this query:

SELECT * FROM INNER_TABLE;

Getting the results to the client results in a CPU time of 94 ms and an elapsed time of 749 ms. I can use sys.dm_exec_session_wait_stats to see the wait events for the session:

╔═══════════════════════╦═════════════════════╦══════════════╦══════════════════╦═════════════════════╗
║       wait_type       ║ waiting_tasks_count ║ wait_time_ms ║ max_wait_time_ms ║ signal_wait_time_ms ║
╠═══════════════════════╬═════════════════════╬══════════════╬══════════════════╬═════════════════════╣
║ ASYNC_NETWORK_IO      ║               19444 ║          662 ║               12 ║                  67 ║
║ MEMORY_ALLOCATION_EXT ║                  37 ║            0 ║                0 ║                   0 ║
╚═══════════════════════╩═════════════════════╩══════════════╩══════════════════╩═════════════════════╝

I can also run the query in a different session without returning the result set to the client. In one test this took 47 ms of CPU time and 60 ms of elapsed time. Here are the wait sessions from that query which are quite different from the other one:

╔═══════════════════════╦═════════════════════╦══════════════╦══════════════════╦═════════════════════╗
║       wait_type       ║ waiting_tasks_count ║ wait_time_ms ║ max_wait_time_ms ║ signal_wait_time_ms ║
╠═══════════════════════╬═════════════════════╬══════════════╬══════════════════╬═════════════════════╣
║ ASYNC_NETWORK_IO      ║                 185 ║            2 ║                0 ║                   1 ║
║ SOS_SCHEDULER_YIELD   ║                   7 ║            0 ║                0 ║                   0 ║
║ MEMORY_ALLOCATION_EXT ║                  40 ║            0 ║                0 ║                   0 ║
╚═══════════════════════╩═════════════════════╩══════════════╩══════════════════╩═════════════════════╝

Related Solutions

Sql-server – Comparing two queries in SQL Server 2012

I love your approach to careful consideration to query tuning and reviewing options and plans. I wish more developers did this. One caution would be - always test with a lot of rows, looking at the logical reads, this is a smallish table. Try and generate a sample load and run the query again. One small issue - in your top query you are not asking for an order by, in your bottom query you are. You should compare and contrast them each with ordering.

I just quickly created a SalesOrders table with 200,000 sales orders in it - still not huge by any stretch of the imagination. And ran the queries with the ORDER BY in each. I also played with indexes a bit.

With no clustered index on OrderID, just a non-clustered index on CustID The second query outperformed. Especially with the order by included in each. There was twice as many reads on the first query than the second query, and the cost percentages were 67% / 33% between the queries.

With a clustered index on OrderID and a non-clustered index just on CustID They performed in a similar speed and the exact same number of reads.

So I would suggest you increase the number of rows and do some more testing. But my final analysis on your queries -

You may find them behaving more similarly than you realize when you increase the rows, so keep that caveat in mind and test that way.

If all you ever want to return is the maximum OrderID for each Customer, and you want to determine that by the OrderID being the greatest OrderID then the second query out of these two is the best way to go from my mindset - it is a bit simpler and while ever so slightly more expensive based on subtree cost it is a quicker and easier to decipher statement. If you intend on adding other columns into your result set someday? Then the first query allows you do to do that.

Updated: One of your comments under your question was:

Please keep in mind, that finding the best query in this question is a means of refining the techniques used for comparing them.

But best takeaway for doing that- test with more data - always makes sure you have data consistent with production and expected future production. Query plans start looking data when you give more rows to the tables, and try and keep the distribution what you'd expect in production. And pay attention to things like including Order By or not, here I don't think it makes a terrible bit of difference in the end, but still worth digging into.

Your approach of comparing this level of detail and data is a good one. Subtree costs are arbitrary and meaningless mostly, but still worth at least looking at for comparison between edits/changes or even between queries. Looking at the time statistics and the IO are quite important, as is looking at the plan for anything that feels out of place for the size of the data you are working with and what you are trying to do.

Sql-server – Improve query performance when selecting almost all rows with many “group by” columns

The non-clustered index you have tested is not the best for this query. It can be used for the WHERE clause and for doing an index scan instead of a full table scan but it cannot be used for the GROUP BY.

The best possible index would have to be a partial index (to filter the unwanted rows from the WHERE clause), then have all the columns used in the GROUP BY and then INCLUDE all the other columns used in the SELECT:

CREATE INDEX special_ix 
  ON dbo.Commissions_Output
    ( company, location, account, 
      salesroute, employee, producttype, 
      item, loadjdate, commissionrate ) 
INCLUDE 
  ( [Extended Sales Price], [Delivered Qty] ) 
WHERE 
  ( [Extended Sales Price] <> 0 ) ;

Best Answer

Related Solutions

Sql-server – Comparing two queries in SQL Server 2012

Sql-server – Improve query performance when selecting almost all rows with many “group by” columns

Related Question