Sql-server – What’s the easiest and most accurate way to visualize parallel thread usage in SQL Server

parallelismsql server

Following up on my other question here, I'd like an easy way to visualize the answer.

I've tried using a query like this, but the number of "running" threads can exceed DOP sometimes.

SELECT    ost.session_id,
          ost.exec_context_id,
          ost.scheduler_id,
          qp.node_id,
          w.worker_address,
          qp.physical_operator_name,
          ost.task_state,
          qp.row_count
FROM      sys.dm_os_tasks AS ost
JOIN      sys.dm_os_workers AS w
    ON ost.worker_address = w.worker_address
LEFT JOIN sys.dm_exec_query_profiles AS qp
    ON w.task_address = qp.task_address
WHERE     ost.session_id = 57
AND       ost.task_state = 'RUNNING'
ORDER BY  ost.exec_context_id, ost.scheduler_id, qp.node_id, w.worker_address;

I've also tried using the query_thread_profile Extended Event, but for a plan with multiple parallel branches, it shows the same N number of Thread IDs. This seems contrary to the answer given, since DOP doesn't explicitly limit the total number of threads used. For a query running at DOP 4, it looks like this. I'm sure that the same four threads (plus coordinator) don't get used for all branches and operators.

xe_session

So like, running a query like this at DOP 8 gives me three parallel branches.

SELECT COUNT(DISTINCT t1.Data)
FROM dbo.t1 
JOIN dbo.t2 
ON t1.Id = t2.f_Id
JOIN t3
ON t1.Id = t3.f_Id
OPTION(FORCE ORDER, HASH JOIN);

But sometimes my query returns this:

query1

And sometimes it returns this:

query_2

Here's some sample data:

CREATE TABLE dbo.t1 ( Id INT IDENTITY PRIMARY KEY CLUSTERED, Data INT NOT NULL );

CREATE TABLE dbo.t2 ( Id INT IDENTITY PRIMARY KEY CLUSTERED, f_Id INT NOT NULL );

CREATE TABLE dbo.t3 ( Id INT IDENTITY PRIMARY KEY CLUSTERED, f_Id INT NOT NULL );

INSERT dbo.t1 WITH (TABLOCKX) ( Data )
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 2000
FROM (
SELECT TOP 1000000 1 AS num
FROM master..spt_values AS sv
CROSS JOIN master..spt_values AS sv2
CROSS JOIN master..spt_values AS sv3
) AS nums

INSERT dbo.t2 WITH (TABLOCKX) ( f_Id )
SELECT TOP 15000000 t.Id
FROM dbo.t1 AS t
CROSS JOIN dbo.t1 AS t2

INSERT dbo.t3 WITH (TABLOCKX) ( f_Id )
SELECT TOP 25000000 t.Id
FROM dbo.t1 AS t
CROSS JOIN dbo.t1 AS t2

Thanks!

Best Answer

I've tried using a query like this, but the number of "running" threads can exceed DOP sometimes.

The join from dm_os_tasks to dm_os_workers is one to one, because a worker thread runs exactly one task. The join from tasks/workers to dm_exec_query_profiles is one to many, because each thread may run multiple plan nodes.

You need to look at the number of unique tasks/execution contexts/schedulers that are associated with RUNNING status:

SELECT
    COUNT(DISTINCT DOT.scheduler_id)            -- or task_address / exec_context_id
FROM sys.dm_os_tasks AS DOT
WHERE
    DOT.task_state = N'RUNNING'
    AND DOT.session_id = 72                     -- specify the session id
    AND DOT.parent_task_address IS NOT NULL;    -- additional workers only

I've also tried using the query_thread_profile Extended Event, but for a plan with multiple parallel branches, it shows the same N number of Thread IDs.

The extended event description for this debug channel event is imprecise:

xe desc

The documentation for sys.dm_exec_query_profiles gives a better description of the thread_id:

Distinguishes the threads (for a parallel query) belonging to the same query operator node.

Within each branch, each thread is given an ID, similar to the per-thread breakdown in showplan. At DOP x, there will be thread ids from 1 to x for each branch, but it is simply a sequential number (think row number).

You can tell the threads apart in the extended event by collecting task/worker/system thread id. If you are interested in thread ids for the whole executing plan, the best proxy for that is execution_context_id. Each execution context is associated with exactly one thread.

Visualization

Visualization of parallel execution can be done in many different ways, depending on which particular aspect is under investigation. The DMVs can be joined and ordered in many ways.

For example, the following shows what each thread did most recently, and what, if anything, it is currently waiting on:

DECLARE @session_id smallint = 72;

SELECT
    DOT.task_state,
    DOT.scheduler_id,
    DOT.exec_context_id,
    DEQP.physical_operator_name,
    DEQP.node_id,
    DEQP.thread_id,
    DEQP.row_count,
    DOWT.wait_duration_ms,
    DOWT.wait_type,
    DOWT.resource_description
FROM sys.dm_os_tasks AS DOT
OUTER APPLY
(
    -- What each thread did most recently
    SELECT * 
    FROM sys.dm_exec_query_profiles AS DEQP
    WHERE
        DEQP.session_id = DOT.session_id
        AND DEQP.request_id = DOT.request_id
        AND DEQP.task_address = DOT.task_address
    ORDER BY
        DEQP.last_active_time DESC
    OFFSET 0 ROWS FETCH FIRST 1 ROW ONLY
) AS DEQP
OUTER APPLY
(
    -- Longest wait, if any
    SELECT * 
    FROM sys.dm_os_waiting_tasks AS DOWT
    WHERE
        DOWT.session_id = DOT.session_id
        AND DOWT.exec_context_id = DOT.exec_context_id
    ORDER BY
        DOWT.wait_duration_ms DESC
    OFFSET 0 ROWS FETCH FIRST 1 ROW ONLY
) AS DOWT
WHERE
    DOT.session_id = @session_id
    --AND DOT.task_state = N'RUNNING'
ORDER BY
    DEQP.node_id,
    DOT.exec_context_id;

Example output for the demo query:

output