Postgresql – What does below parallel processing execution plan mean

execution-planpostgresql

I have a table tbl_a with 200 k records. I wrote a query to bring sum of id (int) column. Now execution plans shows 2 workers launched and below that parallel sequential scan on tbl_a is shown 3 times; with processing 66667 records each time(200k/3=66667). Now my question is since there were two workers so should there not be only two times table seq scan and each time with record 100k (200k/3). I am attaching the image which lists the table detail, query and query execution plan for better understanding.

Can someone tell me why with 2 workers there are 3 loops , each processing 6667 records out of total 200k records?

Best Answer

The two worker processes are additional to the session backend process, which is the “parallel leader”, so there are three processes working on the scan.

Related Solutions

Sql-server – Parallel Plan Selection

From the SQL Server query optimizer's point of view, there is not much to choose between the parallel and serial execution plans in this case.

In general, the optimizer's cost model reduces the CPU cost (not the I/O cost) of operators in a parallel plan in proportion to the estimated degree of parallelism available. This CPU adjustment explains why the optimizer ever chooses a parallel plan (which will generally consume more resources) over a serial plan.

Unfortunately, the cost model does not apply this CPU reduction to the inner side of a nested loops join. It makes no sense to me (because the inner side still uses parallelism efficiently), but I didn't design the cost model.

Anyway, because the majority of the CPU cost in this execution plan is associated with the Clustered Index Scan (for which CPU reduction does not apply), the choice between serial and parallel is a close one. In broad terms, to be selected, a parallel plan must save enough using the local/global Stream Aggregate to compensate for the extra exchanges (Distribute and Gather Streams). The costs involved in that decision depend sensitively on the distribution of row values, as well as the number of rows. With relatively few rows and low-CPU operators, the trade-off can easily go either way.

In short, this query suffers from a debatable design choice applied to the costing of parallel nested loops joins. You can force the selection of a parallel plan using a plan guide, or by using the undocumented trace flag 8649. In SQL Server 2016 SP1 CU2 onward, you can also use the undocumented ENABLE_PARALLEL_PLAN_PREFERENCE hint.

Sql-server – Understanding below execution plan

1. How to understand estimated operator cost? Tb1 which don't have index is scanned and cost is 2 %, whereas index is being used on tb2 and cost is 98%.

Plan

The heap table is only fully scanned once, but the index seek is executed 1,000,000 times. The optimizer estimates that a million seeks in this case will represent 98.4% of the total cost of executing the query, whereas a single parallel scan of the heap table will represent 0.9% of the cost.

These are just estimates used for internal plan choice reasons; they do not generally reflect real-world performance on modern hardware, and are never anything more than an estimate - even in a post-execution ("actual") execution plan.

In Management Studio:

Seek tooltip

In SQL Sentry Plan Explorer:

Plan Explorer tooltip

2. From the above snip of table scan (whose cost is 2%),number of executions are 24,will that means sql read rows in batches and stored in memory and for each row it did a seek operation from tbl2.

No, it means 24 parallel threads co-operated to perform a single scan of the heap table. Each thread still reads a row at a time from the scan, performs a seek into the indexed table, then gets the next row from the scan, and so on until the task is complete.

Rows are not read in batches and stored in memory in this plan. SQL Server reports 24 scans because 24 threads each performed a partial scan of the table, resulting in one full scan overall.

3. Also any pointers to know more about force scan,force index when I pressed F4 after clicking an operator

The ForceScan, ForceSeek, and ForcedIndex properties are set to true if the query specifies a FORCESCAN, FORCESEEK, or INDEX hint - or if the query optimizer decides that a particular access strategy is required for correctness (for example, when checking foreign key constraints).

Hint properties

Best Answer

Related Solutions

Sql-server – Parallel Plan Selection

Sql-server – Understanding below execution plan

Related Question