Limit the degree of parallelism (DOP) available to any query

data-warehouseoracle

On Oracle Exadata (11gR2), we have a relatively beefy database.

  • cpu_count is 24
  • parallel_server_instances is 2
  • parallel_threads_per_cpu is 2

We noted, through observation in Oracle Enterprise Manager (OEM), that performance was terrible due to queries being executed serially. To resolve this, all tables, materialized views and indexes were altered to take advantage of parallelism. eg:

ALTER TABLE SOME_TABLE PARALLEL (DEGREE DEFAULT INSTANCES DEFAULT);

The system was altered to turn on parallelisation:

ALTER SYSTEM SET PARALLEL_DEGREE_POLICY = 'AUTO';

This resulted in better performance but we occasionally observed in OEM that a single query would tie up a DOP of 96 (all of the available resource). This resulted in subsequent queries being downgraded to a DOP of 1 (no parallelisation). Resulting in poor performance until the hogging query completed.

To resolve this we tried to limit the DOP available to any query with:

ALTER SYSTEM SET PARALLEL_DEGREE_LIMIT = 24;

This had no effect. We frequently observe queries that will use more than the limit (generally 48 or 96, but no real pattern).

How do we prevent any single query from hogging all of the available resource?

Best Answer

Parallel Server Sets: PARALLEL_DEGREE_LIMIT limits the degree of parallelism, but if your query is sorting or grouping the number of parallel processes can be twice as much (two server sets to enable inter-process parallelism). That explains why you will see 48 parallel processes even with a limit of 24. This also happens if you use resource manager to limit the DOP.

Parallel hints: PARALLEL_DEGREE_LIMIT only applies to statements that use the auto degree of parallelism. Any statements that use a hard-coded degree, or even any type of object-level parallel hint, will ignore the limit. If you have those hints, that could explain why you see 96 some times.

Calibrate IO: Maybe automatic DOP is not being used, and thus the limit is not being followed, because the IO was not calibrated. This query will tell you if the IO was calibrated:

select * from V$IO_CALIBRATION_STATUS;

I've seen this cause problems before, but my current system is not calibrated and automatic DOP seems to work fine. You can tell if this is really an issue by looking at the Notes section of the explain plan. If you see something like - automatic DOP: Computed Degree of Parallelism is 2 you're fine, but you don't want to see automatic DOP: skipped because of IO calibrate statistics are missing.

Increase PARALLEL_MAX_SERVERS: Instead of worrying about running out of parallel servers, I would recommend that you significantly increase PARALLEL_MAX_SERVERS. You should at least try to go back to the default value, PARALLEL_THREADS_PER_CPU x CPU_COUNT x concurrent_parallel_users x 5, between 240 and 960 depending on your memory settings.

Those high numbers sound ridiculous to many DBAs, but they actually make a lot of sense for the following reasons:

  • Oracle parallel servers are more light weight than most people assume. (And hardly anybody ever tests it, they just find one situation where a large DOP causes a problem and assume that high DOP is always bad.)
  • It's common to run an adhoc query in a GUI tool that only retrieves the first 50 rows, but still use dozens of parallel servers. These queries are NOT consuming any significant resources, unless PARALLEL_MAX_SERVERS is too low. Then people get yelled at for running perfectly reasonable queries, which can lead to some ugly situations.
  • A very large DOP for a single query is not always bad. Everyone assumes that if you keep increasing the DOP, the overhead will get too high and the performance will drop significantly. But on many systems I've found that even a ridiculously high DOP will lead to better performance, although there are definitely diminishing returns, and it can be very unfair to other sessions. But don't just guess, test it; take a query and run it with all kinds of DOPs, up to a 1000. You may be surprised.
  • Yes, too much parallelism can be bad. But what's worse for the system, having slightly more than the optimum number of sessions, or forcing a query to serial and basically killing an important job? You should monitor the system before introducing arbitrary limits.