Oracle Performance – Difference Between Using Parallelism in Subquery or Outer Query

oracle-11g-r2plsqlquery-performance

I have the below query :

select /*+parallel(4)*/ ---> First Location
       <Desc_1> Amount,
       <Desc_2> Amount,
       <Desc_3> Amount
from (select /*+parallel(4)*/ ---> Second Location
             <Column_1>,
             <Column_2>
      from <Example_Table>
      where <Conditions>
      )
pivot(sum(Column_1) Amount
for <Column_2> in (1 as <Desc_1>,
                   2 as <Desc_2>,
                   3 as <Desc_3>)
                   );

The point is that <Example_Table> is a very big table and I want to ask where is the best to location to use the /*+parallel(4)*/ ? Is it better to use it in theSubQuery or in the outer select? What if we use /*+parallel(4)*/ in both outer and inner select? Is it bad for performance?

Best Answer

Well, if you have just one table, and your outer select is not accessing other tables, I would guess that "Second Location" would be the only one the optimizer looks at. In practice, easy to test: just do an explain plan on the two options, and see. Then invoke it both ways, and verify what happens.

Oracle is pretty smart in doing the right thing for parallel queries; take a look at the following parameters, if neither option is producing a parallel execution plan:

select name, value from v$parameter where name like 'parallel%' order by 1;

And look at Oracle Database Reference section on "Initialization Parameters" such as parallel_adaptive_multi_user for details.

So nothing terrible is likely to happen if you put it in both places, but definately test in a development environment first.

Related Solutions

MySQL – SUM with Multiple GROUP BY on Same Table

I'm more of a SQL Server guy, but here is a stab to get you motivated. You could use the CASE expression to help SUM() orders by certain order types.

For example:

SELECT
    idclient AS [Client ID]
   ,COUNT(value) AS [Total Orders]
   ,SUM(value) AS [Total Order Value]
   ,SUM( CASE WHEN type1 = 'some type' THEN value ELSE 0 END) AS [Order Combo 1]
   ,SUM( CASE WHEN type1 = 'some type' AND type2 = 'another type' THEN value ELSE 0 END) [Order Combo 2]
   ,SUM( CASE WHEN type1 = 'some type' AND type2 = 'another type' AND type3 = 'another type' THEN value ELSE 0 END) [Order Combo 3]
FROM 'test'.'orders'
GROUP BY idclient

This will group the orders on [idclient] and will SUM([value]) the instances where certain order records meet the following conditions or combinations you specified.

So, if a customer has an order where [type1] was Apples and [type2] was Oranges, you can SUM([value]) that was 1 under a new column called something like [Apple/Orange Orders].

The result can also include additional combinations for the same record in multiple fields/columns. If that is a problem, just move the CASE to the GROUP BY to break each combination into a separate record in the the result set.

Hope this helps!

Amazon-rds – Amazon Redshift – The difference between Query Slots, Concurrency and Queues

I think your understanding of query queues is a little off.

A queue is like a thread in Java. A query arrives and is designated to the "less loaded" queue, and it waits for its turn to be resolved.

The decision of where to put a query is independent of how busy a queue is; queries are allocated based on the rules you've set up: http://docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-queue-assignment-rules.html

We can have as many queues as we like.

Not quite, check out http://docs.aws.amazon.com/redshift/latest/dg/cm-c-defining-query-queues.html. The relevant line is

The maximum total concurrency level for all user-defined queues, not including the reserved Superuser queue, is 50.

So the concurrency is definitely limited, but that's a reasonable restriction because each concurrency slot reserves some of the cluster's resources.

A queue has some memory allocated (we guess divided equally?)

By default memory is split equally among queues, but you can assign memory at the granularity of 1% of cluster memory. See the above link about defining query queues.

In a queue we can assign user groups or queries groups. But in short-term, that's a lot of classification work in our queries we can't do right now.

Creating user groups and setting query groups is actually pretty simple, see http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_GROUP.html and http://docs.aws.amazon.com/redshift/latest/dg/r_query_group.html.

If you didn't set up user groups or query groups, then my guess is that you didn't see any improvement after adding the extra queues because all the queries still ran in a single query queue. (In fact, the extra queues would simply take resources from the queue running your queries.) Since all your queries are running from the same user and for the same purpose, it makes sense that they should all resolve to the same query queue. A better solution might be to have only a single queue, but increase the concurrency level. (Keep in mind that a query queue's resources are divided equally among all concurrency slots, even if they're not being used, so a concurrency level of 50 means no query gets more than 2% of the total resources.)

All that being said, redshift is a higher-latency system than many other database solutions. If you're only looking to run lots of small queries, then it might not be the best fit for your problem.

Best Answer

Related Solutions

MySQL – SUM with Multiple GROUP BY on Same Table

Amazon-rds – Amazon Redshift – The difference between Query Slots, Concurrency and Queues

Related Question