Moving `order by` Significantly Improves Execution Time

oracle order-by

I provide a simple query that, I think, resembles my real query.

I'm using Oracle.

Given:

create table main_table(a NUMBER, b VARCHAR2(10));
create table rates_table(a NUMBER, rate NUMBER);
create table other_table(a NUMBER, c VARCHAR2(10));

and their data:

select * from main_table

1   foo
2   bar

select * from other_table

2 bar

select * from rates_table

2 42.2

The following query finds every a in main_table whose b matches other_table's c, limited to at most 999 rows (rownum < 1000). It then left joins rates_table to get the rate:

select o.a, o.b, rt.rate from (
  select mt.a, mt.b from main_table mt
  left join other_table ot on mt.a = ot.a
  where mt.b = ot.c
  and rownum < 1000
) o
left join rates_table rt on rt.a = o.a
order by o.a, o.b

Output:

2 bar 42.2

In my real query, which, again, resembles the one above, I noticed a significant improvement in execution time (from 11 seconds to under 1 second) after moving the order by into the inner SELECT:

select o.a, o.b, rt.rate from (
  select mt.a, mt.b from main_table mt
  left join other_table ot on mt.a = ot.a
  where mt.b = ot.c
  and rownum < 1000
  order by mt.a, mt.b -- <---
) o
left join rates_table rt on rt.a = o.a

It, too, returns:

2 bar 42.2

Here's the first query's EXPLAIN PLAN:

And the second's:

In general, does the placement of order by matter, i.e. will the results differ depending on where I put this clause? I'm not asking about this particular query, since the data only has a few rows.

Why would the placement of order by, i.e. in my real query, result in such an improvement – 11 seconds to < 1 second?

Best Answer

You are comparing apples to oranges. The two queries are not equivalent, and neither are the results they return.

The first query retrieves up to 999 rows from main_table + other_table, joins them to rates_table, and sorts the result. Depending on the data, the three-table join can produce 999 rows or 10 million rows. If it produces 10 million, all 10 million rows have to be sorted.

The second query retrieves up to 999 rows from main_table + other_table and sorts them immediately. It then joins rates_table and does not sort the final result.

The first query sorts everything; the second query sorts only the result of the subquery. Sorting 10 million rows will most likely need more resources and time than sorting 999 rows. The cost of sorting is often overlooked.
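One way to see where each sort happens is to compare the two execution plans. The following is a minimal sketch, assuming the toy tables from the question and a session that can write to the default PLAN_TABLE. The step to look for is SORT ORDER BY and the estimated number of rows feeding it:

```sql
-- First query: the ORDER BY applies to the output of the three-table join.
explain plan for
select o.a, o.b, rt.rate from (
  select mt.a, mt.b from main_table mt
  left join other_table ot on mt.a = ot.a
  where mt.b = ot.c
  and rownum < 1000
) o
left join rates_table rt on rt.a = o.a
order by o.a, o.b;

-- Display the plan that was just explained.
select * from table(dbms_xplan.display);

-- Second query: the ORDER BY applies only to the (at most 999) subquery rows.
explain plan for
select o.a, o.b, rt.rate from (
  select mt.a, mt.b from main_table mt
  left join other_table ot on mt.a = ot.a
  where mt.b = ot.c
  and rownum < 1000
  order by mt.a, mt.b
) o
left join rates_table rt on rt.a = o.a;

select * from table(dbms_xplan.display);
```

With only a few rows in the toy tables the two plans will cost about the same. The difference should only become visible on data volumes like those in the real query.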

The first query returns a sorted result set, but for the second query there is no guarantee that the result is sorted.

The second query may return inconsistent results across database versions, because the optimizer behaves differently from version to version. For example, if column a is indexed, the optimizer may choose to access the table through the index and return the first 999 rows already sorted, because the data is sorted in the index. Or it may choose to scan the table, take whatever 999 rows it finds first, and then sort those. This can be easily tested with a hint such as /*+ optimizer_features_enable('12.1.0.2') */, supplying different versions.
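As a rough illustration of that test, the hint goes directly after the first SELECT keyword of the statement. The version string '11.2.0.4' below is only an example value; which values are accepted depends on the release of the database being used:

```sql
-- Run the second query with the optimizer behaving like an older release.
-- '11.2.0.4' is an example value, not a recommendation.
select /*+ optimizer_features_enable('11.2.0.4') */ o.a, o.b, rt.rate from (
  select mt.a, mt.b from main_table mt
  left join other_table ot on mt.a = ot.a
  where mt.b = ot.c
  and rownum < 1000
  order by mt.a, mt.b
) o
left join rates_table rt on rt.a = o.a;
```

Running the same statement with different version strings and comparing both the rows returned and the plans should show whether the result depends on the optimizer's choices.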

The first query is guaranteed to return the same result on all versions, as long as the data is organized the same way. Because no criterion is specified for choosing the first 999 rows (first, based on what?), reorganizing the table (move, shrink, export/import, redefinition) may also change the result for the same base data. In this respect both queries are inaccurate, unless you do not care about consistent results, which is quite a rare situation.
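If consistent results do matter, one common Oracle pattern is to sort first and apply ROWNUM in an enclosing query, so that "first 999" is defined by the sort key. The sketch below assumes that ordering by (a, b) is the intended criterion, which the question does not actually state, and the alias s is introduced only for illustration:

```sql
select o.a, o.b, rt.rate
from (
  select s.a, s.b
  from (
    select mt.a, mt.b
    from main_table mt
    left join other_table ot on mt.a = ot.a
    where mt.b = ot.c
    order by mt.a, mt.b      -- defines which rows count as "first"
  ) s
  where rownum < 1000        -- applied to the already sorted rows
) o
left join rates_table rt on rt.a = o.a
order by o.a, o.b;           -- only the outermost ORDER BY guarantees output order
```

Rows that tie on (a, b) can still be picked arbitrarily, so the sort key needs to be unique for a fully deterministic result. On Oracle 12c and later, the row limiting clause (order by ... fetch first 999 rows only) expresses the same idea without the extra nesting.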

Related Solutions

Oracle SQL for left outer join to rownum = 1 of another query

Basically, I think you should just get the max date, if that is all you are looking for here, using your same filter, etc. You already have STATUS (= 'COMP') and WONUM (the join column). If you needed the whole record from this table, and it was more complicated than this, I would recommend Oracle's analytic functions with OVER / PARTITION BY logic to filter by the max date.

SELECT *
FROM   WORKORDER
       LEFT OUTER JOIN (SELECT WONUM AS STATUSWONUM 
                               , STATUS AS STATUS
                               , MAX(CHANGEDATE) AS STATUSCHANGEDATE
                        FROM   WOSTATUSHISTORY
                        WHERE  STATUS = 'COMP'
                        GROUP  BY WONUM, STATUS)LASTCOMPLETE
         ON ( WORKORDER.WONUM = LASTCOMPLETE.STATUSWONUM ) 

;
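For the more complicated case mentioned above, where the whole history record is needed, the analytic variant could look roughly like this. It is a sketch that uses only the table and column names from the query above; the aliases wo, h and rn are introduced for illustration:

```sql
SELECT wo.*
     , lastcomplete.status     AS laststatus
     , lastcomplete.changedate AS statuschangedate
FROM   workorder wo
       LEFT OUTER JOIN (SELECT h.*
                             -- rn = 1 marks the newest COMP row per work order
                             , ROW_NUMBER() OVER (PARTITION BY h.wonum
                                                  ORDER BY h.changedate DESC) AS rn
                        FROM   wostatushistory h
                        WHERE  h.status = 'COMP') lastcomplete
         ON ( wo.wonum = lastcomplete.wonum
              AND lastcomplete.rn = 1 );
```

Because the inline view keeps every column of the history row, further columns can be added to the outer select list without changing the join.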

Mysql – Optimizing ORDER BY for simple MySQL query

The EXPLAIN SELECT you posted definitely seems counter-intuitive.

If your query included WHERE s.id = ... then the query plan you're seeing might make a little bit more sense, but I'm assuming you're not.

It looks like the optimizer is getting distracted by the facts that supplier is a smaller table and that the supplier_id index in the po table can be used as a covering index... and with those facts in hand, it's overlooking the seemingly-obvious fact that the tables should be read in the opposite order than the one it chooses.

Here are two alternatives.

-- use the STRAIGHT_JOIN directive to insist that the optimizer process the tables in only the listed order:

SELECT STRAIGHT_JOIN * FROM `po` 
INNER JOIN po_suppliers s ON po.supplier_id = s.id
ORDER BY po.id ASC
LIMIT 10;

-- use the FORCE KEY index hint to direct the optimizer to prefer the primary key of the po table:

SELECT * FROM `po` FORCE KEY (PRIMARY) 
INNER JOIN po_suppliers s ON po.supplier_id = s.id
ORDER BY po.id ASC
LIMIT 10;

The first option is probably the better option, since FORCE KEY, in spite of the name, is still only a "hint" that the optimizer can choose to ignore, while STRAIGHT_JOIN genuinely does force the hand of the optimizer to join the tables in the order they're listed.
