Postgresql – Performance: where before or after join

join;performancepostgresqlpostgresql-performance

I am doing a query like this:

select 
   * 
from 
   table1 join table2 on table1.fieldjoin = table2.fieldjoin
where 
   table1.field = 'y'

I would ask is this query equivalent in terms of performance to this:

select 
    * 
from 
  (select * from table1 where field = 'y') as t1 join table2 
  on table1.fieldjoin = table2.fieldjoin

That is by explicting doing a where inside the joined table

Best Answer

A SQL query is a definition of what data to return. There is a component of the database called the optimizer which determines how to get that data - which indexes to use, if any, in which sequence, which join algorithm to use etc. The output of that component is a query plan, the equivalent of a procedural program in the DBMS's internal execution language. The optimizer is free to choose any plan it likes as long as it is guaranteed to produce the output required by the SQL.

Sometimes queries which are logically equivalent but use different syntax will produce quite different query plans. Sometimes the optimizer recognizes the equivalence and produces the same plan for both statements. It can be difficult to predict which without good understanding of the working of the optimizer.

However, the optimizer does make available to us these plans. In PostgreSQL they can be retrieved using EXPLAIN. If the plans for the two queries are the same there will be no performance difference between them. Different plans will have different performance, though it may be slight.

There are a great many factors which go into constructing a plan other than the SQL. So a change in table cardinality (e.g. as time passes, or between production and development environments), or a different version of Postgres may produce a different plans for the same SQL statement.

Related Solutions

SQL Server Join Processing – Understanding Join/Where Processing Order

The logical processing of a query is on MSDN (written by Microsoft SQL Server team, not 3rd parties)

1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE or WITH ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP

A derived table follows this, then the outer query does it again etc etc

This is logical though: not actual. No matter how SQL Server actually does it, these semantics are honoured to the letter. The "actual" is determined by the Query Optimiser (QO) and you avoid the intermediate Cartesion product you mentioned.

It's worth mentioning that SQL is declarative: you say "what" not "how" like you would for a procedural/imperative programming (Java, .net). So saying "this happens before that" is wrong in many cases (eg assumption of short circuits or L-to-R WHERE order)

In your case above, the QO will generate the same plan no matter how it is structured because it is a simple query.

However, the QO is cost based and for a complex query it may take 2 weeks to generate the ideal plan. So it does "good enough" which actually isn't.

So your first case may help the optimiser find a better plan because the logical processing order is different for the 2 queries. But it may not.

I have used this trick on SQL Server 2000 to get 60x speed performance improvement on reporting queries. As the QO improves version to version it gets better at working these things out.

And the book you mentioned: there is some dispute over it
See SO and the subsequent links: https://stackoverflow.com/q/3270338/27535

Postgresql – Retrieving data from inner join using LIMIT and OFFSET

Use a subquery (as displayed) or CTE for that purpose:

SELECT *
FROM  (
   SELECT qid, gid
   FROM   table1
   ORDER  BY date DESC
   LIMIT  10
   OFFSET ?
   ) q
JOIN   table2 a USING (qid, gid)

USING (qid, gid) is just a shortcut for ON q.qid = a.qid AND q.gid = a.gid with the side effect that the two columns are only included once in the result.

Best Answer

Related Solutions

SQL Server Join Processing – Understanding Join/Where Processing Order

Postgresql – Retrieving data from inner join using LIMIT and OFFSET

Related Question