Then it looks like an optimizer's blind spot, and you should use the second query.
When there is a condition joining two tables a and b: a.id = b.id, plus an additional condition a.id > @some_constant, the optimizer seems to use that condition to decide where to start the index scan on the a (id) index, but it doesn't apply it to the second index, b (id). So, adding the (redundant) condition b.id > @some_constant allows it to produce a slightly more efficient plan, skipping part of the b (id) index as well.
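For illustration, a minimal sketch of the two variants, using the t1/t2 tables from the question (assuming btree indexes on both id columns):

-- Original query: the range condition narrows only the scan on t1 (id).
SELECT *  -- or whatever columns you need
FROM t1
JOIN t2 ON t2.id = t1.id
WHERE t1.id > -9223372036513411363;

-- With the logically redundant copy of the condition, the planner can
-- skip the leading part of the t2 (id) index as well.
SELECT *
FROM t1
JOIN t2 ON t2.id = t1.id
WHERE t1.id > -9223372036513411363
  AND t2.id > -9223372036513411363;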
This could be posted to the Postgres hackers group as a suggestion for improvement (if it hasn't been already).
After the edit, we know there is a FOREIGN KEY constraint from t2 that REFERENCES t1. So the "natural" (equivalent) way to write the query would be:
SELECT
    -- whatever
FROM t2
LEFT JOIN t1 ON t1.id = t2.id
WHERE t2.id > -9223372036513411363;
Can you try this and tell us the execution plan it produces? There are some transformations that apply only to LEFT (outer) joins and not to inner joins.
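For reference, a sketch of how to capture the plan (the ANALYZE option actually executes the query, so drop it if the statement is too slow):

EXPLAIN (ANALYZE, BUFFERS)
SELECT
    -- whatever
FROM t2
LEFT JOIN t1 ON t1.id = t2.id
WHERE t2.id > -9223372036513411363;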
Unfortunately, this doesn't produce a different plan either.
The OP has posted a question on the Postgres performance list; the whole thread can be seen here: PostgreSQL seems to create inefficient plans in simple conditional joins. The reply by David Rowley confirms that this feature, although it has been considered, hasn't yet been implemented in the optimizer:
Yes, unfortunately you've done about the only thing that you can do,
and that's just include both conditions in the query. Is there some
special reason why you can't just write the t2.id > ... condition in
the query too? or is the query generated dynamically by some software
that you have no control over?
I'd personally quite like to see improvements in this area, and even
wrote a patch [1] which fixes this problem too. The problem I had when
proposing the fix for this was that I was unable to report details
about how many people are hit by this planner limitation. The patch I
proposed caused a very small impact on planning time for many queries,
and was thought by many not to apply in enough cases for it to be
worth slowing down queries which cannot possibly benefit. Of course I
agree with this, I've no interest in slowing down planning on queries,
but at the same time understand the annoying poor optimisation in this
area.
Although please remember the patch I proposed was merely a first draft
proposal. Not for production use.
The auto_explain feature seems to fit this use case, if you can take a brief service outage to enable it. It logs a plan for any statement exceeding a time threshold whether or not it has completed yet. It can also log plans before execution starts, so you can even get the plan of a statement that blows out memory!
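As a minimal sketch, assuming superuser access, auto_explain can be enabled for the current session without any restart (for cluster-wide logging, add it to shared_preload_libraries in postgresql.conf, which requires the brief outage mentioned above):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '30s';  -- log the plan of any statement slower than this
SET auto_explain.log_analyze = on;          -- include actual row counts and timings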
Best Answer
I know this question is old, but I monitor all my updates in PostgreSQL, including updates inside transactions, so I decided to help!
1- First, create a sequence, named myprogress for example.
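For example:

CREATE SEQUENCE myprogress;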
2- Make your update hit the sequence; then you are free to run it inside a transaction. For example, if you have an update, just rewrite it to add the extra AND condition, as in the sketch below:
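Here is a sketch with a hypothetical table and condition (the added predicate is always true for a default sequence, which starts at 1, so the result of the update is unchanged):

-- The original update:
UPDATE big_table
SET processed = true
WHERE created_at < now() - interval '1 year';

-- Rewritten: nextval() advances myprogress for each row the condition is
-- evaluated on, and since sequence increments are never rolled back, the
-- count is visible from other sessions even inside a transaction.
UPDATE big_table
SET processed = true
WHERE created_at < now() - interval '1 year'
  AND nextval('myprogress') > 0;

(PostgreSQL does not guarantee the evaluation order of AND conditions, so treat the counter as an estimate of the rows processed so far.)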
**You are OK to run this update inside a transaction, as sequence changes take effect globally and are never rolled back.**
Note that this will not significantly affect performance.
3- Monitor your progress. Now you can just connect in another session (or transaction) and select from your sequence (sequence changes take effect globally, so you will see the value from other sessions):
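For example, from the monitoring session:

-- Current value of the counter, visible from any session
SELECT last_value FROM myprogress;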
You will see how many rows have been affected by your query so far; with that, you can estimate how many rows per second your update is processing, and in most cases, how long you will need to wait...
4- At the end, just restart the sequence:
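For example:

ALTER SEQUENCE myprogress RESTART WITH 1;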
So you can use it again. But be careful: do not trust this if two users are using the same sequence at the same time. If you are in doubt, it is better to have your own progress sequence, with permissions granted only to you.
You can use it for slow SELECTs, DELETEs, and some INSERTs too!