Postgresql – Optimizing PostgreSQL where clause with OR

performancepostgresqlpostgresql-performanceunionwhere

I have a query that uses an or in the where clause. It is slow. It seems that re-writing the query using a union could get the same results much quicker. But I'm in an environment where we generate SQL for different technologies. The or clause works well in Oracle, for example. I'd like to consider other options besides union that might be less radical if there are any good alternatives.

Here's the minimal setup to make the requirement clear.

create table my_table as 
select 
    'EXISTING_RECORDS' as s
  , generate_series(10001,30000) as id
  , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as name
  , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as other_name
;
insert into my_table 
select 
    'NEW_RECORDS' as s
  , generate_series(1001,3000) as id
  , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as name
  , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as other_name
;

That create and insert gives us 20k EXISTING_RECORDS and 2k NEW_RECORDS. The name fields are 5 random characters, so we'll end up with a handful of matches when matching based on the following query.

select t1.s, t1.id, t1.name, t1.other_name, t2.name, t2.other_name, t2.s, t2.id
from
    my_table t1
  , my_table t2
where
  t1.s = 'NEW_RECORDS'
  and ( 
    t1.name = t2.other_name 
    or 
    t2.name = t1.other_name
  )
;

If I reduce the where clause to just t1.name = t2.other_name or the reverse, then it executes in about 7 ms in my environment. But with the or as shown it takes over 11500 ms.

I posted the explain plan in case that's helpful.

Is there any reasonable pattern I should consider in order to get a good execution plan besides converting this to two separate queries that I union together? For example: I'm happy to add indexes. But as far as I can tell, that doesn't provide any benefit.

Best Answer

Solution:

create index on my_table (name); 
create index on my_table (other_name);

Credit: @a_horse_with_no_name

Comment: I am befuddled that I didn't find this myself. I'm sure I tested [multiple] variations of adding indexes, and I don't know how I missed the obvious one. But I'm so happy that I'm giggling, and I'm grateful for the additional eyes on the problem to find this simple workaround.