Mysql – Alternative to slow [LEFT JOIN + OR] without UNION

join;mysql-5.6performancequery-performanceunion

I have a reasonably simple query that returns all purchases for a given user, searching by his government issued id. The document could be in two different tables.

The query using only left joins runs extremely slow, as in – 11 minutes.
And if instead of an "OR", I run two separate queries, joined by a UNION, they yield the same exact output in 4 seconds.
Although possible, it would be preferable to avoid using UNION, due to external factors¹.

Am I missing something obvious that's causing this huge disparity?
Is there a reasonable way to fix the LEFT-JOIN-only query so that it returns in less than, say, 10 seconds?

Standard query:

SELECT
  purchase.id
FROM
  purchase
  LEFT OUTER JOIN user on purchase.buyer = user.id
  LEFT OUTER JOIN user_documents on user.id = user_documents.user
  LEFT OUTER JOIN buyer_info on purchase.id = buyer_info.purchase
WHERE documents.value = '123' OR buyer_info.document = '123'

Example of UNION used:

SELECT
  purchase.id
FROM
  purchase
  LEFT OUTER JOIN user ON purchase.buyer = user.id
  LEFT OUTER JOIN user_documents ON user.id = user_documents.user
WHERE documents.value = '123'
UNION
SELECT
  purchase.id
FROM
  purchase
  LEFT OUTER JOIN buyer_info ON purchase.id = buyer_info.purchase
WHERE buyer_info.document = '123'

All fields used in all tables during the query are indexed fields.

PURCHASE and BUYER_INFO table: ~32 million records each.
USER table: ~16 million records.
USER_DOCUMENTS table: ~8 million records.

This is the simplified tables' descriptions:

PURCHASE
| Field | Type | Null | Key | Default |
| id | bigint(20) | NO | PRI | NULL |
| buyer | bigint(20) | YES | MUL | NULL |
USER
| Field | Type | Null | Key | Default |
| id | bigint(20) | NO | PRI | NULL |
USER_DOCUMENTS
| Field | Type | Null | Key | Default |
| id | bigint(20) | NO | PRI | NULL |
| USER | bigint(20) | NO | MUL | NULL |
BUYER_INFO
| Field | Type | Null | Key | Default |
| id | bigint(20) | NO | PRI | NULL |
| purchase | bigint(20) | NO | UNI | NULL |
| document | varchar(14) | YES | MUL | NULL |

I wasn't sure if this should be on StackOverflow or here, but I've read the Help Center and it mentioned query-performance is adequate here. Given that I've already gone through dozens of questions in StackOverflow while researching this and didn't found anything that helped, I thought this could be the right place.

I've already tried:

  • Pushing predicates into the OUTER JOIN clause, to no avail;

  • Removing two LEFT JOINs entirely and using WHERE EXISTS instead — this helped but not nearly enough;

  • Using inner joins where applicable — also helped but again not even close.


1 – External Factor: legacy critical production codebase that builds that SQL would need vastly greater changes in order to implement that as an UNION.


EDIT: addressing some of the comments
Both queries work perfectly as far as correct data retrieval goes, including the query using UNION.

Also, the UNION query is already fast enough for our purposes.

The only reason why I would prefer to avoid it is because it would require way more changes in a production codebase to get working. (The queries here are vastly sanitized and oversimplified.)
(That's why the very title says without UNION)

If that's the only sensible way to go, then we'll go for it, but I don't understand why adding an "OR" breaks the performance so much, and I thought maybe there was something I was overlooking.

If we brake the first query into two, they each take 2,5 seconds to run. But make it a single query using an OR operator and suddenly it takes 11 minutes.

I was hoping to get a better understanding of why is that, and if it was possible to fix it.

Best Answer

If you select from the results of a union all you will be able to filter the results you are looking for in a speedy fashion and only have to have a single where clause.

SELECT info.id
FROM
  (SELECT
    purchase.id
  FROM
    purchase
  LEFT OUTER JOIN user ON purchase.buyer = user.id
  LEFT OUTER JOIN user_documents ON user.id = user_documents.user
  UNION ALL
  SELECT
    purchase.id
  FROM
    purchase
  LEFT OUTER JOIN buyer_info ON purchase.id = buyer_info.purchase) info
WHERE info.document = '123'