Mysql – optimizing thesql query using explain

MySQLoptimization

I'm trying to optimize a query for a client; I'll try to describe what I'm going for at a high level before diving into the structure. The client database has a table of transactions (currently ~352k records). They have a blacklist in place where they can enter email addresses, customer account numbers, credit card BIN numbers, etc., which will prevent anyone from completing a transaction if they use that information. Recently they put in a feature request for a "cross-reference" report: for each blacklisted email address:

  1. find any transaction where the buyer's or the recipient's primary or secondary email address shows up in the blacklist (as in, transactions that went through using email addresses that were later blacklisted),
  2. collect any customer account number and ip addresses from those transactions, and
  3. display any additional transactions using those collected account numbers and ip addresses.
  4. Do the same for blacklisted account numbers (collect emails and IPs) and blacklisted IP addresses (collect emails and account numbers).

My approach to making this work was to create a stored that was simply a bunch of INSERT INTO statements; for example, the first one looked a little like this (anonymized, of course):

insert into `blacklist_xref_filters`
select distinct 
    t.email as `filter_criteria`, 
    "accountNum" as `blacklist_source`, 
    blacklistAccountNum.account_number as `blacklist_entry`, 
    unix_timestamp(blacklistAccountNum.created_at) as `blacklist_stamp` 
from TransactionRecords t
inner join blacklistAccountNum on 
    t.purchaserAccountNumber = blacklistAccountNum.account_number 
    or t.recipientAccountNumber = blacklistAccountNum.account_number
where (not t.email = '') and (blacklistAccountNum.deleted_at is null);

This query was repeated nearly verbatim three more times except using t.emailSecondary, t.recipientEmail, t.recipientEmailSecondary, and t.ip_address in place of t.email. The next query was nearly the same thing, except this time pulling in transactions associated with blacklisted email addresses:

insert into `blacklist_xref_filters`
select distinct 
    t.purchaserAccountNumber as `filter_criteria`, 
    "email" as `blacklist_source`, 
    blacklistEmail.email as `blacklist_entry`, 
    unix_timestamp(blacklistEmail.created_at) as `blacklist_stamp` 
from TransactionRecords t
inner join blacklistEmail on 
    t.email = blacklistEmail.email 
    or t.emailSecondary = blacklistEmail.email
    or t.recipientEmail = blacklistEmail.email
    or t.recipientEmailSecondary = blacklistEmail.email
where (not t.purchaserAccountNumber = '') and (blacklistEmail.deleted_at is null);

The last thing that happens is, I join this to the TransactionRecords table to return a list of records that are associated with blacklisted items:

select distinct <fieldlist> from TransactionRecords
inner join `blacklist_xref_filters` on 
    `TransactionRecords`.`purchaserAccountNumber` = `blacklist_xref_filters`.`filter_criteria` or 
    `TransactionRecords`.`recipientAccountNumber` = `blacklist_xref_filters`.`filter_criteria` or 
    `TransactionRecords`.`email` = `blacklist_xref_filters`.`filter_criteria` or 
    `TransactionRecords`.`emailSecondary` = `blacklist_xref_filters`.`filter_criteria` or 
    `TransactionRecords`.`recipientEmail` = `blacklist_xref_filters`.`filter_criteria` or 
    `TransactionRecords`.`recipientEmailSecondary` = `blacklist_xref_filters`.`filter_criteria` or 
    `TransactionRecords`.`ipAddress` = `blacklist_xref_filters`.`filter_criteria`;

The problem I'm having is that some of these individual SELECT queries can take 10 to 15 seconds each to run; and since I'm running 14 separate INSERT queries, the total time to render the view consistently takes at least 45 seconds, sometimes longer.

I ran that first source query through EXPLAIN and I'm not sure what, if anything, I can do to further optimize it:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: blacklistAccountNum
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 147
     filtered: 100.00
        Extra: Using where; Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ALL
possible_keys: transactionrecords_purchaseraccountnumber_index
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 350910
     filtered: 100.00
        Extra: Using where; Using join buffer

The output from show warnings indicates the first query was rewritten this way:

select distinct 
    `t`.`email` AS `filter_criteria`,
    'accountNum' AS `blacklist_source`,
    `blacklistAccountNum`.`account_number` AS `blacklist_entry`,
    unix_timestamp(`blacklistAccountNum`.`created_at`) AS `blacklist_stamp` 
from `TransactionRecords` `t` 
join `blacklistAccountNum` 
where 
    (
        (`t`.`email` <> '') and 
        isnull(`blacklistAccountNum`.`deleted_at`) and 
        (
            (`t`.`purchaserAccountNumber` = `blacklistAccountNum`.`account_number`) or 
            (`t`.`recipientAccountNumber` = `blacklistAccountNum`.`account_number`)
        )
    )

As far as indexes go, I have separate indexes on:

  • blacklistAccountNum.deleted_at
  • TransactionRecords.purchaserAccountNumber
  • TransactionRecords.recipientAccountNumber

Doesn't seem to be using that index on purchaserAccountNumber though. I've never used EXPLAIN before so I'm a little lost…what steps should I take next to keep this particular query from taking 12 seconds to run?

Best Answer

One issue with your query is OR inside ON - that means it has to check entire second table to find matching rows, because usually no index can be efectively used for those (there are some possibilities for index merge, but those generally work on "base" table, not joined one).

You can split those inserts by different conditions, you will have more of them, but each can use different index. Then you have to give it those indexes - purchaserAccountNumber, recipientAccountNumber, each email field..

And you should check if column type for purchaser/recipientAccountNumber matches blacklistAccountNum.account_number because if not, then mysql has to do some type juggling in between and again cannot use index effectively. Same again for emails that you join.