I need some help with a complicated query. Here are the dataset structures I'm dealing with. This is a MySQL database.
*transactions_table*
zip | date
-----------------
1234 | 2010-05-01
4567 | 2010-03-10
8901 | 2010-02-25
*stores_table*
brand_id | zip
---------------
1 | 1234
2 | 4567
3 | 8901
4 | 2222
5 | 6666
*manager_table*
brand_id | name
---------------
1 | jane smith
1 | joe smith
2 | ellen katz
3 | foo fuzz
4 | bar bazz
My goal is to get all the transactions in stores that are managed by managers whose names do not match for a regex on the final word of the name
column. The number of managers per store can be 1 or 2; it's inconsistent.
I can join all the data together, but then I can't figure out how to perform that last filter step. Here's what I have so far:
SELECT
manager.name, store.zip, txn.date
FROM
transactions_table txn
JOIN
stores_table store
ON
txn.zip = store.zip
JOIN
manager_table manager
ON
store.brand_id = manager.brand_id
Which gives me a derived table that looks like:
name | zip | date
------------------------------
jane smith | 1234 | 2010-05-01
joe smith | 1234 | 2010-05-01
ellen katz | 4567 | 2010-03-10
foo fuzz | 8901 | 2010-02-25
But I want to exclude the first two rows, because they represent a transaction at a store managed by people with names that match on their last name.
I want the dataset to look like:
name | zip | date
------------------------------
ellen katz | 4567 | 2010-03-10
foo fuzz | 8901 | 2010-02-25
A couple notes:
- I realize the domain this represents seems a little weird, but it's a simplified example for the sake of trying to nail down this query
- scale is an issue. the real
manager_table
is about 38MM rows and the realstores_table
is about 18MM, so I'd like to avoid a join on a derived table is possible
Best Answer
You could get the last word by first using
reverse()
to reverse it. So the last space becomes the first, so that the right one is taken even if there is more than one space in the name. Find the position of that space withlocate()
. With asubstring()
from the first character to the found position of the space minus one you'll get the last word reversed. Reverse this and have your last word.For comparison we it actually doesn't matter if the word is reversed or not and if the space is still included. So we can simplify it to:
But we have to take care that at least one space is included as otherwise we'd do a
substring(..., 1, 0)
(locate()
returns 0 if no match was found) which will result in the empty string. So two names without a space would match even if they were different, as the empty string is equal to the empty string. So we additionally need to check if:The order of the characters doesn't matter here, so we don't need to use
reverse()
.So you could use something like:
It will only join the set of managers, where no other manager with the same last word in
name
and the samebrand_id
exists. However note, that this fails to exclude managers without a space in their names, even if the names are equal. That is due to the fact, that you didn't reveal if there is any id column inmanager_table
. Onlyname
seems to be unique. So there is no way to tell if the inner row is the same as the outer row without checking for the whole name. If there is and id column (e.g. namedid
) you can rewrite it to:That will also find duplicate managers with the same name, without a space in it.
Note: I assumed you're not already using MySQL 8.0. Version 8.0 introduced
regexp_replace
andregexp_substring
providing possible easier alternatives to the string reversing.