I have not had time to test this or write out all of the table creation, but this is a start.
First, create the table to hold the users. You can get the data from the following query. One thing I don't know is how you want to handle a user who appears more than once with different phone numbers.
SELECT DISTINCT name
, phone
FROM old_table
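Before deciding how to handle a user with more than one phone number, it may help to see how many such users there are. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for your database. The sample rows are made up, and only the name, phone and address columns are assumed from the queries in this answer.

```python
import sqlite3

# Hypothetical sample data; the real old_table may have more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table (name TEXT, phone TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO old_table VALUES (?, ?, ?)",
    [
        ("Ann", "555-0100", "1 Main St"),
        ("Ann", "555-0199", "2 Oak Ave"),  # same name, different phone
        ("Bob", "555-0123", "3 Elm Rd"),
        ("Bob", "555-0123", "4 Pine Ln"),  # same name and phone, second address
    ],
)

# Names that appear with more than one distinct phone number.
conflicts = conn.execute(
    """
    SELECT name, COUNT(DISTINCT phone) AS phone_count
    FROM old_table
    GROUP BY name
    HAVING COUNT(DISTINCT phone) > 1
    ORDER BY name
    """
).fetchall()

print(conflicts)
```

Each name this returns would become more than one row in user_accounts under the SELECT DISTINCT name, phone approach, so those are the cases to decide on.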
After you have the user_accounts table, join it back to old_table to get each address along with the new user id.
SELECT DISTINCT a.address
, b.id
FROM old_table a
INNER JOIN user_accounts b on a.name = b.name and a.phone = b.phone
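Putting the two steps together, the whole flow could look roughly like the sketch below. It again uses Python's sqlite3 module as a stand-in. The user_addresses table name, the column types and the sample rows are assumptions for illustration and are not taken from your schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical source data.
    CREATE TABLE old_table (name TEXT, phone TEXT, address TEXT);
    INSERT INTO old_table VALUES
      ('Ann', '555-0100', '1 Main St'),
      ('Ann', '555-0100', '2 Oak Ave'),
      ('Bob', '555-0123', '3 Elm Rd');

    -- Step 1: one row per distinct (name, phone), with a generated id.
    CREATE TABLE user_accounts (
      id    INTEGER PRIMARY KEY,
      name  TEXT,
      phone TEXT
    );
    INSERT INTO user_accounts (name, phone)
    SELECT DISTINCT name, phone
    FROM old_table;

    -- Step 2: join back to pick up the new id for every address.
    -- user_addresses is an assumed name for the second table.
    CREATE TABLE user_addresses (
      user_id INTEGER REFERENCES user_accounts (id),
      address TEXT
    );
    INSERT INTO user_addresses (user_id, address)
    SELECT DISTINCT b.id, a.address
    FROM old_table a
    INNER JOIN user_accounts b ON a.name = b.name AND a.phone = b.phone;
    """
)

user_count = conn.execute("SELECT COUNT(*) FROM user_accounts").fetchone()[0]
addresses = conn.execute(
    """
    SELECT u.name, ua.address
    FROM user_addresses ua
    JOIN user_accounts u ON u.id = ua.user_id
    ORDER BY u.name, ua.address
    """
).fetchall()

print(user_count, addresses)
```

In MySQL the id column would more likely be declared with AUTO_INCREMENT; the INSERT ... SELECT statements themselves should carry over unchanged.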
This is just a guess, as I do not have all the information, but you would probably be better off doing:
EXPLAIN SELECT STRAIGHT_JOIN
*
FROM
tusers PARTITION (p362) tu
JOIN users PARTITION (p362) u
ON u.group_id=tu.group_id
AND tu.email_address=u.email
AND tu.group_id = 362
WHERE
tu.application_id=253555;
Note the STRAIGHT_JOIN, which may not be needed (if it is needed, then I may have assumed wrongly), and the tu.group_id comparison (which, again, shouldn't be needed).
Then use the following keys:
(tu.application_id, tu.group_id, tu.email_address)
(u.group_id, u.email)
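As a rough illustration of those two keys, here is a sketch that creates them and runs the same join. It uses Python's sqlite3 module as a stand-in, so PARTITION and STRAIGHT_JOIN are left out, and the column types, index names and sample rows are assumptions. The CREATE INDEX statements use syntax that MySQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed minimal schemas; the real tables have more columns.
    CREATE TABLE tusers (application_id INTEGER, group_id INTEGER, email_address TEXT);
    CREATE TABLE users  (group_id INTEGER, email TEXT);

    -- The two composite keys suggested above (index names are made up).
    CREATE INDEX idx_tusers_app_group_email
      ON tusers (application_id, group_id, email_address);
    CREATE INDEX idx_users_group_email
      ON users (group_id, email);

    -- Hypothetical rows.
    INSERT INTO tusers VALUES
      (253555, 362, 'a@example.com'),
      (253555, 362, 'b@example.com'),
      (1,      362, 'a@example.com');
    INSERT INTO users VALUES
      (362, 'a@example.com'),
      (363, 'b@example.com');
    """
)

query = """
SELECT tu.email_address
FROM tusers tu
JOIN users u
  ON u.group_id = tu.group_id
 AND tu.email_address = u.email
 AND tu.group_id = 362
WHERE tu.application_id = 253555
"""

matches = conn.execute(query).fetchall()

# Print SQLite's plan for the query; MySQL's EXPLAIN output looks different.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)

index_names = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
}
print(matches)
```

The column order in each key follows the filters: the equality on application_id comes first in the tusers key, and group_id leads the users key so the join can look up each email within its group.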
However, if the number of records to be returned is 2.5M, as your cardinality suggests, then do not expect this to be fast. This is pure I/O math.
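To make the I/O math concrete, here is a back-of-the-envelope sketch. Only the 2.5M row count comes from the question; the average row size and the read throughput are placeholder assumptions that you would need to replace with your own measurements.

```python
def estimate_read_seconds(rows, bytes_per_row, mb_per_second):
    """Rough lower bound: total bytes to read divided by sustained throughput."""
    total_bytes = rows * bytes_per_row
    return total_bytes / (mb_per_second * 1_000_000)


rows = 2_500_000       # from the cardinality in the question
bytes_per_row = 500    # ASSUMPTION: average row width with SELECT *
mb_per_second = 100    # ASSUMPTION: sustained read throughput

seconds = estimate_read_seconds(rows, bytes_per_row, mb_per_second)
print(f"{seconds:.1f} s just to read the rows")
```

Whatever the real numbers are, the time grows linearly with both the row count and the row width, which no index can change when every matching row has to be returned.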
There are many other things that strike me as problems, but I cannot say for sure without access.
Those keys could be even more effective if you didn't do a SELECT *.
Another thing is that varchar(255) is usually a bad idea.
Best Answer
I'd probably suggest doing it in the code that creates the insert statement, but it can be done in the insert statement itself too.
I included the LOWER so the style of the user name would be consistent in case a person entered their email address in MixedCase.