MySQL Insert/Update across 3 tables with 1m+ rows

insert;join;MySQL

To start with, I know nothing of database design, so I apologise if this seems obvious to others. I have been researching up to 3NF over the last few weeks, and I think I have a layout that works.

I have a database with 1m+ rows, currently organised as follows:

Table: MasterTable
Columns: ID, FirstName, LastName, PetName, PetAge

I would like to split it as follows:

Table: People
PersonID (PK), FirstName, LastName

Table: Pets
PetID (PK), PetName, PetAge

Table: Records
RecordID (PK), MasterTable.ID, People.PersonID, Pets.PetID

PKs in all cases auto-increment so that more records can be added later.

The people and pets tables have been populated using:

INSERT INTO Pets(PetName, PetAge)
SELECT PetName, PetAge
FROM MasterTable
WHERE 1;

INSERT INTO People(FirstName, LastName)
SELECT FirstName, LastName
FROM MasterTable
WHERE 1;

INSERT INTO Records(ID)
SELECT ID
FROM MasterTable
WHERE 1;

So I have three tables. When I try to populate the Records table with the matching PersonID and PetID, I can't get anything to work.
I have tried:

INSERT INTO Records(PersonID, PetID, ID)
SELECT People.PersonID, Pets.PetID, MasterTable.ID
FROM MasterTable
LEFT JOIN People ON MasterTable.FirstName = People.FirstName AND People.LastName = MasterTable.LastName
LEFT JOIN Pets ON Pets.PetName = MasterTable.PetName AND Pets.PetAge = MasterTable.PetAge
WHERE 1

I think the WHERE clause might be the problem. I have tried
WHERE Pets.PetName = MasterTable.PetName
and almost every kind of WHERE I can think of.

I have a few questions I'd really appreciate some help with as I'm going out of my mind here.

  1. Does the order of the LEFT JOIN clauses matter? Does it matter which table is specified first and which is specified last?

  2. I initially tried INNER JOIN but I figure it's just going to join more columns than is necessary, is that right?

  3. If I am inserting firstname and lastname, I can't match on firstname and lastname, right? As in, create the firstname lastname entries and then use that ID to match the next join?

It seems simple enough to split this into three, assign a PK to each, and then create a final table where PKs relate to PKs, but apparently it's not.

When I add LIMIT 5 the select returns the correct info. Without the limit clause, all my attempts have run for over 24h and not finished. Either they have been stuck copying everything to temp tables, or they have just said "selecting data" as the status.

Can someone please help?

Sorry if something doesn't make sense, I'll clarify as I go.

Best Answer

The LEFT JOIN can increase the number of records: if two people share the same name, or two pets share the same name and age, every matching MasterTable row is paired with each of them. Hopefully you do not have many of those.
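To see how often that happens, a quick check like the following might help. It is only a sketch against the table and column names from the question. Because People and Pets were filled with one row per MasterTable row, a name that appears n times shows up n times in both tables, so the join produces n × n rows for it.

```sql
-- Name combinations that occur more than once in People
SELECT FirstName, LastName, COUNT(*) AS copies
FROM People
GROUP BY FirstName, LastName
HAVING copies > 1
ORDER BY copies DESC
LIMIT 20;

-- The same check for pets
SELECT PetName, PetAge, COUNT(*) AS copies
FROM Pets
GROUP BY PetName, PetAge
HAVING copies > 1
ORDER BY copies DESC
LIMIT 20;
```

If either query returns large counts, the join on names will multiply rows accordingly.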

Still, you only want an INNER JOIN, because you only want records that actually exist in MasterTable and have a pet.

So you should switch to inner joins:

INSERT INTO Records(PersonID, PetID, ID)
SELECT People.PersonID, Pets.PetID, MasterTable.ID
FROM MasterTable
INNER JOIN People ON MasterTable.FirstName = People.FirstName AND People.LastName = MasterTable.LastName
INNER JOIN Pets ON Pets.PetName = MasterTable.PetName AND Pets.PetAge = MasterTable.PetAge
WHERE MasterTable.ID BETWEEN 0 AND 50000

Limit the number of rows inserted per statement by selecting only a range of records. Try 50000 first, as shown in the example, and see whether that is fast enough.
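Whether a batch is fast enough probably depends on whether the join columns are indexed. The question does not say, so treat the following as an assumption rather than a diagnosis. The index names are made up for illustration, and the sketch assumes the name columns are ordinary VARCHAR columns short enough to be indexed in full.

```sql
-- Show the plan for one batch; type = ALL on People or Pets
-- means that table is scanned in full instead of looked up by index
EXPLAIN
SELECT People.PersonID, Pets.PetID, MasterTable.ID
FROM MasterTable
INNER JOIN People ON MasterTable.FirstName = People.FirstName AND People.LastName = MasterTable.LastName
INNER JOIN Pets ON Pets.PetName = MasterTable.PetName AND Pets.PetAge = MasterTable.PetAge
WHERE MasterTable.ID BETWEEN 0 AND 50000;

-- Hypothetical composite indexes covering the join conditions
CREATE INDEX idx_people_name ON People (FirstName, LastName);
CREATE INDEX idx_pets_name_age ON Pets (PetName, PetAge);
```

Running the EXPLAIN again after creating the indexes shows whether MySQL picks them up.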

Then take the next range.
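For example, the second batch could look like this. It is a sketch that assumes the IDs in MasterTable are roughly contiguous; the first query shows the range that has to be covered in total.

```sql
-- Find the full ID range that the batches must cover
SELECT MIN(ID), MAX(ID) FROM MasterTable;

-- Second batch: BETWEEN is inclusive on both ends, so start at 50001
INSERT INTO Records(PersonID, PetID, ID)
SELECT People.PersonID, Pets.PetID, MasterTable.ID
FROM MasterTable
INNER JOIN People ON MasterTable.FirstName = People.FirstName AND People.LastName = MasterTable.LastName
INNER JOIN Pets ON Pets.PetName = MasterTable.PetName AND Pets.PetAge = MasterTable.PetAge
WHERE MasterTable.ID BETWEEN 50001 AND 100000;
```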

In case you want all people, regardless of whether they have a pet:

INSERT INTO Records(PersonID, PetID, ID)
SELECT p.PersonID, pe.PetID, m.ID
FROM People p
LEFT JOIN MasterTable m ON m.FirstName = p.FirstName AND m.LastName = p.LastName
-- LEFT JOIN here as well: an INNER JOIN on Pets would drop people without a matching pet
LEFT JOIN Pets pe ON pe.PetName = m.PetName AND pe.PetAge = m.PetAge
WHERE p.PersonID BETWEEN 0 AND 50000

Here all people are chosen to be in Records, even when they have no pets; that is what LEFT JOIN means.

But here you must limit the number of entries inserted per statement with p.PersonID.

With a million records, you need a lot of buffer and temporary space to pull that off in one statement, but you can serialize the work by dividing MasterTable into ranges and processing them one after another.
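One way to serialize the batches without typing each range by hand is a small stored procedure. This is only a sketch: the procedure name fill_records_in_batches and the variables v_start and v_max are invented for illustration, and the DELIMITER lines are needed only in the mysql command-line client.

```sql
DELIMITER //

CREATE PROCEDURE fill_records_in_batches(IN batch_size INT)
BEGIN
  DECLARE v_start BIGINT;
  DECLARE v_max BIGINT;

  -- Work out the ID range to cover
  SELECT MIN(ID), MAX(ID) INTO v_start, v_max FROM MasterTable;

  -- Insert one ID range per iteration until the whole table is covered
  WHILE v_start <= v_max DO
    INSERT INTO Records(PersonID, PetID, ID)
    SELECT People.PersonID, Pets.PetID, MasterTable.ID
    FROM MasterTable
    INNER JOIN People ON MasterTable.FirstName = People.FirstName AND People.LastName = MasterTable.LastName
    INNER JOIN Pets ON Pets.PetName = MasterTable.PetName AND Pets.PetAge = MasterTable.PetAge
    WHERE MasterTable.ID BETWEEN v_start AND v_start + batch_size - 1;

    SET v_start = v_start + batch_size;
  END WHILE;
END //

DELIMITER ;

CALL fill_records_in_batches(50000);
```

With autocommit left at its default, each INSERT is committed on its own, so an interrupted run keeps the batches that already finished.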

FYI: If something like this occurs, look into the MySQL error log. It shows what problem MySQL has encountered and often what you have to do about it.
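If you are not sure where the error log lives, the server can tell you. The second statement is an addition of mine rather than part of the advice above: the statuses mentioned in the question look like values from the State column of the process list, so it may help to watch that while a batch runs.

```sql
-- Path of the error log file (or 'stderr' if the server logs to the console)
SHOW VARIABLES LIKE 'log_error';

-- What each running statement is currently doing (see the State column)
SHOW FULL PROCESSLIST;
```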