I am saving company information in a database. There is a single table with an auto-incrementing key that is unique to each company. There are separate tables for emails, websites, phone numbers, and company names. When a new company is added to the database, a check should be performed across the various tables to see whether any of its identifying data (email, phone number, website, name) already exists.
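For reference, here is a minimal sketch of the layout described above. It uses Python's built-in sqlite3 module as a stand-in for PHP and MySQL. The table and column names (emails.email, phones.phone, and so on) are assumptions inferred from the queries further down, where every attribute table shares an id column with companies.

```python
import sqlite3

# Hypothetical schema: companies holds only the auto-incrementing key, and
# each attribute table stores rows tagged with that key in a column named id
# (so the USING (id) joins from the question work unchanged).
# REFERENCES documents the link; SQLite only enforces it when
# PRAGMA foreign_keys = ON is set.
SCHEMA = """
CREATE TABLE companies (id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE emails (id INTEGER NOT NULL REFERENCES companies(id), email TEXT NOT NULL);
CREATE TABLE phones (id INTEGER NOT NULL REFERENCES companies(id), phone TEXT NOT NULL);
CREATE TABLE websites (id INTEGER NOT NULL REFERENCES companies(id), website TEXT NOT NULL);
CREATE TABLE organization_names (id INTEGER NOT NULL REFERENCES companies(id), organization_name TEXT NOT NULL);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_company(conn, emails=(), phones=(), websites=(), names=()) -> int:
    """Insert a new company and attach whatever attributes were supplied.

    Any of the attribute lists may be empty, since none of them is guaranteed.
    """
    company_id = conn.execute("INSERT INTO companies DEFAULT VALUES").lastrowid
    for table, column, values in (
        ("emails", "email", emails),
        ("phones", "phone", phones),
        ("websites", "website", websites),
        ("organization_names", "organization_name", names),
    ):
        # Table/column names come from the fixed tuple above, never from user input.
        conn.executemany(
            f"INSERT INTO {table} (id, {column}) VALUES (?, ?)",
            [(company_id, v) for v in values],
        )
    return company_id
```

For example, `add_company(create_db(), emails=["john@smith.com"], phones=["123"])` creates company 1 with one email row and one phone row, and no website or name rows.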
For instance, if a new phone number, email, and website are added, a check should be performed to see which of them are already present in the database. If the phone number and email are present (a unique combination), then I want to grab the unique company id for those two entries and add a row to the website table indicating that the website is now associated with that company id.
However, if the phone number and email exist for one company id while the email and website exist for a different company id, I want to merge the two ids into one.
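The decision rules in the two paragraphs above can be expressed independently of the storage. The sketch below (Python, hypothetical names) assumes you already know, for each attribute of the new profile, which existing company ids hold the same value. A company counts as a match when it shares at least `min_shared` attributes with the new profile. The value 2 mirrors the "phone and email" example but is otherwise an assumption.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def plan_action(matches: Dict[str, Set[int]], min_shared: int = 2) -> Tuple[str, List[int]]:
    """Decide what to do with a new company profile.

    matches maps each attribute of the new profile ("phone", "email", ...)
    to the set of existing company ids that already hold that value.
    """
    # Count how many of the new profile's attributes each existing company shares.
    counts = Counter(cid for ids in matches.values() for cid in ids)
    matched = sorted(cid for cid, n in counts.items() if n >= min_shared)
    if not matched:
        return ("create", [])       # nothing known: add a brand-new company
    if len(matched) == 1:
        return ("attach", matched)  # add the missing attributes to that company
    return ("merge", matched)       # several companies match: fold them into one
```

The question's merge case is the one where the email is shared by both companies: `plan_action({"phone": {1}, "email": {1, 2}, "website": {2}})` returns `("merge", [1, 2])`.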
I personally hate this setup, but I can't think of a cleaner way to save all this data. For instance, it's not guaranteed that a phone number, email, or website will always be passed. Moreover, some of the data is fairly subjective. I can easily query email addresses and phone numbers from multiple tables:
SELECT id FROM companies LEFT JOIN emails USING (id) LEFT JOIN phones USING (id) WHERE phone = '123' AND email = 'john@smith.com'
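One refinement to this lookup is to pass the phone and email as bound parameters instead of splicing them into the SQL string. This avoids quoting bugs and SQL injection; in PHP, PDO prepared statements fill the same role. The sketch below uses Python's sqlite3 as a stand-in, with a hypothetical demo schema and data.

The WHERE clause filters on columns from both joined tables, so plain JOINs return the same rows as the original LEFT JOINs. DISTINCT collapses the duplicate rows that appear when a company has several emails or phones.

```python
import sqlite3


def demo_db() -> sqlite3.Connection:
    # Hypothetical minimal schema matching the query's USING (id) joins.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE companies (id INTEGER PRIMARY KEY AUTOINCREMENT);
        CREATE TABLE emails (id INTEGER NOT NULL, email TEXT NOT NULL);
        CREATE TABLE phones (id INTEGER NOT NULL, phone TEXT NOT NULL);
        INSERT INTO companies (id) VALUES (1), (2);
        INSERT INTO emails VALUES (1, 'john@smith.com'), (2, 'info@other.test');
        INSERT INTO phones VALUES (1, '123'), (2, '123');
        """
    )
    return conn


def find_company_ids(conn: sqlite3.Connection, phone: str, email: str) -> list:
    """Return ids of companies that hold both this phone number and this email."""
    rows = conn.execute(
        """
        SELECT DISTINCT companies.id
        FROM companies
        JOIN emails USING (id)
        JOIN phones USING (id)
        WHERE phones.phone = ? AND emails.email = ?
        ORDER BY companies.id
        """,
        (phone, email),  # values are bound by the driver, never concatenated
    ).fetchall()
    return [row[0] for row in rows]
```

With the demo data, both companies share phone 123, but only company 1 also has john@smith.com, so `find_company_ids(demo_db(), "123", "john@smith.com")` returns `[1]`.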
However, when comparing organization names, an exact-match WHERE query won't return correct results, because the same business can be written in slightly different ways. I wrote a function in PHP to compare business names using similar_text and soundex, but those can't easily be incorporated into a MySQL query. Similarly, one check I have in place to differentiate businesses is a longitude/latitude distance check. Consequently, my envisioned workflow (as of now) is:
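The PHP-side checks can be sketched as follows, in Python for illustration. Two substitutions should be flagged:

- Python has no built-in soundex, so a standard American Soundex is implemented by hand.
- `difflib.SequenceMatcher.ratio()` stands in for PHP's similar_text. It measures similarity differently, so its scores will not match similar_text's percentages exactly.

The distance check uses the haversine formula. The thresholds (`min_ratio`, `max_km`) and the rule for combining the checks are hypothetical, since the question doesn't say how they are combined.

```python
import math
from difflib import SequenceMatcher

# American Soundex digit for each consonant; vowels, y, h and w have no digit.
_SOUNDEX = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(text: str) -> str:
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    last = _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX.get(c, "")
        if digit and digit != last:
            code += digit
            if len(code) == 4:
                break
        if c not in "hw":  # h and w do not separate letters with the same digit
            last = digit
    return code.ljust(4, "0")


def names_match(a: str, b: str, min_ratio: float = 0.8) -> bool:
    """Hypothetical rule: names match if their Soundex codes agree or they are textually close."""
    a, b = a.strip().lower(), b.strip().lower()
    code = soundex(a)
    same_sound = code != "" and code == soundex(b)
    return same_sound or SequenceMatcher(None, a, b).ratio() >= min_ratio


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres, assuming a spherical Earth."""
    earth_radius_km = 6371.0  # mean radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def probably_same_business(name_a, coords_a, name_b, coords_b, max_km: float = 1.0) -> bool:
    # Hypothetical combination: similar names AND locations within max_km.
    return names_match(name_a, name_b) and haversine_km(*coords_a, *coords_b) <= max_km
```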
- Loop through each unsaved company profile in PHP.
- Get all saved, unique information from the database:
    SELECT id, email, position, address, zip, organization_name, phone, website FROM companies LEFT JOIN emails USING (id) LEFT JOIN locations USING (id) LEFT JOIN organization_names USING (id) LEFT JOIN phones USING (id) LEFT JOIN websites USING (id)
- In PHP, loop through each row, attempting to find a match between the saved values and the unsaved values.
- If a unique entry exists, update the database. If a unique entry does not exist, add the appropriate rows to the database.
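If this workflow is kept, the "loop through each row" step can be made cheaper by indexing the fetched rows once by value. Each exact-match check then becomes a dictionary lookup instead of a scan over every row. The sketch below is in Python and rests on two assumptions: the query result has been reduced to (id, email, phone, website) tuples, and a simple lower-case/trim normalization is acceptable. The per-attribute id sets it returns are what the matching rules from the question need as input.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

FIELDS = ("email", "phone", "website")
# (company id, email, phone, website); any attribute may be None because of the LEFT JOINs.
Row = Tuple[int, Optional[str], Optional[str], Optional[str]]


def normalize(value: str) -> str:
    # Hypothetical normalization; real phone numbers or URLs may need more.
    return value.strip().lower()


def build_index(rows: Iterable[Row]) -> Dict[str, Dict[str, Set[int]]]:
    """Map each field to {normalized value: set of company ids} in one pass."""
    index = {field: defaultdict(set) for field in FIELDS}
    for company_id, *values in rows:
        for field, value in zip(FIELDS, values):
            if value is not None:
                index[field][normalize(value)].add(company_id)
    return index


def lookup(index, profile: Dict[str, Optional[str]]) -> Dict[str, Set[int]]:
    """For each attribute present in the unsaved profile, return the ids holding it."""
    return {
        field: set(index[field].get(normalize(value), ()))
        for field, value in profile.items()
        if field in index and value is not None
    }
```

Repeated rows, which the LEFT JOINs produce when a company has several values in one table, collapse naturally into the id sets.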
I can see this being incredibly inefficient given the number of companies saved in the database, since every new profile means pulling every saved row and comparing against it in PHP. That is why I am here asking for help. Database design is not my forte, so any advice is appreciated.
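Fuzzy checks can't be reduced to a dictionary lookup. A common way to cut their cost is "blocking": compare a new profile only against saved companies that share some cheap key. The sketch below blocks on zip code, which the question's query already fetches. Several parts of it are assumptions: zip as the blocking key, the 0.8 threshold, and the (id, zip, name) tuple shape. `difflib.SequenceMatcher` stands in for PHP's similar_text.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Tuple


def build_blocks(saved: Iterable[Tuple[int, str, str]]) -> Dict[str, List[Tuple[int, str]]]:
    """Group saved (id, zip, name) records by zip so each bucket is small."""
    blocks: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for company_id, zip_code, name in saved:
        blocks[zip_code].append((company_id, name))
    return blocks


def fuzzy_candidates(blocks, zip_code: str, name: str, min_ratio: float = 0.8) -> List[int]:
    """Return ids of saved companies in the same zip whose names look similar."""
    name = name.strip().lower()
    hits = []
    # Only the matching bucket is scanned; other zip codes are never compared.
    for company_id, saved_name in blocks.get(zip_code, []):
        if SequenceMatcher(None, name, saved_name.strip().lower()).ratio() >= min_ratio:
            hits.append(company_id)
    return hits
```

The trade-off is recall: a duplicate saved under a different zip code will never be compared. A coarser key, such as a rounded latitude/longitude cell, may suit some data better.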
Best Answer
My solution would be to use relation tables instead of merging ids, put a UNIQUE constraint on the phone number, email, website, etc. tables, and insert with the IGNORE modifier (INSERT IGNORE). If you want to update on duplicates instead, use REPLACE instead of INSERT IGNORE.
This way you don't need to check for duplicates yourself. If you want more complicated duplicate checks, you need to do them in the code, not in the database.
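Here is a minimal sketch of the answer's approach, using Python's sqlite3 for illustration. SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE. The table names (emails, company_emails) and their columns are hypothetical. Each email is stored once under a UNIQUE constraint, and a relation table links it to any number of companies. Two companies that share an email are therefore just two link rows, not ids that need merging.

```python
import sqlite3

SCHEMA = """
CREATE TABLE companies (id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE emails (
    email_id INTEGER PRIMARY KEY AUTOINCREMENT,
    email    TEXT NOT NULL UNIQUE
);
-- Relation table: one row per (company, email) pair.
CREATE TABLE company_emails (
    company_id INTEGER NOT NULL REFERENCES companies(id),
    email_id   INTEGER NOT NULL REFERENCES emails(email_id),
    PRIMARY KEY (company_id, email_id)
);
"""


def link_email(conn: sqlite3.Connection, company_id: int, email: str) -> int:
    """Store the email once and link it to the company; duplicates are skipped."""
    # The UNIQUE constraint turns a repeat insert into a no-op.
    conn.execute("INSERT OR IGNORE INTO emails (email) VALUES (?)", (email,))
    # lastrowid is not updated when the insert is ignored, so look the id up.
    (email_id,) = conn.execute(
        "SELECT email_id FROM emails WHERE email = ?", (email,)
    ).fetchone()
    # The composite primary key likewise makes a repeated link a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO company_emails (company_id, email_id) VALUES (?, ?)",
        (company_id, email_id),
    )
    return email_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
acme = conn.execute("INSERT INTO companies DEFAULT VALUES").lastrowid
smith = conn.execute("INSERT INTO companies DEFAULT VALUES").lastrowid
link_email(conn, acme, "john@smith.com")
link_email(conn, smith, "john@smith.com")
link_email(conn, smith, "john@smith.com")  # repeated call changes nothing
```

On the REPLACE option: MySQL's REPLACE deletes the conflicting row and inserts a new one. With an auto-increment key like email_id above, the row would get a new id and orphan the existing links. MySQL's INSERT ... ON DUPLICATE KEY UPDATE updates the row in place and may be the safer choice in that layout.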