MySQL – Efficient workflow/queries to save unique information

database-design, MySQL, optimization, PHP

I am saving company information in a database. There is a single table with an auto-incrementing key that is unique to each company. There are separate tables for emails, websites, phone numbers, and company names. When a new company is added to the database, a check should be performed across the various tables to see if unique data already exists.

For instance, if a new phone number, email, and website are added, a check should be performed to see whether any of them are already present in the database. If the phone number and email are present (a unique combination), then I want to grab the unique company id for those two entries and add a row to the website table indicating that the website is now associated with that company id.

However, if the phone number and email exist for one company id while the email and website exist for a different company id, I want to merge the two ids into one.
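One way the merge described here could look is sketched below. It is only an illustration under stated assumptions: the child tables are named as in the queries in this question, each has a composite UNIQUE key on (id, value) so that `UPDATE IGNORE` can skip rows the surviving company already owns, the tables use InnoDB so the transaction applies, and the ids 1 and 2 are hypothetical.

```sql
-- Hypothetical ids: company 1 survives, company 2 is folded into it.
SET @keep_id = 1;
SET @drop_id = 2;

START TRANSACTION;

-- Re-point rows to the surviving company. UPDATE IGNORE leaves a row
-- untouched when the change would collide with a unique key, i.e. when
-- the surviving company already has that value.
UPDATE IGNORE emails   SET id = @keep_id WHERE id = @drop_id;
UPDATE IGNORE phones   SET id = @keep_id WHERE id = @drop_id;
UPDATE IGNORE websites SET id = @keep_id WHERE id = @drop_id;

-- Whatever is still attached to the dropped id was a duplicate; remove it.
DELETE FROM emails   WHERE id = @drop_id;
DELETE FROM phones   WHERE id = @drop_id;
DELETE FROM websites WHERE id = @drop_id;

-- organization_names and locations would follow the same two-step pattern.

DELETE FROM companies WHERE id = @drop_id;

COMMIT;
```

Wrapping the steps in one transaction means a failure partway through does not leave data split across both ids.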

I personally hate this setup, but I can't think of a cleaner way to save all this data. For instance, it's not guaranteed a phone number, email, or website will always be passed. Moreover, some of the data is fairly subjective. I can easily query email addresses and phone numbers from multiple tables:

SELECT id FROM companies LEFT JOIN emails using(id) LEFT JOIN phones using(id) WHERE phone = "123" AND email = "john@smith.com"

However, when comparing organization names, a plain WHERE clause won't return correct results. I wrote a function in PHP to compare business names using similar_text and soundex, but those can't easily be incorporated into a MySQL query. Similarly, one check I have in place to differentiate businesses is a longitude/latitude distance check. Consequently, my envisioned workflow (as of now) is:

  1. Loop through each unsaved company profile in PHP
  2. Get all saved, unique information from the database:

    SELECT id, email, position, address, zip, organization_name, phone, website FROM companies left join emails using(id) left join locations using(id) left join organization_names using(id) left join phones using(id) left join websites using(id)

  3. In PHP, loop through each row, attempting to find a match between the saved values and unsaved value.

  4. If a unique entry exists, update the database. If a unique entry does not exist, add the appropriate rows to the database.

I can see this being incredibly inefficient given the number of companies saved in the database, which is why I am here asking for help. Database design is not my forte, so any advice is appreciated.
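For the exact-match fields, step 2 of the workflow above does not have to load every saved row. The sketch below asks each table only for the company ids that match one incoming value. The table and column names are taken from the queries in this question, the literal values are placeholders, and the website value in particular is made up for illustration.

```sql
-- One lookup per exact-match field of the incoming (unsaved) profile.
-- Branches for fields that were not supplied can simply be left out.
SELECT id, 'email'   AS matched_on FROM emails   WHERE email   = 'john@smith.com'
UNION ALL
SELECT id, 'phone'   AS matched_on FROM phones   WHERE phone   = '123'
UNION ALL
SELECT id, 'website' AS matched_on FROM websites WHERE website = 'example.com';
```

No rows back suggests a new company, a single distinct id suggests attaching the new values to that company, and more than one distinct id marks the merge case. With an index on each value column, every branch is an index lookup rather than a scan of the joined tables.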

Best Answer

My solution would be to use relation tables instead of merging ids, put UNIQUE indexes on the phone number, email, and website tables, and insert with the IGNORE keyword, like this:

INSERT IGNORE INTO emails VALUES (5, 'a.a@a.com');

If you use the IGNORE keyword, errors that occur while executing the INSERT statement are ignored. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row is discarded and no error occurs. Ignored errors may generate warnings instead, although duplicate-key errors do not.
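A minimal sketch of that behavior follows. The table definition is an assumption for illustration (the question does not show its schema): `id` holds the company id and the UNIQUE index sits on `email` alone.

```sql
-- Assumed definition: each address may appear once, owned by one company id.
CREATE TABLE emails (
    id    INT UNSIGNED NOT NULL,   -- company id
    email VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_emails_email (email)
);

INSERT IGNORE INTO emails (id, email) VALUES (5, 'a.a@a.com');  -- row is inserted
INSERT IGNORE INTO emails (id, email) VALUES (9, 'a.a@a.com');  -- duplicate email: row is discarded

-- Affected-row count of the previous statement: 0 here, because the
-- second insert was discarded (it would be 1 after a real insert).
SELECT ROW_COUNT();

-- The address still belongs to company 5.
SELECT id FROM emails WHERE email = 'a.a@a.com';
```

Checking the affected-row count is one way for the calling code to tell whether the value was new or already known, without a separate SELECT beforehand.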

Or, if you want to update on duplicates, use REPLACE instead of INSERT IGNORE:

REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.
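As a rough illustration, assume an `emails(id, email)` table with a UNIQUE index on `email` that currently holds the row (5, 'a.a@a.com'). The ids are hypothetical. The second statement is not part of the suggestion above: it shows `INSERT ... ON DUPLICATE KEY UPDATE`, a different MySQL statement that changes the existing row in place instead of deleting it.

```sql
-- Delete-then-insert: the row (5, 'a.a@a.com') is removed and
-- (9, 'a.a@a.com') is inserted, so the address now points at company 9.
REPLACE INTO emails (id, email) VALUES (9, 'a.a@a.com');

-- In-place alternative: on a duplicate email, keep the row and
-- overwrite only its company id.
INSERT INTO emails (id, email) VALUES (9, 'a.a@a.com')
ON DUPLICATE KEY UPDATE id = 9;
```

Because REPLACE really deletes the old row, any column not listed in the statement falls back to its default, and delete triggers or cascading foreign keys on the table will fire. That is worth weighing before choosing it over the in-place form.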

This way you don't need to check for duplicates. If you want more complicated duplicate checks, you need to do them in the code, not in the database.
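For the name comparison mentioned in the question, some narrowing can still happen in SQL before the code-side check runs. MySQL has a built-in `SOUNDEX()` function and a `SOUNDS LIKE` operator. The sketch below assumes the `organization_names` table and column from the question's query, and the company name is a made-up placeholder.

```sql
-- expr1 SOUNDS LIKE expr2 is shorthand for SOUNDEX(expr1) = SOUNDEX(expr2).
-- The result is only a candidate list; a stricter comparison in
-- application code would still decide which rows really match.
SELECT id, organization_name
FROM organization_names
WHERE organization_name SOUNDS LIKE 'Acme Plumbing';
```

Two caveats: MySQL's `SOUNDEX()` is designed for English-language strings and returns codes of arbitrary length, so its output is not interchangeable with PHP's four-character `soundex()`. The predicate also cannot use an ordinary index on `organization_name`, so it scans the table.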