MySQL – Merge Tables Without Data Repetition

MySQLstored-procedures

My situation is this: I have several DIFFERENT tables in my MySQL database with the same name columns. Such tables can contain (or not) the same data.

What I need to do is a kind of "merge" between these tables, creating a new table that will contain only the data repeated once.

In other words, I want to check for duplicate records in multiple tables, leaving one with the records without repetition.

A example to clarify:
I have 2 tables (in fact I have 10 tables, but 2 works for the example) with the columns id, name and city like below

+----+---------------------------------------------------------+-----------------------+
| id | name                                                    | city                  |
+----+---------------------------------------------------------+-----------------------+
| 6  | HOTEL PANAMBY GUARULHOS                                 | guarulhos             |
| 31 | PLAZA INN MASTER                                        | new york              |
| 40 | PLAZA INN AMERICAN LOFT SãO PAULO                       | sao paulo             |
| 41 | PLAZA INN PARIS                                         | paris                 |
+----+---------------------------------------------------------+-----------------------+

The id '41' of table 1 could be equal to id '80' of table 2, for example and so subsequently. My task is compare the tables and create a new one without repetition. Don't exist any FK between tables, they are isolated.

Until now, I discovered two approaches to deal with this.

1) Using SELECT UNION. Apparently is the best approach, but I couldn't see the correct joins and group by to use it, once I don't have FKs;

2) Using a Stored Procedure with cursors for two tables and compare row by row. I tried use this one, but SP it isn't my speciality, so i created a good one, but the tables has over 1000 rows each one and my SP always crashed the memory.

I found this question here, but is not my case, I want a new table without repetition. Other questions in this site with similar problem talk about SQL Server and don't work for me.

Currently, I'm doing the workaround with a PHP script, but I'd like to know for any MyQSL solutions, if any.
I found some articles about Dataclean, but I don't know if this is the case and never work with this in MySQL.

I hope I was clear.

Best Answer

It is not clear to me, if you want to keep records, which are duplicated, or you want only those records, which are in only one of the tables.

You can try:

-- create tables
create table tbl1
(
    id INT,
    name VARCHAR(50),
    city VARCHAR(25)
);

create table tbl2 like tbl1;

-- insert data
insert into tbl1 values (1,'adam', 'eden'), (2, 'eva', 'eden');
insert into tbl2 values (3,'adam', 'eden'), (4, 'pedro', 'alexandria');

This will give you records with unique name column:

select * from (select * from tbl1 union select * from tbl2) as a group by name;
+------+-------+------------+
| id   | name  | city       |
+------+-------+------------+
|    1 | adam  | eden       |
|    2 | eva   | eden       |
|    4 | pedro | alexandria |
+------+-------+------------+
3 rows in set (0.00 sec)

Note: this will keep id from tbl1, if you want to keep id from tbl2, just switch order of tables in sql statement

And this will give you only those records, which are in one, or other table, but not in both

select * from (select * from tbl1 union select * from tbl2) as a group by name having count(*) = 1;
+------+-------+------------+
| id   | name  | city       |
+------+-------+------------+
|    2 | eva   | eden       |
|    4 | pedro | alexandria |
+------+-------+------------+
2 rows in set (0.00 sec)

Is that what you wanted?