MySQL Deduplicate Data Using HAVING with Subquery

duplicationfunctionsMySQLsubquery

I've inherited a DB that was lacking a unique constraint so data was/is duplicated. I'm now trying to remove the duplicate records, then add a constraint to block this going forward.

I have this query:

SELECT count(*) as dacount, substr(group_concat(id), (locate(',', group_concat(id))+ 1))
FROM `game`
group by matchid, ordinal
having dacount > 1
order by dacount desc

which correctly gives me the ids of the rows I need to delete. The problem with this though is that I can't use this as a subquery on the DELETE because of the dacount with the having parameter. Is there another way to do this?

Here was my plan:

DELETE FROM game WHERE id IN (SELECT count(*) as dacount, substr(group_concat(id), (locate(',', group_concat(id))+ 1))
FROM `game`
GROUP BY matchid, ordinal
HAVING dacount > 1)

Best Answer

There are restrictions regarding self-joins in MySQL's implementation of UPDATE and DELETE. The issue is usually solvable by using joins:

DELETE g.*
FROM game AS g
  JOIN 
   ( SELECT matchid, ordinal, min(id) AS id 
     FROM game
     GROUP BY matchid, ordinal
     -- HAVING count(*) > 1
   ) AS d
   ON  g.matchid = d.matchid
   AND g.ordinal = d.ordinal
   AND g.id > d.id ;

Test at dbfiddle.uk.

Another way would be using a LEFT JOIN / IS NULL construction:

DELETE g.*
FROM game AS g
  LEFT JOIN 
   ( SELECT min(id) AS id 
     FROM game
     GROUP BY matchid, ordinal
   ) AS d
   ON  g.id = d.id 
WHERE d.id IS NULL ;

or an EXISTS subquery rewritten as a JOIN
(delete all games that there exists another game with same matchid, ordinal and smaller id):

DELETE g.*
FROM game AS g
  JOIN 
     game AS d
   ON  g.matchid = d.matchid
   AND g.ordinal = d.ordinal
   AND g.id > d.id
 ;

SUGGESTION

Perhaps you should start with a GROUP BY ... HAVING query like this

SELECT table_name,COUNT(1) table_count
FROM information_schema.tables
WHERE table_schema NOT IN
('information_schema','performance_schema','mysql')
GROUP BY table_name HAVING COUNT(1) > 1;

This will definitely give all tables whose name appears in multiple databases

Form that as a subquery and join it to gather all databases the table appears in

SELECT
    A.table_name,
    GROUP_CONCAT(B.table_schema) TheTableAppearsInTheseDatabases
FROM
(
    SELECT table_name,COUNT(1) table_count
    FROM information_schema.tables
    WHERE table_schema NOT IN
    ('information_schema','performance_schema','mysql')
    GROUP BY table_name HAVING COUNT(1) > 1
) A INNER JOIN information_schema.tables B
USING (table_name) GROUP BY A.table_name;

Please notice that I use INNER JOIN rather than LEFT JOIN because it will really form a Cartesian product (2704 X 2704) and then perform comparisons.

I know this works because I tried it out in MySQL 5.5.12 on my Windows7 machine

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 33
Server version: 5.5.12-log MySQL Community Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>     SELECT
    ->         A.table_name,
    ->         GROUP_CONCAT(B.table_schema) TheTableAppearsInTheseDatabases
    ->     FROM
    ->     (
    ->         SELECT table_name,COUNT(1) table_count
    ->         FROM information_schema.tables
    ->         WHERE table_schema NOT IN
    ->         ('information_schema','performance_schema','mysql')
    ->         GROUP BY table_name HAVING COUNT(1) > 1
    ->     ) A INNER JOIN information_schema.tables B
    ->     USING (table_name) GROUP BY A.table_name;
+--------------------------+-------------------------------------------------------------------------------------------------------------------+
| table_name               | TheTableAppearsInTheseDatabases                                                                                   |
+--------------------------+-------------------------------------------------------------------------------------------------------------------+
| a                        | junk,test,robottinosino                                                                                           |
| acl                      | weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive                                     |
| blocks                   | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| blog                     | weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive                                     |
| blog_category            | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| blog_entrycat            | weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive                                     |
| blog_meta                | weisci_jaws_staging,weisci_jaws_live,weisci_jaws_archive,weisci_jaws_staging2                                     |
| blog_trackback           | weisci_jaws_archive,weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_staging                                     |
| calendar_events          | weisci_jaws_staging,weisci_jaws_archive,weisci_jaws_live,weisci_jaws_staging2                                     |
| calendar_meta            | weisci_jaws_archive,weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_staging                                     |
| calendar_questions       | weisci_jaws_staging,weisci_jaws_archive,weisci_jaws_live,weisci_jaws_staging2                                     |
| calendar_tickets         | weisci_jaws_archive,weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_staging                                     |
| calendar_transactions    | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_staging,weisci_jaws_archive                                     |
| captcha_complex          | weisci_jaws_staging,weisci_jaws_archive,weisci_jaws_live,weisci_jaws_staging2                                     |
| change_log               | test,junk                                                                                                         |
| chat_staff               | test,junk                                                                                                         |
| comments                 | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_staging,weisci_jaws_archive                                     |
| donations_charities      | weisci_jaws_staging,weisci_jaws_archive,weisci_jaws_live,weisci_jaws_staging2                                     |
| donations_charities_meta | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| donations_donations      | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| foo_reference1           | timpost1,timpost2                                                                                                 |
| foo_reference2           | timpost1,timpost2                                                                                                 |
| foo_reference3           | timpost1,timpost2                                                                                                 |
| groups                   | weisci_jaws_staging,weisci_jaws_archive,weisci_jaws_live,weisci_jaws_staging2                                     |
| ipvisitor                | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| job_post                 | giannosfor,test                                                                                                   |
| layout                   | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| listeners                | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| mediamanager_files       | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_staging                                                         |
| mediamanager_group       | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| mediamanager_photos      | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| mediamanager_video       | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| menus                    | weisci_jaws_archive,weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2                                     |
| menus_groups             | weisci_jaws_archive,weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2                                     |
| mytable                  | ryanzec,user1267617,cabita,dotancohen,johnlocke,neeraj,test,user391986,cool_cs,javier,mathieu                     |
| mytext                   | jakobud,newstuff                                                                                                  |
| occupation_field         | giannosfor,test                                                                                                   |
| policy_agentblock        | weisci_jaws_archive,weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2                                     |
| policy_ipblock           | weisci_jaws_archive,weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2                                     |
| prova                    | veto,vito                                                                                                         |
| registry                 | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| registry_bk              | weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2                                                         |
| session                  | weisci_jaws_archive,weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2                                     |
| static_pages             | weisci_jaws_archive,weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2                                     |
| static_pages_translation | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| t                        | preeti,rollup_test                                                                                                |
| t1                       | abidibo,test                                                                                                      |
| t2                       | test,abidibo                                                                                                      |
| t3                       | test,abidibo                                                                                                      |
| table1                   | table_test,supercoolville,test                                                                                    |
| table2                   | supercoolville,test,table_test                                                                                    |
| tags                     | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| tags_content             | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| tbl_banner_position      | weisci_jaws_staging2,weisci_jaws_live                                                                             |
| tbl_banner_upload        | weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2                                                         |
| tbl_global_banner        | weisci_jaws_staging2,weisci_jaws_live                                                                             |
| tms_authors              | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| tms_repositories         | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| tms_themes               | weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging                                     |
| updates                  | test,junk                                                                                                         |
| url_aliases              | weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2,weisci_jaws_archive                                     |
| url_maps                 | weisci_jaws_archive,weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2                                     |
| users                    | weisci_jaws_archive,weisci_jaws_staging,giannosfor,veto,weisci_jaws_live,weisci_jaws_staging2,friends,sample,vito |
| users_groups             | weisci_jaws_archive,weisci_jaws_staging,weisci_jaws_live,weisci_jaws_staging2                                     |
| users_meta               | weisci_jaws_staging2,weisci_jaws_archive,weisci_jaws_staging,weisci_jaws_live                                     |
+--------------------------+-------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.91 sec)

mysql>

Give it a Try !!!

Best Answer

Related Solutions

Mysql – remove duplicate rows in thesql table that does not contain primary key

Mysql – Should I use left join to do the job in this scenario

SUGGESTION

Related Question