Mysql – How to find duplicate values in multiple columns over different rows

duplicationMySQL

I have a database with a table of games (some columns omitted):

gameid
hometeam
awayteam
`date`

Sometimes games are added with the same home and away teams, but on a different date that is just 1 day difference (game date moved), and the old game isn't deleted. I'd like to be able to search over a variable number of days as well. Thus, for football, having the same opponent in a time span of +/- 5 days would be wrong, but possible in basketball or baseball or many other sports.

It's easy to search for multiple values of the same home/away/date, but the date difference makes this harder.

Also, the home and away team may be swapped, with the date also being the same or slightly different.

Example data:

gameid hometeam awayteam date
5      777      999      2014-10-23
6      999      777      2014-10-23
7      777      999      2014-10-24
8      777      999      2014-10-25

All of these are duplicates. Determining which doesn't matter, just that it should let me know that there are 4 games scheduled for this which it should (probably) be 1.

This is what I use to find duplicated games for the same home/away/date:

SELECT COUNT(*) as num,hometeam as teamid,`date` FROM `game` WHERE sportid=1 AND 'deleted_at' IS NULL AND `date` BETWEEN '2014-07-01' AND '2015-06-30' GROUP BY `date`,hometeam HAVING `num`>1
            UNION
SELECT COUNT(*) as num,awayteam as teamid,`date` FROM `game` WHERE sportid=1 AND 'deleted_at' IS NULL  AND `date` BETWEEN '2014-07-01' AND '2015-06-30' GROUP BY `date`,awayteam HAVING `num`>1 ORDER BY `num` DESC;

Best Answer

Not sure why the question popped up now, but if you are still interested in an answer something like:

select g1.* 
from games g1 
where exists ( 
    select 1 
    from games g2 
    where g1.gameid <> g2.gameid 
      and least(g1.hometeam,g1.awayteam) 
        = least(g2.hometeam,g2.awayteam) 
      and greatest(g1.hometeam,g1.awayteam) 
        = greatest(g2.hometeam,g2.awayteam) 
      and abs(datediff(g1.d, g2.d)) < 2
);

should give you what you need

Related Question