Mysql – How to find duplicate values in multiple columns over different rows

duplicationMySQL

I have a database with a table of games (some columns omitted):

gameid
hometeam
awayteam
`date`

Sometimes games are added with the same home and away teams, but on a different date that is just 1 day difference (game date moved), and the old game isn't deleted. I'd like to be able to search over a variable number of days as well. Thus, for football, having the same opponent in a time span of +/- 5 days would be wrong, but possible in basketball or baseball or many other sports.

It's easy to search for multiple values of the same home/away/date, but the date difference makes this harder.

Also, the home and away team may be swapped, with the date also being the same or slightly different.

Example data:

gameid hometeam awayteam date
5      777      999      2014-10-23
6      999      777      2014-10-23
7      777      999      2014-10-24
8      777      999      2014-10-25

All of these are duplicates. Determining which doesn't matter, just that it should let me know that there are 4 games scheduled for this which it should (probably) be 1.

This is what I use to find duplicated games for the same home/away/date:

SELECT COUNT(*) as num,hometeam as teamid,`date` FROM `game` WHERE sportid=1 AND 'deleted_at' IS NULL AND `date` BETWEEN '2014-07-01' AND '2015-06-30' GROUP BY `date`,hometeam HAVING `num`>1
            UNION
SELECT COUNT(*) as num,awayteam as teamid,`date` FROM `game` WHERE sportid=1 AND 'deleted_at' IS NULL  AND `date` BETWEEN '2014-07-01' AND '2015-06-30' GROUP BY `date`,awayteam HAVING `num`>1 ORDER BY `num` DESC;

Best Answer

Not sure why the question popped up now, but if you are still interested in an answer something like:

select g1.* 
from games g1 
where exists ( 
    select 1 
    from games g2 
    where g1.gameid <> g2.gameid 
      and least(g1.hometeam,g1.awayteam) 
        = least(g2.hometeam,g2.awayteam) 
      and greatest(g1.hometeam,g1.awayteam) 
        = greatest(g2.hometeam,g2.awayteam) 
      and abs(datediff(g1.d, g2.d)) < 2
);

should give you what you need

Related Solutions

Mysql – Need to find duplicate entries

Suppose your table is called ingredients. Try the following:

Step 01) Create an empty delete keys table called ingredients_delete_keys

CREATE TABLE ingredients_delete_keys
SELECT fk,recipe,pkey FROM ingredients WHERE 1=2;

Step 02) Create PRIMARY KEY on ingredients_delete_keys

ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fk,recipe,pkey);

Step 03) Index the ingredients table with fk,recipe,pkey

ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fk,recipe,pkey);

Step 04) Populate the ingredients_delete_keys table

INSERT INTO ingredients_delete_keys
SELECT fk,recipe,MIN(pkey)
FROM ingredients GROUP BY fk,recipe;

Step 05) Perform a DELETE JOIN on ingredients table using keys that don't match

DELETE B.*
FROM ingredients_delete_keys A
LEFT JOIN ingredients B
USING (fk,recipe,pkey)
WHERE B.pkey IS NULL;

Step 06) Drop the delete keys

DROP TABLE ingredients_delete_keys;

Step 07) Get rid of the fk_recipe_pkey_ndx index

ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;

OK Here are all the lines in one block...

CREATE TABLE ingredients_delete_keys
SELECT fk,recipe,pkey FROM ingredients WHERE 1=2;
ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fk,recipe,pkey);
ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fk,recipe,pkey);
INSERT INTO ingredients_delete_keys
SELECT fk,recipe,MIN(pkey)
FROM ingredients GROUP BY fk,recipe;
DELETE B.*
FROM ingredients_delete_keys A
LEFT JOIN ingredients B
USING (fk,recipe,pkey)
WHERE B.pkey IS NULL;
DROP TABLE ingredients_delete_keys;
ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;

Give it a Try !!!

CAVEAT

Notice that using MIN function helps keep the first pkey entered for fk. If you switch it to MAX function instead, the last pkey entered for fk is kept.

Sql-server – Select from multiple rows without duplicate values, with all random data

I broke this down using pivot and not exists. I really would handle this in the presentation layer though.

--load test data
declare @table table (c1 int, c2 int, c3 int)
insert into @table
values
(1,1,1)
,(1,1,1)
,(2,3,2)
,(4,2,4)
,(5,4,6)
,(7,5,8)
,(9,7,11)
,(11,9,13)
,(14,16,15)

--get our unique values in a cte to pivot later
;with cte as(
select 
    --here we add a RN so that we can use pivot without losing values
    r = row_number() over (partition by Col order by (select 1))
    ,i.*
from
    (
    --for each column, we get the unique values where they don't exist in the other two columns
    --we union them together, but give them 1 /2 / 3 column identifier
    select
        1 as Col, c1.c1 as val
    from
        (select distinct t1.c1 from @table t1
         where  not exists (select 1 from @table t2 where t2.c2 = t1.c1)
            and not exists (select 1 from @table t3 where t3.c3 = t1.c1)) c1
    union
    select 
        2 as col, c2.c2
    from
        (select distinct t1.c2 from @table t1
         where  not exists (select 1 from @table t2 where t2.c1 = t1.c2)
            and not exists (select 1 from @table t3 where t3.c3 = t1.c2)) c2 
    union
    select
        3 as col, c3.c3
    from
        (select distinct t1.c3 from @table t1
         where  not exists (select 1 from @table t2 where t2.c1 = t1.c3)
            and not exists (select 1 from @table t3 where t3.c2 = t1.c3)) c3
    ) i
)


--simple pivot
select
    [1], [2], [3]
from cte 
pivot
(max(Val) for Col in ([1],[2],[3]))
p

RETURNS

+------+------+----+
|  1   |  2   | 3  |
+------+------+----+
| 14   | 3    |  6 |
| NULL | 16   |  8 |
| NULL | NULL | 13 |
| NULL | NULL | 15 |
+------+------+----+

Best Answer

Related Solutions

Mysql – Need to find duplicate entries

Sql-server – Select from multiple rows without duplicate values, with all random data

Related Question