I'm getting around to cleaning a database but I need some help with deleting near duplicates in MySQL database but is only off by one character in a column. I've searched and found ways to remove true duplicates but each off the solutions I've seen for near duplicates (timestamp, deleting without worry about which record stays) don't seem to apply.
Certain data must remain based on one column.
id | plyrID | year | pos | teamID
1 | 111548 | 2005 | OF | 367
2 | 111548 | 2005 | RF | 367
3 | 111548 | 2006 | RF | 332
4 | 592743 | 2012 | SS | 99
5 | 592710 | 2012 | SS | 111
If there is a near duplicate based on plyrID, year, and teamID
and if pos = "OF"
, I want to keep the record where the pos is either "RF", "LF", or "CF"
. The result should look like:
id | plyrID | year | pos | teamID
2 | 111548 | 2005 | RF | 367
3 | 111548 | 2006 | RF | 332
4 | 592743 | 2012 | SS | 99
5 | 592710 | 2012 | SS | 111
where the record with pos = "OF"
is deleted and the other remains.
Any help would be much appreciated.
Best Answer
So, rephrasing your requirement a little, you want to delete every row whose
pos
has the value ofOF
and for which another row in the same table exists with the sameplyrID
,year
andteamID
, and withpos
ofRF
,LF
orCF
.That sounded almost SQL, you would only need to correct some "grammar":
The above is standard SQL but it wouldn't work in MySQL. Needs rewriting with
JOIN
: