Suppose your table is called ingredients
. Try the following:
Step 01) Create an empty delete keys table called ingredients_delete_keys
CREATE TABLE ingredients_delete_keys
SELECT fk,recipe,pkey FROM ingredients WHERE 1=2;
Step 02) Create PRIMARY KEY on ingredients_delete_keys
ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fk,recipe,pkey);
Step 03) Index the ingredients
table with fk,recipe,pkey
ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fk,recipe,pkey);
Step 04) Populate the ingredients_delete_keys
table
INSERT INTO ingredients_delete_keys
SELECT fk,recipe,MIN(pkey)
FROM ingredients GROUP BY fk,recipe;
Step 05) Perform a DELETE JOIN on ingredients table using keys that don't match
DELETE B.*
FROM ingredients_delete_keys A
LEFT JOIN ingredients B
USING (fk,recipe,pkey)
WHERE B.pkey IS NULL;
Step 06) Drop the delete keys
DROP TABLE ingredients_delete_keys;
Step 07) Get rid of the fk_recipe_pkey_ndx
index
ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;
OK Here are all the lines in one block...
CREATE TABLE ingredients_delete_keys
SELECT fk,recipe,pkey FROM ingredients WHERE 1=2;
ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fk,recipe,pkey);
ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fk,recipe,pkey);
INSERT INTO ingredients_delete_keys
SELECT fk,recipe,MIN(pkey)
FROM ingredients GROUP BY fk,recipe;
DELETE B.*
FROM ingredients_delete_keys A
LEFT JOIN ingredients B
USING (fk,recipe,pkey)
WHERE B.pkey IS NULL;
DROP TABLE ingredients_delete_keys;
ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;
Give it a Try !!!
CAVEAT
Notice that using MIN function helps keep the first pkey entered for fk. If you switch it to MAX function instead, the last pkey entered for fk is kept.
In the spirit of @yercube's answer, I have an answer that has an added twist.
CREATE TABLE stage
(
id int not null auto_increment,
name varchar(20),
primary key (id)
);
CREATE TABLE stage2 LIKE stage;
INSERT INTO stage (name) SELECT name FROM item;
INSERT INTO stage2 (id) SELECT min_id FROM
(SELECT MIN(id) min_id,name FROM stage GROUP BY name) A;
UPDATE stage2 A INNER JOIN stage B USING (id) SET A.name=B.name;
TRUNCATE TABLE item;
INSERT INTO item (name) SELECT name FROM stage2;
DROP TABLE stage;
DROP TABLE stage2;
This will load stage2 with the first occurrence of each name from item, zap the item table, and load the unique occurrences back.
If you look back in @yercube's answer and compare it to my answer, his is much more simplistic because
- @yercube uses one temp table, while I use two
- I had to create a column for iteration control, @yercube did not need to
- @yercube has fewer steps
- both answers achieve the same thing
I do not expect my answer to be accepted. The sole purpose of my answer was demonstrate that other answers lose the concise clarity needed to solve your problem. Again, hats off to @yercube.
Best Answer
A lesser-known feature of Oracle databases- every row has a [hidden] column, called ROWID.
These meaningless,, character values can be used to isolate duplicates like this and get rid of them.
This query should get you candidates rows to be deleted:
You can then delete rows using the ROWID directly.
All that said, do not be tempted to use them for anything else!!
ROWID's are potentially volatile and so can change on you over time.