Suppose your table is called ingredients. Try the following:
Step 01) Create an empty delete keys table called ingredients_delete_keys
CREATE TABLE ingredients_delete_keys
SELECT fk,recipe,pkey FROM ingredients WHERE 1=2;
Step 02) Create PRIMARY KEY on ingredients_delete_keys
ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fk,recipe,pkey);
Step 03) Index the ingredients table on fk,recipe,pkey
ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fk,recipe,pkey);
Step 04) Populate the ingredients_delete_keys table with the one pkey to keep for each fk,recipe
INSERT INTO ingredients_delete_keys
SELECT fk,recipe,MIN(pkey)
FROM ingredients GROUP BY fk,recipe;
Step 05) Perform a DELETE JOIN on the ingredients table, removing every row whose keys have no match in ingredients_delete_keys
DELETE B.*
FROM ingredients B
LEFT JOIN ingredients_delete_keys A
USING (fk,recipe,pkey)
WHERE A.pkey IS NULL;
Step 06) Drop the delete keys table
DROP TABLE ingredients_delete_keys;
Step 07) Get rid of the fk_recipe_pkey_ndx index
ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;
OK, here are all the statements in one block:
CREATE TABLE ingredients_delete_keys
SELECT fk,recipe,pkey FROM ingredients WHERE 1=2;
ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fk,recipe,pkey);
ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fk,recipe,pkey);
INSERT INTO ingredients_delete_keys
SELECT fk,recipe,MIN(pkey)
FROM ingredients GROUP BY fk,recipe;
DELETE B.*
FROM ingredients B
LEFT JOIN ingredients_delete_keys A
USING (fk,recipe,pkey)
WHERE A.pkey IS NULL;
DROP TABLE ingredients_delete_keys;
ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;
Give it a try!
CAVEAT
Notice that using the MIN function keeps the first pkey entered for each fk,recipe combination. If you switch to the MAX function instead, the last pkey entered for each fk,recipe combination is kept.
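To see the effect of MIN versus MAX on a small data set, here is a minimal sketch. It is an illustration only, under these assumptions: it uses Python's built-in sqlite3 module rather than MySQL, the sample rows are made up, and because SQLite has no multi-table DELETE JOIN the delete step is written as an equivalent NOT EXISTS anti-join. The function name dedupe is hypothetical.

```python
import sqlite3

def dedupe(keep="MIN"):
    """Keep one row per (fk, recipe): the lowest pkey for MIN, the highest for MAX."""
    if keep not in ("MIN", "MAX"):
        raise ValueError("keep must be 'MIN' or 'MAX'")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ingredients (pkey INTEGER PRIMARY KEY, fk INTEGER, recipe TEXT)"
    )
    # Made-up sample data with duplicate (fk, recipe) combinations.
    conn.executemany(
        "INSERT INTO ingredients (pkey, fk, recipe) VALUES (?, ?, ?)",
        [(1, 10, "flour"), (2, 10, "flour"), (3, 10, "sugar"),
         (4, 20, "flour"), (5, 20, "flour"), (6, 20, "flour")],
    )
    # Steps 01 and 04: collect the single pkey to keep for each (fk, recipe).
    conn.execute(
        f"CREATE TABLE ingredients_delete_keys AS "
        f"SELECT fk, recipe, {keep}(pkey) AS pkey "
        f"FROM ingredients GROUP BY fk, recipe"
    )
    # Step 05: delete every row that has no match in the keys table.
    # (SQLite stand-in for the MySQL DELETE ... LEFT JOIN ... IS NULL form.)
    conn.execute(
        """
        DELETE FROM ingredients
        WHERE NOT EXISTS (
            SELECT 1 FROM ingredients_delete_keys k
            WHERE k.fk = ingredients.fk
              AND k.recipe = ingredients.recipe
              AND k.pkey = ingredients.pkey
        )
        """
    )
    # Step 06: drop the helper table.
    conn.execute("DROP TABLE ingredients_delete_keys")
    rows = conn.execute(
        "SELECT pkey, fk, recipe FROM ingredients ORDER BY pkey"
    ).fetchall()
    conn.close()
    return rows

print(dedupe("MIN"))  # survivors have pkey 1, 3, 4
print(dedupe("MAX"))  # survivors have pkey 2, 3, 6
```

The index steps (03 and 07) are left out because they only matter for speed on large tables, not for the result.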
In the spirit of @yercube's answer, I have an answer that has an added twist.
CREATE TABLE stage
(
id int not null auto_increment,
name varchar(20),
primary key (id)
);
CREATE TABLE stage2 LIKE stage;
INSERT INTO stage (name) SELECT name FROM item;
INSERT INTO stage2 (id) SELECT min_id FROM
(SELECT MIN(id) min_id,name FROM stage GROUP BY name) A;
UPDATE stage2 A INNER JOIN stage B USING (id) SET A.name=B.name;
TRUNCATE TABLE item;
INSERT INTO item (name) SELECT name FROM stage2;
DROP TABLE stage;
DROP TABLE stage2;
This will load stage2 with the first occurrence of each name from item, zap the item table, and load the unique occurrences back.
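The two-stage approach above can be tried on a toy table with the following sketch. It assumes Python's built-in sqlite3 module instead of MySQL, so a few statements are swapped for SQLite equivalents: the UPDATE ... INNER JOIN becomes a correlated subquery, TRUNCATE TABLE becomes DELETE FROM, and stage2 is declared directly instead of with CREATE TABLE ... LIKE. The function name dedupe_names and the sample names are hypothetical.

```python
import sqlite3

def dedupe_names(names):
    """Reload item with only the first occurrence of each name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (name TEXT);
        CREATE TABLE stage (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
        CREATE TABLE stage2 (id INTEGER PRIMARY KEY, name TEXT);
        """
    )
    conn.executemany("INSERT INTO item (name) VALUES (?)", [(n,) for n in names])
    # Copy every name into stage so that each row gets a sequential id.
    conn.execute("INSERT INTO stage (name) SELECT name FROM item ORDER BY rowid")
    # stage2 receives only the smallest id for each distinct name.
    conn.execute("INSERT INTO stage2 (id) SELECT MIN(id) FROM stage GROUP BY name")
    # Fill in the names for those ids (stand-in for UPDATE ... INNER JOIN).
    conn.execute(
        "UPDATE stage2 SET name = (SELECT name FROM stage WHERE stage.id = stage2.id)"
    )
    # Empty item (stand-in for TRUNCATE TABLE) and load the unique names back.
    conn.execute("DELETE FROM item")
    conn.execute("INSERT INTO item (name) SELECT name FROM stage2 ORDER BY id")
    conn.execute("DROP TABLE stage")
    conn.execute("DROP TABLE stage2")
    result = [row[0] for row in conn.execute("SELECT name FROM item ORDER BY rowid")]
    conn.close()
    return result

print(dedupe_names(["apple", "pear", "apple", "fig", "pear"]))
```

With the sample input, item ends up holding apple, pear and fig, in the order each name first appeared.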
If you look back at @yercube's answer and compare it to mine, his is much simpler because
- @yercube uses one temp table, while I use two
- I had to create a column for iteration control; @yercube did not need one
- @yercube has fewer steps
- both answers achieve the same thing
I do not expect my answer to be accepted. The sole purpose of my answer was to demonstrate that other answers lose the concise clarity needed to solve your problem. Again, hats off to @yercube.
Best Answer
There are a few ways of doing this, and the right one depends on the size of your initial table.
Option 1:
Option 2 (Horribly slow -- never do)
Option 3 -- Probably what you're looking for
I gave a few alternatives, but there are a few things I should mention. You are using GROUP BY without an aggregate function, which in general means you can get non-deterministic results. It is also not how GROUP BY is meant to be used. What I tell my developers is that using GROUP BY is like building a new table in which the grouped fields are the PRIMARY KEY.
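As a small illustration of the "grouped fields act as the PRIMARY KEY" idea, here is a sketch using Python's built-in sqlite3 module. The logins table, its columns, and the sample rows are all hypothetical and not taken from the question. Every selected column is either a grouped column or wrapped in an aggregate, so each output row is fully determined.

```python
import sqlite3

def last_login_per_user():
    """One output row per userid, as if userid were the key of a new table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (userid INTEGER, login_at TEXT)")
    # Hypothetical sample data: several rows per userid.
    conn.executemany(
        "INSERT INTO logins (userid, login_at) VALUES (?, ?)",
        [(1, "2024-01-01"), (1, "2024-03-05"),
         (2, "2024-02-10"), (2, "2024-02-11"),
         (3, "2024-01-20")],
    )
    rows = conn.execute(
        """
        SELECT userid,                     -- grouped column: unique in the result
               MAX(login_at) AS last_login, -- aggregate: well defined per group
               COUNT(*)      AS n_logins    -- aggregate: well defined per group
        FROM logins
        GROUP BY userid
        ORDER BY userid
        """
    ).fetchall()
    conn.close()
    return rows

for row in last_login_per_user():
    print(row)
```

The result has exactly one row for each distinct userid, which is what makes the grouped columns behave like a key.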
Do not use #2, as it basically has to do one full scan for every userid. I included it only so you can see that you can reference the same table twice in one query (as long as it's not a temporary table).
Option 1 is similar to #3 but uses temporary tables, which are extremely fast, and I figured you could then use the temp tables to join against another table. Just remember that temporary tables are gone once you close the session.
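The session-scoped lifetime of temporary tables can be checked with a short sketch. It assumes Python's built-in sqlite3 module, where each connection plays the role of a session. MySQL temporary tables are likewise visible only to the session that created them, but this code does not exercise MySQL itself. The table name scratch and both function names are hypothetical.

```python
import os
import sqlite3
import tempfile

def table_exists(conn, name):
    """Return True if the connection can select from the named table."""
    try:
        conn.execute(f"SELECT 1 FROM {name} LIMIT 1")
        return True
    except sqlite3.OperationalError:
        return False

def temp_table_visibility():
    """Report whether a temp table is visible to its creator, to a second
    session, and to a new session opened after the creator has closed."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.db")
        session1 = sqlite3.connect(path)
        session1.execute("CREATE TEMP TABLE scratch (id INTEGER)")
        session1.execute("INSERT INTO scratch VALUES (1)")
        session1.commit()

        session2 = sqlite3.connect(path)
        seen_by_creator = table_exists(session1, "scratch")
        seen_by_other = table_exists(session2, "scratch")

        session1.close()  # closing the session discards its temp tables
        session3 = sqlite3.connect(path)
        seen_after_close = table_exists(session3, "scratch")

        session2.close()
        session3.close()
    return seen_by_creator, seen_by_other, seen_after_close

print(temp_table_visibility())
```

Only the creating session sees the table, so any linking against other tables has to happen in that same session before it closes.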
Let me know if this is not what you were looking for.