This would depend on which one of the repeat you want to keep. For this example, I will dedup the table based on url and title and keeping the first occurrence.
First thing to do: run these queries
SELECT COUNT(1) FROM RSS_items;
SELECT COUNT(1) dupcount_two,url,title FROM RSS_items
GROUP BY url,title HAVING COUNT(1) = 2 WITH ROLLUP;
SELECT COUNT(1) dupcount_morethantwo,url,title FROM RSS_items
GROUP BY url,title HAVING COUNT(1) > 2 WITH ROLLUP;
This will show total table count and how all (url,title) combinations that repeat twice and more than twice with the total for each at the bottom. If the sum of the repeats is more than 5% of the total:
CREATE TABLE RSS_items_URLTitle LIKE RSS_items;
INSERT INTO RSS_items_URLTitle (id,url,title)
SELECT MIN(id),url,title
FROM RSS_items GROUP BY url,title
HAVING
;
DELETE B.*
FROM RSS_items_URLTitle A
INNER JOIN RSS_items B
USING (url,title) WHERE A.id <> B.id
;
DROP TABLE RSS_items_URLTitle;
This method made be very slow if you do not have an index on (url,title)
Otherwise, run this
CREATE TABLE RSS_items_New LIKE RSS_items;
CREATE TABLE RSS_items_URLTitle LIKE RSS_items;
INSERT INTO RSS_items_URLTitle (id,url,title)
SELECT MIN(id),url,title
FROM RSS_items GROUP BY url,title
;
INSERT INTO RSS_items_New
SELECT B.*
FROM RSS_items_URLTitle A
INNER JOIN RSS_items USING (id)
;
ALTER TABLE RSS_items RENAME RSS_items_Zap;
ALTER TABLE RSS_items_New RENAME RSS_items;
DROP TABLE RSS_items_URLTitle;
DROP TABLE RSS_items_Zap;
This method may be better since it only operates on the table id.
Please look over both methods. Try testing it on test databases with copies of the data before running anything.
If you want to keep the last occurrence of duplicates, replace MIN(id)
with MAX(id)
.
Give it a Try !!!
Your statement seems to be correct. It could also be written like this:
DELETE ii
FROM inventoryitems AS ii
WHERE ii.itemid = 2340000
AND EXISTS
( SELECT *
FROM characters AS c
WHERE c.characterid = ii.characterid
AND EXISTS
( SELECT *
FROM accounts AS a
WHERE a.id = c.accountid
AND a.banned = 1
)
) ;
One thing that may be causing this is if you have a character related to many accounts and one of them has banned = 1
while the other have banned = 0
. I assume you want the deletion to happen (not with just one but) only if all the related accounts have banned = 1
. We can modify the above code to:
DELETE ii
FROM inventoryitems AS ii
WHERE ii.itemid = 2340000
AND EXISTS
( SELECT *
FROM characters AS c
WHERE c.characterid = ii.characterid
AND EXISTS
( SELECT *
FROM accounts AS a
WHERE a.id = c.accountid
AND a.banned = 1
)
AND NOT EXISTS
( SELECT *
FROM accounts AS a
WHERE a.id = c.accountid
AND a.banned = 0
)
) ;
or simpler to:
DELETE ii
FROM inventoryitems AS ii
WHERE ii.itemid = 2340000
AND EXISTS
( SELECT *
FROM characters AS c
JOIN accounts AS a
ON a.id = c.accountid
WHERE c.characterid = ii.characterid
HAVING MIN(a.banned) = 1
) ;
After the clarifications, all the above are void. The problem was that characters
table does not have characterid
column but only id
. So the statement used by the OP was translated/parsed as:
DELETE FROM inventoryitems
WHERE characterid IN
(SELECT inventoryitems.characterid -- notice this
from characters
WHERE accountid IN
(SELECT id
from accounts
WHERE banned = '1'
)
)
AND itemid = '2340000';
which makes the subquery uncorrelated and means "delete all rows with itemid = 2340000
" if there exists at least one row (any row, not necessarily related) in the accounts with banned=1
"
That's one reason why it's good to always (*) write columns with their full name as tablename.columnname
or tablealias.columnname
(an error would have been thrown if you had done this and problem would have been solved faster.)
(*) Unless one wants this behaviour to occur, which is a rather extreme case.
Best Answer
Is there a reason this simple correlated delete statement won't do the trick?