MySQL – Best Approach to Delete and Update Large Number of Rows

deleteMySQLoptimizationperformanceupdate

I'm currently developing a REST API that connects to a MySQL database and does a bunch of queries.

In one of those queries I happen to have to delete a bunch of rows from a table and each of those rows have a unique identifier.

And basically I have two approaches to do so: (that I know).

Build a bunch of DELETE FROM table WHERE id=value; statements and execute them sequentially.
Build only a DELETE FROMstatement but with a massive OR chain.

I don't have a ton of experience handling databases and any suggestion or advice would be greatly appreciated.

EDIT:

I just noticed that I also have something similar happening with a UPDATE FROM statement where I set a lot of records to the same given value, so any advice on that too would be really nice.

Best Answer

Simple and easy:

DELETE FROM tbl
    WHERE id IN ( 11,22,... 999 );

But... If you have thousands of ids to delete, let's discuss where they the ids came from. If they are sitting in another table, or can be found via some form of SELECT, then a "Multi-table DELETE" is probably a better way to do it.

Or if you know that the rows are consecutive in some way, then there may be other options. For example, to delete "old" rows:

DELETE FROM tbl
    WHERE datetime < NOW() - INTERVAL 7 DAY;

will delete rows older than a week.

Related Solutions

Mysql – How to delete repeated rows

This would depend on which one of the repeat you want to keep. For this example, I will dedup the table based on url and title and keeping the first occurrence.

First thing to do: run these queries

SELECT COUNT(1) FROM RSS_items;

SELECT COUNT(1) dupcount_two,url,title FROM RSS_items
GROUP BY url,title HAVING COUNT(1) = 2 WITH ROLLUP;

SELECT COUNT(1) dupcount_morethantwo,url,title FROM RSS_items
GROUP BY url,title HAVING COUNT(1) > 2 WITH ROLLUP;

This will show total table count and how all (url,title) combinations that repeat twice and more than twice with the total for each at the bottom. If the sum of the repeats is more than 5% of the total:

CREATE TABLE RSS_items_URLTitle LIKE RSS_items;
INSERT INTO RSS_items_URLTitle (id,url,title)
    SELECT MIN(id),url,title
    FROM RSS_items GROUP BY url,title
    HAVING 
;
DELETE B.*
    FROM RSS_items_URLTitle A
    INNER JOIN RSS_items B
    USING (url,title) WHERE A.id <> B.id
;
DROP TABLE RSS_items_URLTitle;

This method made be very slow if you do not have an index on (url,title)

Otherwise, run this

CREATE TABLE RSS_items_New LIKE RSS_items;
CREATE TABLE RSS_items_URLTitle LIKE RSS_items;
INSERT INTO RSS_items_URLTitle (id,url,title)
    SELECT MIN(id),url,title
    FROM RSS_items GROUP BY url,title
;
INSERT INTO RSS_items_New
    SELECT B.*
    FROM RSS_items_URLTitle A
    INNER JOIN RSS_items USING (id)
;
ALTER TABLE RSS_items RENAME RSS_items_Zap;
ALTER TABLE RSS_items_New RENAME RSS_items;
DROP TABLE RSS_items_URLTitle;
DROP TABLE RSS_items_Zap;

This method may be better since it only operates on the table id.

Please look over both methods. Try testing it on test databases with copies of the data before running anything.

If you want to keep the last occurrence of duplicates, replace MIN(id) with MAX(id).

Give it a Try !!!

Delete all rows within transaction: what would happen to other transactions

The comment by a_horse_with_no_name is actually the answer, but since it was not posted as an answer, I'm not able to mark it so. I've completely forgotten about transaction isolation levels, I've asked the question in VoltDB forums and it appears that as long as make sure that stored procedures are local to nodes (sorry, this is a bit of a VoltDB terminology), that is, they don't process data across partitioned tables, this should work.

Best Answer

Related Solutions

Mysql – How to delete repeated rows

Delete all rows within transaction: what would happen to other transactions

Related Question