Mysql – Most efficient way to cascade delete in MySQL

MySQL

I have several 1-to-many FK relationships in my MySQL schema, with tables as follows:

AGENCYS  1-*>  ROUTES  1-*>  TRIPS  1-*>  STOPTIMES

At present, there are roughly 1300 agencys, 16000 routes, 1734765 trips and 69,000,000 stoptimes – and we'd like to scale much larger than this. Each FK relationship has a CASCADE deletion rule set.

Each week, I need to delete all the agencys that have a particular agency_import_source, and delete all their descendants too (i.e. routes, trips and stoptimes).

The obvious query, to a non-expert such as myself, is this:

DELETE FROM agencies
WHERE agencys.agency_import_source = "UK"

However, this is timing out after ten minutes, I imagine because of the huge number of foreign key indexes. Each table also has multiple additional indexes that perhaps need to be re-built each time too — I'm not an expert in MySQL unfortunately.

What is a sensible way of tackling this problem? Ideally, this task needs to complete quickly with many, many more records than this.

FOOTNOTES:
1. The context of this question: we store transit data from around the world (schedules, etc.) The data is updated from various sources at different times, so I need to remove the old records related to a certain source and replace them with the newer data, while preserving records from other sources.

  1. We're using MySQL 5.7 to get InnoDB spatial index support, although I hope this is not relevant – the problem persists in MySQL 5.6 too.

Best Answer

While you are using foreign keys in the way that they were designed for, in reality disks can put a limit to the speed of certain operations.

There are certain configuration options that I would advise to check for performance first (innodb_flush_log_at_trx_commit, innodb_log_file_size, etc.) but I am going to assume that there is no performance problems because misconfiguration, and that proper indexes are in place. Profiling your query can help make sure that that is correct.

I would advise you against large transactions modifying lots of rows. Your server, specially under high load can end up impacted quite a lot by such operations. I would recommend using the JOIN syntax to extract the id's of all rows to be deleted (could be millions?) and delete them in batches. A tool like pt-archiver can help you automatize that, making sure that you are doing things in small transactions, minimizing impact on the server by large write operations.

Another option that may help you with maintenance is mysql PARTITIONING. When a table is paritioned, instead of deleting every row one by one, you can delete whole partitions, which are on different physical locations, resulting in faster delete operations.

I know these suggestions are a bit general, but that is what I can do for you with the information provided.