MySQL: performance issue with FOR UPDATE query

MySQL

I have a query like this:

SELECT id, domain_name FROM domains WHERE test_stage=1 ORDER BY id LIMIT 200 FOR UPDATE

This query runs up to 500 times per second across multiple threads, so quite fast, and it sometimes overloads my database server.
I think it has something to do with MySQL's internal state: usually the query is handled quickly, but sometimes the load just builds up and up.

The server is MySQL 5.5. (I wish I could use a newer one, but upgrading is not possible.)
I have an index on (test_stage, id).
test_stage is an enum with just a few values; id is an auto-increment integer, and the table has a few million rows.

Does anyone have an idea how to make this sort of query faster?
I know later MySQL versions allow skipping locked rows, but sadly I am on 5.5.
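For reference, this is the non-blocking variant available in MySQL 8.0 and later (not in 5.5); it lets each worker grab only rows that no other transaction has locked:

```sql
-- MySQL 8.0+ only: take up to 200 unclaimed rows,
-- silently skipping rows already locked by other workers.
SELECT id, domain_name
FROM domains
WHERE test_stage = 1
ORDER BY id
LIMIT 200
FOR UPDATE SKIP LOCKED;
```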

The purpose is selecting a number of "jobs" for a multithreaded worker.
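For context, a sketch of what each worker presumably does; the stage value 2 used to mark rows "in progress" is an assumption for illustration, not something stated in the original post:

```sql
START TRANSACTION;

-- Lock a batch of unclaimed jobs.
SELECT id, domain_name
FROM domains
WHERE test_stage = 1
ORDER BY id
LIMIT 200
FOR UPDATE;

-- Mark the locked rows as claimed so other workers skip them.
-- (Assumed: stage 2 means "in progress"; the real marker may differ.)
UPDATE domains
SET test_stage = 2
WHERE test_stage = 1
ORDER BY id
LIMIT 200;

COMMIT;
```

With 500 such transactions per second all scanning the same front of the index, lock contention on those first 200 rows is exactly where the stalls would come from.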

P.S.
The only reason my server is not crashing is that I added a CPU check: as soon as 10 cores are fully loaded, it stops adding more threads.

Update:
The server is quite strong: 120 GB of memory (mostly for MySQL), and it is far from being maxed out.

Item(s) coming in up to 10,000 times a second.
1-20,000 arrive in each batch.
The client threads are limited to 600 threads by PHP and additional logic (running on Apache).
The threading uses PHP 7.2 on the server side.
max_connections = 1500
Processing one item takes, on average, 1-2 ms when everything works fine.
The table domains is currently 20 MM rows; it is never purged.

CREATE TABLE `domains` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `domain_name` varchar(64) NOT NULL,
  `test_time` datetime DEFAULT NULL,
  `website_seo` varchar(512) DEFAULT NULL,
  `context` varchar(15) DEFAULT NULL,
  `test_stage` enum('a','b','c','d','e','f') DEFAULT 'a',
  `type` enum('a','b','c','d','e','f') DEFAULT NULL,
  `counter_z` smallint(6) DEFAULT NULL,
  `counter_a` smallint(6) DEFAULT '0' ,
  `counter_b` smallint(6) DEFAULT '0' ,
  `counter_c` smallint(6) DEFAULT '0' ,
  `info_a` varchar(256) DEFAULT NULL,
  `info_b` varchar(256) DEFAULT NULL,
  `extrapolate_a` varchar(16) DEFAULT NULL ,
  `extrapolate_c` varchar(16) DEFAULT NULL,
  `extrapolate_int` smallint(6) DEFAULT '0' ,
  `max_successful` smallint(6) DEFAULT NULL,
  `counter_successful` smallint(6) DEFAULT NULL,
  `timeout_counter_successful` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `domain_name` (`domain_name`),
  KEY `domain_name_test_time` (`domain_name`,`test_time`),
  KEY `test_time` (`test_time`),
  KEY `test_stage` (`test_stage`),
  KEY `counter_c_t` (`counter_c`,`test_time`),
  KEY `test_stage_test_time_counter_c` (`test_stage`,`test_time`,`counter_c`),
  KEY `test_stage_id` (`test_stage`,`id`)
) ENGINE=InnoDB AUTO_INCREMENT=19437107 DEFAULT CHARSET=latin1

I'd like to add: the issue is not Apache or the number of threads; it's MySQL not delivering the data.
My problem is that the FOR UPDATE select, which selects a bunch of rows that are then updated and marked as in use, sometimes stalls.
It can happen with 1,000 rows unprocessed, and it can happen with 100,000 rows unprocessed.
After a while it suddenly works again; usually MySQL catches up, and then you don't see anything in the processlist anymore.
When it happens, you see around 100-200 of these queries waiting, and CPU spikes up to 10 fully loaded cores (at which point the queue is locked from further additions).

Update:
I've had similar problems before; the only solution then was to move all "open" requests into a temporary table, make the queue selections from there, and synchronize it back into the full table.
It's a lot of additional work, but then MySQL is fast. I'd hate to do it again 🙁

Best Answer

My Mantra: "Don't queue it, just do it." Although I don't fully understand your design, I suspect that the problems would decrease if you simply processed the data as it comes in.

A Rule of Thumb: If an item can be processed in less time than it takes to enqueue and dequeue it, then the Queue is wasting time.

To better discuss this, I need to understand all the dataflow. For starters, fill in the blanks:

  • Item(s) coming in ___ times a second.
  • __ arrive in each batch.
  • The client threads are limited to ____ threads by ______ (Apache/Tomcat/...).
  • The threading uses _____ (PHP/Java/C#/...).
  • max_connections = ____
  • Processing one item takes, on average ___ ms.
  • The table domains is currently ____ rows; it is purged ______ (never/daily/immediately/...). Also provide SHOW CREATE TABLE domains.

More

Thanks for the update!

I gather that something else is changing test_stage, thereby effectively re-queueing rows?

Get rid of KEY test_stage (test_stage), since it is redundant with two other indexes.

Change KEY test_stage_id (test_stage, id) to KEY tid (test_stage, id, domain_name) so that it is "covering".
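Assuming the index names above, both changes can be applied in one ALTER. Note that on 5.5 this rebuilds the table, so run it in a quiet window:

```sql
ALTER TABLE domains
    DROP INDEX test_stage,                        -- redundant with the two composite indexes
    DROP INDEX test_stage_id,
    ADD  INDEX tid (test_stage, id, domain_name); -- "covering" for the FOR UPDATE query
```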

One thought is to get rid of id and promote domain_name to be the PRIMARY KEY. This avoids the need to deal with two unique keys. Note that this would change the query and optimal index to

SELECT domain_name  FROM domains  WHERE test_stage=1 
    ORDER BY domain_name  LIMIT 200  FOR UPDATE
INDEX(test_stage, domain_name)
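If you do promote domain_name, the schema change might look roughly like this. This is only a sketch: dropping id affects any application code or foreign references that use it, so verify before running it:

```sql
ALTER TABLE domains
    DROP PRIMARY KEY,
    DROP COLUMN id,            -- removes the AUTO_INCREMENT surrogate key
    DROP INDEX domain_name,    -- the UNIQUE key is replaced by the PK
    ADD  PRIMARY KEY (domain_name);
```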

Another thought is to assign the "worker" threads to contiguous chunks of the table, based on the PRIMARY KEY. That way, they should never (or at least rarely) collide over which rows to grab. (I suspect this is the core of the issue.)
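One way to sketch the chunking idea: give each worker a disjoint id range, so their FOR UPDATE scans never compete for the same rows. The chunk bounds and the `:chunk_start` / `:chunk_end` placeholders are illustrative assumptions:

```sql
-- Worker N scans only its own slice of the PRIMARY KEY, e.g.
-- worker 0 -> [0, 500000), worker 1 -> [500000, 1000000), ...
-- Bind :chunk_start and :chunk_end per worker.
SELECT id, domain_name
FROM domains
WHERE test_stage = 1
  AND id >= :chunk_start
  AND id <  :chunk_end
ORDER BY id
LIMIT 200
FOR UPDATE;
```

With disjoint ranges, each worker's locks fall in a different part of the index, so the queries no longer queue up behind one another at the front of the table.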

I assume your buffer_pool is big enough. But if you would like a critique of the VARIABLES, please provide SHOW GLOBAL STATUS and SHOW GLOBAL VARIABLES.