MySQL table not working after a few days of use

Tags: blocking, deadlock, locking, mysql, orm

I have the following tables on a MySQL server, version 5.5.46:


CREATE TABLE `locking` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `filename` varchar(255) NOT NULL,
  `timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `origin` varchar(255) NOT NULL,
  `system` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `filename` (`filename`,`origin`)
) ENGINE=InnoDB AUTO_INCREMENT=74882007 DEFAULT CHARSET=utf8;

CREATE TABLE `locking_rules` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `timeout` int(11) NOT NULL,
  `origin` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8;

And I have the following SQL, which is executed periodically from different machines (simultaneously or not):

DELETE l FROM locking l,locking_rules r WHERE l.origin=r.origin AND r.name!='*' AND ABS(TIME_TO_SEC(TIMEDIFF(l.timestamp,NOW())))>r.timeout

New rows are constantly being inserted into this table; it generally holds about 20,000 rows. I insert into the DB through the OpenJPA ORM and delete using the "native query" shown above.

The problem:

Sometimes, once every few days, DELETE queries on the locking table just wait, and nothing further happens to this table. Can anybody give me some hints on where I should look to find the problem?

If I run SHOW ENGINE INNODB STATUS, the output contains nothing related to locks or deadlocks on the locking table.

Best Answer

Part of the problem is that you are scanning most of the 20K rows and locking the rows to be deleted as you go. The scan itself is time consuming.
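You can watch that scan with EXPLAIN on the equivalent SELECT (in 5.5, EXPLAIN does not yet work on DELETE, so this SELECT is a stand-in for your query):

EXPLAIN SELECT  l.id
          FROM  locking l, locking_rules r
         WHERE  l.origin = r.origin
           AND  r.name != '*'
           AND  ABS(TIME_TO_SEC(TIMEDIFF(l.timestamp, NOW()))) > r.timeout;

It should show a full scan of one table, with the other joined row by row.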

If you were using MyISAM, that would be a big issue and you should switch to InnoDB; your CREATE TABLEs show you are already on InnoDB, so that is not the culprit here.

Don't use TIMEDIFF; it will get you into trouble for large differences. I will assume that l.timestamp was some time in the past and that r.timeout is how many seconds before it "times out".
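The trouble is that TIMEDIFF returns a TIME value, which is capped at 838:59:59 (about 35 days), so larger differences are silently truncated. A quick demonstration (the dates are arbitrary):

SELECT TIMEDIFF('2016-03-01', '2016-01-01');
-- returns 838:59:59 (the cap), not the true 60-day difference

With those assumptions, the DELETE becomes: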

DELETE  l
    FROM  locking l,
          locking_rules r
    WHERE  l.origin = r.origin
      AND  r.name!='*'
      AND  l.timestamp < NOW() - INTERVAL r.timeout SECOND  -- i.e., older than the timeout
    LIMIT 100

And add a composite index so the rows to purge can be found without a full scan:

ALTER TABLE locking ADD INDEX (origin, timestamp);

If the DELETE is running too slowly, either increase the frequency of running it, or increase the LIMIT. But I would not use more than 1000.

OK, MySQL will probably spit at you for that LIMIT: the multi-table DELETE syntax does not allow LIMIT. So, I'll provide another possibility:

DELETE  l
    FROM  locking l
    JOIN  ( SELECT  l2.id
              FROM  locking l2
              JOIN  locking_rules r  ON r.origin = l2.origin
                                    AND l2.timestamp < NOW() - INTERVAL r.timeout SECOND
             WHERE  r.name != '*'
             LIMIT  100
          ) s  ON s.id = l.id;

But this will also suffer from scanning too much of locking to find the few rows that need deleting.

So, let's start over and break the task into two steps:

  1. Find 100 ids to delete. This is just a SELECT, so it has much less impact.
  2. Delete those 100 (or fewer) rows via the PRIMARY KEY; this is much less invasive.

thus:

CREATE TEMPORARY TABLE t
    SELECT  l2.id
      FROM  locking l2
      JOIN  locking_rules r  ON r.origin = l2.origin
                            AND l2.timestamp < NOW() - INTERVAL r.timeout SECOND
     WHERE  r.name != '*'
     LIMIT  100;
DELETE  l
    FROM  locking l
    JOIN  t  ON t.id = l.id;
DROP TEMPORARY TABLE t;  -- so the next run in this connection can re-create it
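A note on the design: a TEMPORARY table is private to the connection that created it, so several machines running this sequence at the same time will each get their own t and will not trip over the table name. They can still contend on the rows being deleted, though, which brings us to the next point.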

Let's prevent the various connections from doing the "same" code at the "same" time. (In pseudo client code:)

if (GET_LOCK('messy delete', 0) == 1)   // 0 = don't wait if another connection holds it
{
    perform the above code
    RELEASE_LOCK('messy delete');
}
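The same guard can be written directly in SQL; a minimal sketch (the zero means "don't wait", and GET_LOCK returns 1 only when the lock was acquired):

SELECT GET_LOCK('messy delete', 0);   -- 1 = got it, 0 = another connection holds it
-- ... run the purge only if the previous call returned 1 ...
SELECT RELEASE_LOCK('messy delete');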

It may be that the sporadic hangup is due to two DELETEs stumbling over each other, perhaps even having a 'domino effect' cascading into other connections.
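If the hangup does strike again, the InnoDB transaction tables (present in 5.5) can show who is waiting on whom at that moment; a sketch:

SELECT  r.trx_mysql_thread_id  AS waiting_thread,
        r.trx_query            AS waiting_query,
        b.trx_mysql_thread_id  AS blocking_thread,
        b.trx_query            AS blocking_query
  FROM  information_schema.INNODB_LOCK_WAITS w
  JOIN  information_schema.INNODB_TRX r  ON r.trx_id = w.requesting_trx_id
  JOIN  information_schema.INNODB_TRX b  ON b.trx_id = w.blocking_trx_id;

That would pinpoint whether the stall is two of these DELETEs colliding or something else entirely.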