MySQL – How to coordinate two sessions writing the same row to a MySQL table

MySQL

I am using MySQL 5.6 and have an append-only "lookup" table that's used to store reference data:

CREATE TABLE my_table (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  my_column VARCHAR(32) NOT NULL,

  PRIMARY KEY (id),
  UNIQUE KEY my_column_uk (my_column)
) ENGINE=INNODB DEFAULT CHARSET=utf8;

Rows are inserted, but never updated or deleted. The table is small (thousands of rows at most), so query performance is not a concern.

I have two different processes (running identical code) that insert rows into this table. Each process has an in-memory write-through cache of the my_column -> id mapping. Whenever an entry that doesn't already exist is put into the cache, the row is first synchronously inserted into the database table, then added to the cache.

How do I handle the scenario where 2 (or more) processes try to insert the same my_column value at the same time?

Since the table has a unique key constraint on my_column, my thought is to have both processes try to insert the new row and get the new PK id value. One of them will succeed, and can add the new entry to its cache. The other one will fail, but then it can fall back to querying for the id of the row that the other process inserted. I call this the "write-read" approach.
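In SQL terms, the "write-read" flow sketched above would look something like this (the 'America/Los_Angeles' value is just a placeholder):

```sql
-- Attempt the insert first; thanks to the unique key on my_column, a
-- concurrent duplicate fails with error 1062 (ER_DUP_ENTRY).
INSERT INTO my_table (my_column) VALUES ('America/Los_Angeles');

-- On success, the new PK:
SELECT LAST_INSERT_ID();

-- On a duplicate-key error, fall back to reading the row the other
-- process inserted:
SELECT id FROM my_table WHERE my_column = 'America/Los_Angeles';
```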

Is there a better way of handling this contention between the two processes? Would a "read-write" approach be better, potentially with a different transaction isolation level (e.g. "read uncommitted")?

The one potential downside I see with the "write-read" approach is that it involves catching a duplicate key exception, and using that as an indicator to query for the existing row. This is perhaps a misuse of an exception, because this scenario is expected to occasionally happen, and is not really exceptional.

The problem with the "read-write" approach is that I'm not sure there's a good way to have process 2 "see" the row that process 1 added, until it finishes its transaction. So this might need to be a "read-write-read", in which case I'm better off just starting with the INSERT like in the "write-read" approach.
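For comparison, that "read-write-read" variant needs up to three statements; a sketch with the same placeholder value:

```sql
-- 1. Read first; if a row exists, use its id and stop here.
SELECT id FROM my_table WHERE my_column = 'America/Los_Angeles';

-- 2. No row found: insert. This can still race and fail with a
--    duplicate-key error if another process inserted in the meantime.
INSERT INTO my_table (my_column) VALUES ('America/Los_Angeles');

-- 3. On a duplicate-key error, read again to get the winning row's id.
SELECT id FROM my_table WHERE my_column = 'America/Los_Angeles';
```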


Some more constraints:

  1. I need to get the resulting PK ID, regardless of whether the row was inserted or already exists. However I don't want to burn through auto-increment values, so I'm not going to use the UPDATE id = LAST_INSERT_ID(id) kludge. I'd rather just execute a second SQL statement to SELECT the id value, because I'm only inserting a row every ~5 minutes into a table with ~100k rows, so query performance is not a major concern.

  2. The reason why I'm not using the VARCHAR my_column column as the PRIMARY KEY column is that this lookup table has a FK reference from another much bigger (500M rows, 20 cols) table. I'm using this lookup table pattern to store reference data – e.g. failure codes ("eligibility.failure.not_enough_foobar_data") or time zones ("America/Los_Angeles") – so that the much larger main table can use a short FK id to reference that value (e.g. failure_code_id or time_zone_id), instead of repeating the long string.
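To illustrate the pattern, here is a sketch of how the bigger table might reference the lookup table (the table and column names here are hypothetical, not from the original schema):

```sql
CREATE TABLE big_table (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  -- short FK instead of repeating the long string in 500M rows
  time_zone_id INT UNSIGNED NOT NULL,
  -- ... the other ~20 columns ...

  PRIMARY KEY (id),
  CONSTRAINT big_table_time_zone_fk
    FOREIGN KEY (time_zone_id) REFERENCES my_table (id)
) ENGINE=INNODB DEFAULT CHARSET=utf8;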

Best Answer

Here's the best solution I've been able to come up with, using a write-read approach:

  1. INSERT using ON DUPLICATE KEY UPDATE (without first checking whether the row exists);
  2. if the generated-key value returned for that statement is non-zero, then that's the PK of the new row;
  3. else a row must already exist with that value, so SELECT it;
  4. if the SELECT can't find the row, then throw an exception because this is unrecoverable.
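Concretely, steps 1–3 look something like the following, assuming the client reads the statement's generated key via its driver (e.g. JDBC's getGeneratedKeys); note that the session-level LAST_INSERT_ID() function is only updated when a row is actually inserted, so the per-statement driver value is the reliable signal here:

```sql
-- Self-assignment makes the UPDATE clause a no-op: for an existing row the
-- statement changes nothing (affected-rows = 0) and generates no new key.
INSERT INTO my_table (my_column) VALUES ('America/Los_Angeles')
ON DUPLICATE KEY UPDATE my_column = my_column;

-- The driver-level generated key for this statement is the new PK on the
-- insert path and 0 on the duplicate path; in the latter case, read the
-- existing row's id:
SELECT id FROM my_table WHERE my_column = 'America/Los_Angeles';
```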

This has the following advantages:

  1. it doesn't require any transaction handling or a special isolation level, because INSERT ... ON DUPLICATE KEY UPDATE is a single atomic statement;
  2. it executes at most 2 SQL statements (insert-select) instead of 3 (select-insert-select), making the calling code simpler;
  3. since we're not using INSERT IGNORE, if the INSERT statement throws an exception it will be for a legitimate reason (e.g. the auto_increment counter has reached the maximum value of the id column), so we don't have to worry about needlessly logging duplicate-key exceptions or ignoring actual database problems.