Mysql – Would using SELECT FOR UPDATE and a subsequent INSERT guarantee uniqueness in text column

MySQL

I have a table with a text column that I need to be unique. Because I can't put a unique constraint on the column in MySQL, I want to know if the following approach would guarantee uniqueness in the column:

Create a varchar column for holding the hash of the unique text column. Put a plain INDEX on it.
Start transaction
Use SELECT FOR UPDATE to see if hash exists
a) Use a subsequent INSERT if hash doesn't exist

b) If hash exists, SELECT by unique text to see if it exists. If unique text does not exist, do a subsequent INSERT, otherwise do 'UPDATE'
End transaction

If I use the above approach for all inserts in that table, can that guarantee uniqueness?

Best Answer

Yes, that should work. However, consider either of these one-liners:

INSERT IGNORE ...

INSERT ... ON DUPLICATE KEY UPDATE ...

If you are using AUTO_INCREMENT, be cautious about 'burning' ids.

Links:

Related Solutions

MySQL duplicate data entry issue

One scenario in which this would fail depends on how busy (or unlucky) you are in timing.

Session A does the select on the value, sees that it should INSERT
Session B does a select on the value before session A does the required insert or update of the value constraint
Session A inserts row, updates value
Session B inserts row, updates value

It's all a matter of timing (as suggested by your 98% works). You could acquire a WRITE LOCK for Session A on the table you are inserting AND on the table you are updating the value, which will block Session B until Session A releases the lock.

Alternatively, you could go ahead and let the DB handle the locking automatically by creating a UNIQUE INDEX and issuing a INSERT .. ON DUPLICATE UPDATE (recommended)

Mysql – UPDATE waits for lock too long after SELECT in MySQL (Innodb)

You are too focused on the details; back off. Let's look at the big picture, the benchmarking, the indexing, the transactions, etc.

How many simultaneous users are you benchmarking for? How many do you expect in reality? How many cores does your CPU(s) have? What version of MySQL are you running?

My points are: (a) The benchmark is stressing the limits, not looking for reality; (b) Oracle has made great strides recently in handling more connections.

When you go beyond the effective connection limit, latency of queries will suffer terribly. So, don't benchmark beyond that. Furthermore, throttle the users so that not "too many" get to MySQL 'simultaneously'. In older versions, it was so bad that (a) throughput would go down as you add more clients, and (of course) (b) latency would go through the roof. Now, throughput plateaus while latency climbs.

For a single item, be sure to wrap Rolando's SQL in a transaction:

BEGIN;
SELECT * FROM pics WHERE id=:id LIMIT 1 FOR UPDATE;
UPDATE pics SET seen=1 WHERE id=:id LIMIT 1;
COMMIT;

For handling more than one id at the same time, you should sort the ids to help avoid deadlocks. Then do them in a single transaction:

BEGIN;
SELECT * FROM pics WHERE id IN ($id_list) FOR UPDATE;
UPDATE pics SET seen=1 WHERE id IN ($id_list) LIMIT 1;
COMMIT;

You can simplify the code more: Do the

UPDATE...
Check rows_affected; exit if 0
SELECT ...

Since rows_affected is local to the 'session', you can discover whether the UPDATE grabbed the row. Note that there is no need for transactions (as far as this code snippet goes), and autocommit=1 would suffice.

Let me point out another issue with the design: Indexing a flag (seen) has two problems (a) The optimizer is unlikely to use the index, due to low cardinality; and (b) the update has to remove a row from that index and add a new row elsewhere; this is costly.

The logic you described does not seem to need INDEX(seen); does something else need it? If not, DROP that INDEX; that may solve the problem.

Best Answer

Related Solutions

MySQL duplicate data entry issue

Mysql – UPDATE waits for lock too long after SELECT in MySQL (Innodb)

Related Question