MySQL – Does InnoDB guarantee unique data in unique indexes even with UNIQUE_CHECKS=0

Tags: index, MySQL, unique-constraint

We have a suspicion that sometimes there could be non-unique data even with a UNIQUE index. Question is, can we rely on InnoDB guaranteeing that there won't ever be non-unique data, even in the presence of UNIQUE_CHECKS=0, so there is no reason to check it?

Edited to add: If it is possible to have non-unique index entries for a UNIQUE index in InnoDB (even with cheating), how is such a test-case created?

Or is it like foreign key constraints in InnoDB, where that guarantee can be broken by setting FOREIGN_KEY_CHECKS=0, so that if we want to be sure, we have to check with SELECT statements?
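For comparison, this is roughly what we mean for foreign keys; the PARENT/CHILD tables are only an illustration, but it shows how FOREIGN_KEY_CHECKS=0 lets an orphan row in, which then has to be found with a SELECT afterwards:

DROP TABLE IF EXISTS `CHILD`, `PARENT`;

CREATE TABLE `PARENT` (
    `id` INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE `CHILD` (
    `id` INT NOT NULL PRIMARY KEY,
    `parent_id` INT NOT NULL,
    FOREIGN KEY (`parent_id`) REFERENCES `PARENT` (`id`)
) ENGINE=InnoDB;

SET FOREIGN_KEY_CHECKS=0;
-- Accepted even though PARENT has no row with id=42;
-- the orphan stays in CHILD after checks are re-enabled.
INSERT INTO CHILD (id, parent_id) VALUES (1, 42);
SET FOREIGN_KEY_CHECKS=1;

-- A verification query is needed to find such orphans:
SELECT c.id
FROM CHILD c
LEFT JOIN PARENT p ON p.id = c.parent_id
WHERE p.id IS NULL;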

The documentation for UNIQUE_CHECKS:

Setting this variable to 0 does not require storage engines to ignore duplicate keys. An engine is still permitted to check for them and issue duplicate-key errors if it detects them.

The documentation for Bulk Data Loading for InnoDB Tables:

If you have UNIQUE constraints on secondary keys, you can speed up table imports by temporarily turning off the uniqueness checks during the import session:

SET unique_checks=0;
... SQL import statements ...
SET unique_checks=1;

For big tables, this saves a lot of disk I/O because InnoDB can use its change buffer to write secondary index records in a batch. Be certain that the data contains no duplicate keys.

These seem to imply that UNIQUE_CHECKS=0 can be used to break the uniqueness guarantee.

The MariaDB Enterprise documentation for unique_checks:

Set to 0 to speed up imports of large tables to InnoDB. The storage engine will still issue a duplicate key error if it detects one, even if set to 0.

This seems to imply that the guarantee holds, even with UNIQUE_CHECKS=0.

This example SQL demonstrates that UNIQUE_CHECKS=0 is ignored and uniqueness still seems to be enforced:

DROP TABLE IF EXISTS `FOO`;

CREATE TABLE `FOO` (
    `foo` VARCHAR(255) NOT NULL,
    UNIQUE INDEX (`foo`)
) ENGINE=InnoDB;

INSERT INTO FOO (foo) VALUES ('A');

SET UNIQUE_CHECKS=0;
-- This generates the error:
-- ERROR 1062 (23000) at line 11: Duplicate entry 'A' for key 'foo'
-- even though UNIQUE_CHECKS=0
INSERT INTO FOO (foo) VALUES ('A');

(Edit: Simplified example to only demonstrate that UNIQUE_CHECKS=0 is ignored.)
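For reference, if the guarantee cannot be relied on, the by-hand duplicate check against the FOO table above would be something like:

-- Returns one row per duplicated value; an empty result means the index is clean.
SELECT foo, COUNT(*) AS cnt
FROM FOO
GROUP BY foo
HAVING cnt > 1;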

So which is it: Can we rely on InnoDB guaranteeing that there won't ever be non-unique data, even in the presence of UNIQUE_CHECKS=0, so there is no reason to check it?

We're using MySQL version 5.7.23.

Best Answer

Normally, secondary index changes are put into the "change buffer" for later writing to disk. This happens for non-unique indexes. (It also applies to UPDATEs that change a UNIQUE index, and to DELETEs, which necessarily remove a 'row' from the index.)

For UNIQUE indexes (and UNIQUE_CHECKS=1), each row being inserted is immediately verified as not being a duplicate. That way, the row can be rejected quickly. The drawback is that InnoDB must fetch the index block that contains the value in question. For huge tables, this may involve a read from disk, which can be relatively slow. And that disk read is blocking, so the INSERT must wait.

For a batch load of rows, it would be faster to load all the rows, populate the change buffer, and postpone having to actually look at the disk index.

UNIQUE_CHECKS was implemented in the days before fast SSDs and when RAM was small. That is, it used to be an important optimization.

The "Change Buffer" is a portion of the buffer_pool (default: 25%). It holds secondary index changes awaiting consolidation and insertion into the disk copy of the index.

When an index lookup is needed, it must check both the change buffer and the real index.

The real index is a set of 16KB blocks in a B+Tree, keyed on the secondary index columns. Those blocks are cached (one by one, as needed) in the other 75% of the buffer_pool.
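Neither number is fixed; the 16KB block size and the 25%/75% split are just the defaults of innodb_change_buffer_max_size above and of these variables:

-- Index page (block) size; default 16384 bytes
SHOW VARIABLES LIKE 'innodb_page_size';
-- Total buffer pool size in bytes; the change buffer lives inside it
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';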

With UNIQUE_CHECKS=ON, the uniqueness test can be immediate; with it off, I suspect that the test is not done until either the bulk INSERT is finished, or the change buffer needs some flushing.

The change buffer is written to disk in the background. Note: this is one explanation for I/O when no queries are running. Logically, the change buffer is sorted; a block is read from disk; entries are merged into that block; then the block is left in the buffer_pool as "dirty". Eventually, another thread takes care of writing dirty blocks to disk (another cause for strange I/O).
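If you want to watch that background merging happen, the change-buffer counters in information_schema.INNODB_METRICS are one place to look (subsystem name and counter availability as of MySQL 5.7; some counters stay disabled unless turned on via innodb_monitor_enable), and SHOW ENGINE INNODB STATUS has a text view of the same area:

-- Merge/discard activity of the change buffer
SELECT NAME, COUNT, STATUS
FROM information_schema.INNODB_METRICS
WHERE SUBSYSTEM = 'change_buffer';

-- Same information in text form, under "INSERT BUFFER AND ADAPTIVE HASH INDEX":
SHOW ENGINE INNODB STATUS\G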