MySQL – Does InnoDB guarantee unique data in unique indexes even with UNIQUE_CHECKS=0

Tags: index, MySQL, unique-constraint

We have a suspicion that sometimes there could be non-unique data even with a UNIQUE index. Question is, can we rely on InnoDB guaranteeing that there won't ever be non-unique data, even in the presence of UNIQUE_CHECKS=0, so there is no reason to check it?

Edited to add: If it is possible to have non-unique index entries for a UNIQUE index in InnoDB (even with cheating), how is such a test-case created?

Or is it like foreign key constraints in InnoDB, where that guarantee can be broken by setting FOREIGN_KEY_CHECKS=0, so that if we want to be sure, we have to check with SELECT statements?
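For comparison, this is roughly what we mean for foreign keys; the PARENT/CHILD tables are only an illustration, but it shows how FOREIGN_KEY_CHECKS=0 lets an orphan row in, which then has to be found with a SELECT afterwards:

DROP TABLE IF EXISTS `CHILD`, `PARENT`;

CREATE TABLE `PARENT` (
    `id` INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE `CHILD` (
    `id` INT NOT NULL PRIMARY KEY,
    `parent_id` INT NOT NULL,
    FOREIGN KEY (`parent_id`) REFERENCES `PARENT` (`id`)
) ENGINE=InnoDB;

SET FOREIGN_KEY_CHECKS=0;
-- Accepted even though PARENT has no row with id=42;
-- the orphan stays in CHILD after checks are re-enabled.
INSERT INTO CHILD (id, parent_id) VALUES (1, 42);
SET FOREIGN_KEY_CHECKS=1;

-- A verification query is needed to find such orphans:
SELECT c.id
FROM CHILD c
LEFT JOIN PARENT p ON p.id = c.parent_id
WHERE p.id IS NULL;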

The documentation for UNIQUE_CHECKS:

Setting this variable to 0 does not require storage engines to ignore duplicate keys. An engine is still permitted to check for them and issue duplicate-key errors if it detects them.

The documentation for Bulk Data Loading for InnoDB Tables:

If you have UNIQUE constraints on secondary keys, you can speed up table imports by temporarily turning off the uniqueness checks during the import session:

SET unique_checks=0;
... SQL import statements ...
SET unique_checks=1;

For big tables, this saves a lot of disk I/O because InnoDB can use its change buffer to write secondary index records in a batch. Be certain that the data contains no duplicate keys.

These seem to imply that UNIQUE_CHECKS=0 can be used to break the uniqueness guarantee.

The MariaDB Enterprise documentation for unique_checks:

Set to 0 to speed up imports of large tables to InnoDB. The storage engine will still issue a duplicate key error if it detects one, even if set to 0.

This seems to imply that the guarantee holds, even with UNIQUE_CHECKS=0.

This example SQL demonstrates that UNIQUE_CHECKS=0 is ignored and uniqueness still seems to be enforced:

DROP TABLE IF EXISTS `FOO`;

CREATE TABLE `FOO` (
    `foo` VARCHAR(255) NOT NULL,
    UNIQUE INDEX (`foo`)
) ENGINE=InnoDB;

INSERT INTO FOO (foo) VALUES ('A');

SET UNIQUE_CHECKS=0;
-- This generates the error:
-- ERROR 1062 (23000) at line 11: Duplicate entry 'A' for key 'foo'
-- even though UNIQUE_CHECKS=0
INSERT INTO FOO (foo) VALUES ('A');

(Edit: Simplified example to only demonstrate that UNIQUE_CHECKS=0 is ignored.)
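For reference, if the guarantee cannot be relied on, the by-hand duplicate check against the FOO table above would be something like:

-- Returns one row per duplicated value; an empty result means the index is clean.
SELECT foo, COUNT(*) AS cnt
FROM FOO
GROUP BY foo
HAVING cnt > 1;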

So which is it: Can we rely on InnoDB guaranteeing that there won't ever be non-unique data, even in the presence of UNIQUE_CHECKS=0, so there is no reason to check it?

We're using MySQL version 5.7.23.

Best Answer

Normally, secondary index changes are put into the "change buffer" for later writing to disk. This happens for non-unique indexes. (It also applies to UPDATEs that change a UNIQUE index, and to DELETEs, which necessarily remove a 'row' from the index.)

For UNIQUE indexes (and UNIQUE_CHECKS=1), each row being inserted is immediately verified as not being a duplicate. That way, the row can be rejected quickly. The drawback is that InnoDB must fetch the index block that contains the value in question. For huge tables, this may involve a read from disk, which can be relatively slow. And that disk read is blocking, so the INSERT must wait.

For a batch load of rows, it would be faster to load all the rows, populate the change buffer, and postpone having to actually look at the disk index.

UNIQUE_CHECKS was implemented in the days before fast SSDs and when RAM was small. That is, it used to be an important optimization.

The "Change Buffer" is a portion of the buffer_pool (default: 25%). It holds secondary index changes awaiting consolidation and insertion into the disk copy of the index.

When an index lookup is needed, it must check both the change buffer and the real index.

The real index is a set of 16KB blocks in a B+Tree, keyed on the secondary index columns. Those blocks are cached (one by one, as needed) in the other 75% of the buffer_pool.
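Neither number is fixed; the 16KB block size and the 25%/75% split are just the defaults of innodb_change_buffer_max_size above and of these variables:

-- Index page (block) size; default 16384 bytes
SHOW VARIABLES LIKE 'innodb_page_size';
-- Total buffer pool size in bytes; the change buffer lives inside it
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';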

With UNIQUE_CHECKS=ON, the uniqueness test can be immediate; with it off, I suspect that the test is not done until either the bulk INSERT is finished, or the change buffer needs some flushing.

The change buffer is written to disk in the background. Note: this is one explanation for I/O when no queries are running. Logically, the change buffer is sorted; a block is read from disk; entries are merged into that block; then the block is left in the buffer_pool as "dirty". Eventually, another thread takes care of writing dirty blocks to disk (another cause for strange I/O).
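If you want to watch that background merging happen, the change-buffer counters in information_schema.INNODB_METRICS are one place to look (subsystem name and counter availability as of MySQL 5.7; some counters stay disabled unless turned on via innodb_monitor_enable), and SHOW ENGINE INNODB STATUS has a text view of the same area:

-- Merge/discard activity of the change buffer
SELECT NAME, COUNT, STATUS
FROM information_schema.INNODB_METRICS
WHERE SUBSYSTEM = 'change_buffer';

-- Same information in text form, under "INSERT BUFFER AND ADAPTIVE HASH INDEX":
SHOW ENGINE INNODB STATUS\G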