MySQL – Unique Index/Constraint with multiple columns, one column is nullable


My question is whether it's possible to have a unique index comprising multiple columns where one of the columns may contain NULL.

Example: a table named 'regulated_person' has a number of columns, including LAST_NAME, FIRST_NAME, BIRTH_DATE and ALIAS. The data type for BIRTH_DATE is DATE; the others are strings (VARCHAR). LAST_NAME, FIRST_NAME and BIRTH_DATE are not nullable, i.e. they require values. ALIAS serves as a tie-breaker for cases where two or more people have the same first and last names and were born on the same day. Because tie-breaking situations are uncommon, I want to avoid having to provide an ALIAS value unless it's necessary.

By way of background, the primary key for the table is handled by an auto-increment generator. The purpose of the unique index is to provide a 'business key' that identifies the row without having to resort to the primary key. I'm working with MySQL 5.1.35 and Hibernate ORM 4.3.10.

Suggestions involving database servers other than MySQL would also be welcome.

Thanks in advance for any guidance, and apologies in advance for any unintended violations of 'forum protocol', as this is the first time I've submitted a question ANYWHERE.

Best Answer

Since the alias column is nullable, a unique constraint on the composite (last_name, first_name, birth_date, alias) will still allow duplicates: rows with the same values in the first three columns and NULL in alias. The constraint effectively does not apply to a row when at least one of its indexed values is NULL, because one NULL is never considered equal to another. The MySQL documentation on CREATE TABLE is not very clear about the multi-column case, but it does state the following, and you can test the behaviour yourself:

A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. For all engines, a UNIQUE index permits multiple NULL values for columns that can contain NULL.
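The behaviour in that quote can be reproduced without a MySQL server. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. It relies on one assumption: SQLite, like MySQL, treats NULLs in a UNIQUE index as distinct from one another. The table mirrors 'regulated_person' from the question, with column types simplified for SQLite.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL here: both let a UNIQUE index hold
# several rows whose only difference is a NULL in an indexed column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE regulated_person (
        id         INTEGER PRIMARY KEY,  -- auto-assigned surrogate key
        last_name  TEXT NOT NULL,
        first_name TEXT NOT NULL,
        birth_date TEXT NOT NULL,        -- ISO-8601 string; SQLite has no date storage class
        alias      TEXT,                 -- nullable tie-breaker
        UNIQUE (last_name, first_name, birth_date, alias)
    )
""")

insert_sql = ("INSERT INTO regulated_person (last_name, first_name, birth_date, alias) "
              "VALUES (?, ?, ?, ?)")
person = ("Smith", "John", "1970-01-01", None)

# Both inserts succeed: NULL is never equal to NULL, so the two
# composite keys are not treated as duplicates.
conn.execute(insert_sql, person)
conn.execute(insert_sql, person)

duplicates = conn.execute(
    "SELECT COUNT(*) FROM regulated_person "
    "WHERE last_name = ? AND first_name = ? AND birth_date = ?",
    person[:3],
).fetchone()[0]
print(duplicates)  # 2
```

If both rows are given the same non-NULL alias instead, the second insert raises sqlite3.IntegrityError. That mirrors the duplicate-key error MySQL reports.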

What you could do instead is define alias as NOT NULL with a default value (say 'NONE', 'DEFAULT' or the empty string ''). You (or the user) will not have to provide a value, because the default is stored automatically in every row where none is given. If someone then tries to add a new row with the same last name, first name and birth date as an existing row, the unique constraint will reject it. At that point you could run some procedure that asks for a different alias value and inserts the new row with it.
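Below is a minimal sketch of that approach. It again uses Python's sqlite3 with an in-memory SQLite database standing in for MySQL. The helper add_person is hypothetical and only illustrates the flow. Omitting ALIAS gives the default ''. When the unique constraint rejects a row, a distinct alias is supplied as the tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE regulated_person (
        id         INTEGER PRIMARY KEY,
        last_name  TEXT NOT NULL,
        first_name TEXT NOT NULL,
        birth_date TEXT NOT NULL,
        alias      TEXT NOT NULL DEFAULT '',  -- '' means "no tie-breaker needed"
        UNIQUE (last_name, first_name, birth_date, alias)
    )
""")

def add_person(conn, last, first, born, alias=None):
    """Hypothetical helper: insert a person and return the new id,
    or None if the business key already exists."""
    if alias is None:
        # Leave ALIAS out of the column list so its DEFAULT '' is used.
        sql = ("INSERT INTO regulated_person (last_name, first_name, birth_date) "
               "VALUES (?, ?, ?)")
        params = (last, first, born)
    else:
        sql = ("INSERT INTO regulated_person "
               "(last_name, first_name, birth_date, alias) VALUES (?, ?, ?, ?)")
        params = (last, first, born, alias)
    try:
        return conn.execute(sql, params).lastrowid
    except sqlite3.IntegrityError:
        # The same (last_name, first_name, birth_date, alias) is already stored.
        return None

first_id = add_person(conn, "Smith", "John", "1970-01-01")
clash_id = add_person(conn, "Smith", "John", "1970-01-01")                # rejected: None
second_id = add_person(conn, "Smith", "John", "1970-01-01", alias="Jr")   # accepted
```

In MySQL the column would be declared along the lines of alias VARCHAR(50) NOT NULL DEFAULT '' (the length is an assumption). There, the rejected insert surfaces as error 1062 (ER_DUP_ENTRY) rather than as a Python exception.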

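Related Question: SQL Server – Should I mark a composite index as unique if it contains the primary key?

Setup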

CREATE TABLE dbo.Customers 
(
   CustomerID int NOT NULL PRIMARY KEY,
   FirstName nvarchar(50),
   LastName nvarchar(50),
   [Address] nvarchar(200),
   Email nvarchar(260)
);

CREATE NONCLUSTERED INDEX 
    IX_Customers_CustomerIDEmail 
ON dbo.Customers
(
   CustomerID,
   Email
);

-- Pretend we have some rows
UPDATE STATISTICS dbo.Customers 
WITH ROWCOUNT = 100000, PAGECOUNT = 20000;

Per-index update plan (non-unique index)

UPDATE dbo.Customers 
SET Email = N'New', [Address] = 'New Address'
WHERE Email = N'Old' 
OPTION (QUERYTRACEON 8790); -- Per-index update plan

Execution plan (image not available).

The optimizer often makes a cost-based decision between updating nonclustered indexes per-row (a 'narrow' plan) or per-index (a 'wide' plan). The default strategy (except for in-memory OLTP tables) is a wide plan.

Narrow plans (where nonclustered indexes are maintained at the same time as the heap or clustered index) are a performance optimization for small updates. This optimization is not implemented for all cases: using certain features (such as indexed views) means that the associated index(es) will be maintained in a wide plan.

More information: Optimizing T-SQL Queries that Change Data
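The difference between the two strategies is mainly the order in which index maintenance happens. The sketch below is a deliberately simplified Python model, not SQL Server code. narrow_plan and wide_plan are hypothetical helpers that list, in order, which index is touched for which row. The model ignores the filtering, sorting and spooling a real wide plan may add.

```python
def narrow_plan(rows, nonclustered):
    """Per-row ('narrow'): each row maintains the clustered index and
    every nonclustered index before the next row is processed."""
    return [(index, row) for row in rows for index in ["clustered", *nonclustered]]

def wide_plan(rows, nonclustered):
    """Per-index ('wide'): the clustered index is maintained for all rows
    first, then each nonclustered index is maintained for all rows in turn."""
    return [(index, row) for index in ["clustered", *nonclustered] for row in rows]

rows = [1, 2]
print(narrow_plan(rows, ["IX_Customers_CustomerIDEmail"]))
print(wide_plan(rows, ["IX_Customers_CustomerIDEmail"]))
```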

In this case, I have used the undocumented trace flag 8790 to force a wide update plan, so the plan shows the clustered and nonclustered indexes being maintained separately.

The Split operator turns each update into a separate delete and insert pair; the Filter operator removes any rows that would not result in a change to the index.

More information: Non-updating updates, by the SQL Server QO Team.
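Here is a rough Python model of what the Split and Filter operators achieve for the nonclustered index on (CustomerID, Email). It is an illustrative sketch under simplifying assumptions, not SQL Server's implementation. Row, index_key and split_and_filter are hypothetical names. The filter is applied before the split here purely for simplicity, since the resulting actions are the same.

```python
from typing import NamedTuple

class Row(NamedTuple):
    customer_id: int
    email: str
    address: str

def index_key(row):
    """Key columns of IX_Customers_CustomerIDEmail: (CustomerID, Email)."""
    return (row.customer_id, row.email)

def split_and_filter(changes):
    """Turn (before, after) row pairs into index maintenance actions.

    Filter: skip rows whose index key did not change (a non-updating
    update as far as this index is concerned).
    Split: turn every remaining update into a DELETE of the old key
    and an INSERT of the new key.
    """
    actions = []
    for before, after in changes:
        if index_key(before) == index_key(after):
            continue
        actions.append(("DELETE", index_key(before)))
        actions.append(("INSERT", index_key(after)))
    return actions

changes = [
    # Email changes, so the index key changes.
    (Row(1, "Old", "A Street"), Row(1, "New", "New Address")),
    # Hypothetical row where only Address changes: this index is unaffected.
    (Row(2, "x@example.com", "B Street"), Row(2, "x@example.com", "New Address")),
]
actions = split_and_filter(changes)
print(actions)
```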

Per-index update plan (unique index)

-- Same index, but unique
CREATE UNIQUE INDEX IX_Customers_CustomerIDEmail ON Customers
(
   CustomerID,
   Email
)
WITH (DROP_EXISTING = ON);

UPDATE dbo.Customers 
SET Email = N'New', [Address] = 'New Address'
WHERE Email = N'Old' 
OPTION (QUERYTRACEON 8790); -- Per-index update plan

Execution plan (image not available).

Notice the extra Sort and Collapse operators when the index is marked unique.

This Split-Sort-Collapse pattern is required when updating the keys of a unique index, to prevent intermediate unique key violations.

More information: Maintaining Unique Indexes by Craig Freedman

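The Sort in particular can be a problem. Not only is it an extra cost that is unnecessary here (the index keys are already unique because they include the primary key CustomerID), it may also spill to disk if cardinality estimates are inaccurate.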
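Why the Sort and Collapse are needed can be shown with a small Python model that treats a unique index as a set of keys. This is a simplification, since real index rows carry other columns too. Shifting keys 1, 2, 3 to 2, 3, 4 one row at a time hits a transient duplicate. Splitting, sorting (deletes before inserts for the same key) and collapsing matching delete/insert pairs into updates avoids it. The function names are hypothetical; the steps follow the Split-Sort-Collapse pattern described above.

```python
def first_conflict_row_by_row(keys, updates):
    """Apply (old_key, new_key) updates one row at a time: delete the old
    key, then insert the new one. Return the first colliding key, else None."""
    index = set(keys)
    for old, new in updates:
        index.remove(old)
        if new in index:
            return new
        index.add(new)
    return None

def split_sort_collapse(updates):
    # Split: each key update becomes a DELETE of the old key and an INSERT of the new key.
    actions = []
    for old, new in updates:
        actions.append((old, "DELETE"))
        actions.append((new, "INSERT"))
    # Sort by key; for equal keys "DELETE" sorts before "INSERT".
    actions.sort()
    # Collapse: a DELETE immediately followed by an INSERT of the same key
    # becomes a single in-place UPDATE of that key.
    collapsed = []
    for key, op in actions:
        if op == "INSERT" and collapsed and collapsed[-1] == (key, "DELETE"):
            collapsed[-1] = (key, "UPDATE")
        else:
            collapsed.append((key, op))
    return collapsed

def apply_actions(keys, actions):
    """Apply actions in order to a unique index; raise on a duplicate insert."""
    index = set(keys)
    for key, op in actions:
        if op == "DELETE":
            index.remove(key)
        elif op == "INSERT":
            if key in index:
                raise ValueError(f"duplicate key {key}")
            index.add(key)
        # "UPDATE" leaves the key itself in place.
    return index

updates = [(1, 2), (2, 3), (3, 4)]
print(first_conflict_row_by_row({1, 2, 3}, updates))  # 2
plan = split_sort_collapse(updates)
print(plan)  # [(1, 'DELETE'), (2, 'UPDATE'), (3, 'UPDATE'), (4, 'INSERT')]
print(sorted(apply_actions({1, 2, 3}, plan)))  # [2, 3, 4]
```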

About nonclustered keys

Another factor to consider is that nonclustered index structures are always unique at every level of the index, even if UNIQUE is not specified. The clustering key(s), and possibly a uniquifier if the clustered index is not marked unique, are added to a non-unique nonclustered index at all levels.

As a consequence, the following index definition:

CREATE INDEX IX_Customers_CustomerIDEmail ON Customers
(
   Email
)
WITH (DROP_EXISTING = ON);

...actually contains the keys (Email, CustomerID) at all levels. It is therefore 'seekable' on both columns:

SELECT * 
FROM dbo.Customers AS C WITH (INDEX(IX_Customers_CustomerIDEmail))
WHERE C.Email = N'Email'
AND C.CustomerID = 1;

More information: More About Nonclustered Index Keys by Kalen Delaney
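Here is a toy Python model of that point. A sorted list of (Email, CustomerID) tuples stands in for the B-tree of a non-unique index on Email, with the clustering key CustomerID appended. The sample rows and the seek helper are hypothetical, and bisect stands in for a B-tree seek. The sketch only shows that the appended key makes every entry unique and lets a seek use both columns.

```python
from bisect import bisect_left

# Hypothetical (CustomerID, Email) rows; CustomerID is the clustering key.
customers = [(3, "b@example.com"), (1, "a@example.com"),
             (2, "a@example.com"), (4, "c@example.com")]

# Non-unique index on Email, modelled with the clustering key appended:
# entries are sorted by (Email, CustomerID), so rows sharing an Email
# are still distinct index entries.
index = sorted((email, customer_id) for customer_id, email in customers)

def seek(index, email, customer_id):
    """Binary search on both key columns, like a seek on (Email, CustomerID)."""
    pos = bisect_left(index, (email, customer_id))
    return pos < len(index) and index[pos] == (email, customer_id)

print(seek(index, "a@example.com", 2))
```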
