MySQL – Unique insert performance: Primary Key vs Unique Index

index, index-tuning, myisam, MySQL, primary-key

I have a table of unique values (domains_unique) with a column domain varchar(255) and more than 20 million records.

What's the fastest way to insert into the table while keeping the domain values unique?

I've decided that the query should be:

INSERT IGNORE INTO domains_unique (domain) VALUES ('domain.com')

Should I make domain the primary key or should I make it a unique index?

Primary key method:

CREATE TABLE `domains_unique` 
(
     `domain` varchar(255) NOT NULL,
     PRIMARY KEY (`domain`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;

Unique index method:

CREATE TABLE `domains_unique` 
(
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `domain` varchar(255) NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY unique_index (`domain`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;

Also, how much would changing the CHARSET to ascii and the COLLATION to ascii_bin affect performance?

Note:

I would use INSERT DELAYED IGNORE INTO, but I need to know whether the specific row (domain) was actually inserted; that also rules out bulk inserts.
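For a plain (non-delayed) statement, the affected-rows count already reports this: MySQL's ROW_COUNT() (or the affected-rows value returned by the client API) is 1 when INSERT IGNORE inserted the row and 0 when the duplicate was skipped. A sketch:

```sql
INSERT IGNORE INTO domains_unique (domain) VALUES ('domain.com');

-- 1 if the row was inserted, 0 if the domain already existed
SELECT ROW_COUNT();
```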

UPDATE:

As promised the benchmarks results:

Inserting 5k unique rows (4.5k of them new) into a 1 million row table, one row at a time:

  • Primary Key method: 5.7 seconds
  • Unique Index method: 6.3 seconds

To test scaling, I also ran the Unique Index method against a 40 million row table; it took 45.8 seconds.

Best Answer

From your question, I'm not completely sure if you're inserting multiple values per query, but you definitely should be. With MyISAM as the engine, there should not be a performance difference between the index being UNIQUE or PRIMARY; MyISAM doesn't treat them differently in this case. If you were using InnoDB, however, there would be a difference since it stores the data in primary key order. If you don't need the id column, removing it and making domain the primary key would help performance.
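Batching several candidate domains into a single INSERT IGNORE statement might look like this sketch (the example domains are placeholders); duplicates are silently skipped, and ROW_COUNT() tells you how many rows went in, though not which ones:

```sql
-- One round trip inserts many candidate domains at once.
INSERT IGNORE INTO domains_unique (domain)
VALUES ('example.com'), ('example.org'), ('example.net');

-- Number of rows actually inserted (duplicates excluded).
SELECT ROW_COUNT();
```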

Changing the collation should help since ascii is much simpler than utf8, but you might want to use ascii_general_ci instead of ascii_bin since domain names are case-insensitive.
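A sketch of the table definition with that character set and collation, using the same structure as the question's primary-key variant (ascii stores one byte per character, versus up to three for utf8, which also shortens the index):

```sql
CREATE TABLE `domains_unique` 
(
    `domain` varchar(255) NOT NULL,
    PRIMARY KEY (`domain`)
) ENGINE=MyISAM DEFAULT CHARSET=ascii COLLATE=ascii_general_ci;
```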

One other way to do the queries would be to get the current row count, run the INSERT DELAYED statements, flush the delayed writes, and then get the new row count; the difference between the two counts equals the number of rows actually inserted. I doubt this would be significantly faster, though, and it would make the process more complex.
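A sketch of that counting approach, assuming FLUSH TABLES is used to close the table and thereby force the delayed-insert queue to be written before the second count (and noting that this only yields a total, not per-row feedback):

```sql
SELECT COUNT(*) INTO @before FROM domains_unique;

INSERT DELAYED IGNORE INTO domains_unique (domain)
VALUES ('example.com'), ('example.org');

-- Close the table so the delayed-insert handler flushes its queue.
FLUSH TABLES domains_unique;

SELECT COUNT(*) - @before AS rows_inserted FROM domains_unique;
```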