Mysql – Check against multiple duplicate records

duplicationMySQL

I have data that i should insert into table.
But before inserting i need to check against duplicate records and report list of those records.

Table:

CREATE TABLE `test` (
  `A` varchar(19) NOT NULL,
  `B` varchar(9) NOT NULL,
  KEY `A` (`A`),
  KEY `B` (`B`)
) ENGINE=InnoDB;

I need to check for both columns:

Number of records to insert: ~1000

Rows in table: ~1.000.000

What is the efficient way of doing this.

Thanks in advance.

Best Answer

This would depend on the table's layout.

Suppose you have the following table

CREATE TABLE `mydata` ( 
    `A` varchar(19) NOT NULL, 
    `B` varchar(9) NOT NULL, 
    KEY `A` (`A`), 
    KEY `B` (`B`) 
) ENGINE=InnoDB;

Before you insert 1000 rows into mydata, you could do preload them into another table called mynewdata like this:

CREATE TABLE mynewdata LIKE mydata;
CREATE TABLE mynewdups LIKE mydata;
INSERT INTO mynewdata ... ;
INSERT INTO mynewdups SELECT * FROM mynewdata;

Next delete all rows in mynewdata that matches A or B in mydata

DELETE T1.* FROM mynewdata T1 INNER JOIN mydata T2 ON T1.A=T2.A OR T1.B=T2.B;

What's left in mydata are rows that do not have A or B matching

What about the rows that matched? Run this

DELETE T1.* FROM mynewdups T1 LEFT JOIN mydata T2
ON T1.A=T2.A OR T1.B=T2.B
WHERE T2.A IS NOT NULL;

What's left in mynewdata is data to import

What's left in mynewdups is data that had a dup key in mydata

Give it a Try !!!

Related Solutions

Mysql – Need to find duplicate entries

Suppose your table is called ingredients. Try the following:

Step 01) Create an empty delete keys table called ingredients_delete_keys

CREATE TABLE ingredients_delete_keys
SELECT fk,recipe,pkey FROM ingredients WHERE 1=2;

Step 02) Create PRIMARY KEY on ingredients_delete_keys

ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fk,recipe,pkey);

Step 03) Index the ingredients table with fk,recipe,pkey

ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fk,recipe,pkey);

Step 04) Populate the ingredients_delete_keys table

INSERT INTO ingredients_delete_keys
SELECT fk,recipe,MIN(pkey)
FROM ingredients GROUP BY fk,recipe;

Step 05) Perform a DELETE JOIN on ingredients table using keys that don't match

DELETE B.*
FROM ingredients_delete_keys A
LEFT JOIN ingredients B
USING (fk,recipe,pkey)
WHERE B.pkey IS NULL;

Step 06) Drop the delete keys

DROP TABLE ingredients_delete_keys;

Step 07) Get rid of the fk_recipe_pkey_ndx index

ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;

OK Here are all the lines in one block...

CREATE TABLE ingredients_delete_keys
SELECT fk,recipe,pkey FROM ingredients WHERE 1=2;
ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fk,recipe,pkey);
ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fk,recipe,pkey);
INSERT INTO ingredients_delete_keys
SELECT fk,recipe,MIN(pkey)
FROM ingredients GROUP BY fk,recipe;
DELETE B.*
FROM ingredients_delete_keys A
LEFT JOIN ingredients B
USING (fk,recipe,pkey)
WHERE B.pkey IS NULL;
DROP TABLE ingredients_delete_keys;
ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;

Give it a Try !!!

CAVEAT

Notice that using MIN function helps keep the first pkey entered for fk. If you switch it to MAX function instead, the last pkey entered for fk is kept.

Mysql: Insert performance INNODB vs MYISAM

The reason is very simple. When you insert a row into MyISAM, it just puts it into the server's memory and hopes that the server will flush it to disk at some point in the future. Good luck if the server crashes.

When you insert a row into InnoDB it syncs the transaction durably to disk, and that requires it to wait for the disk to spin. Do the math on your system and see how long that takes.

You can improve this by relaxing innodb_flush_log_at_trx_commit or by batching rows within a transaction instead of doing one transaction per row.

I highly recommend reading High Performance MySQL 3rd Edition (I am the author).

Best Answer

Related Solutions

Mysql – Need to find duplicate entries

Mysql: Insert performance INNODB vs MYISAM

Related Question