MySQL allows an end user to temporarily disable foreign key checks with the following statement
SET FOREIGN_KEY_CHECKS=0;
This means it's incredibly common (at least in my corner of the world) to find production MySQL databases where some rows have invalid foreign keys. For example, if you setup two tables with a foreign key relationship
CREATE TABLE `foo` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`name` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;
CREATE TABLE `bar` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`foo_id` int(11) DEFAULT NULL,
`name` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `foo_id` (`foo_id`),
CONSTRAINT `bar_ibfk_1` FOREIGN KEY (`foo_id`) REFERENCES `foo` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8;
and insert some legitimate data
insert into foo(id,name) VALUES (1,'Test 1');
insert into foo(id,name) VALUES (2,'Test 2');
insert into bar(name, foo_id) VALUES ("Hello", 1);
and then use the above statements to turn off foreign key checks and set some bogus data
SET FOREIGN_KEY_CHECKS=0;
insert into bar(name, foo_id) VALUES ("Hello", 100);
SET FOREIGN_KEY_CHECKS=1;
You now have a bar
table with a foo_id
of 100
, but no such row in the foo
table.
My Question: Does MySQL have any built-in tools for finding this sort of bogus data? If not, are there common DBA tools for finding this sort of bogus data? Or am I writing a program to manually scan every table for foreign_keys, and then check each row individually?
Best Answer
The way to check for "incorrect" foreign keys, you have to do something like:
or its equivalent
There is a tool, written in Ruby, called mysql-foreign-key-checker, by Michael Irey that can do this kind of checks systematically for you. The program is actually quite simple and easy to understand, and could be translated quite easyly to the language of your choice.
Check the queries at SQLFiddle.
NOTE: I know that some
IS NOT NULL
shouldn't be necessary in the queries, because you already haveNOT NULL
constraints. But... if you're checking for data that actually violates some constraints... you'd better not rely on any constraint.