MySQL Optimization – Schema Review, Normalization, and Query Optimization

MySQLnormalizationschema

So I have a relatively complex database I'm designing – basically full in-house accounting setup. In one section I have some concern that I'm over-normalizing; the 'directory' – I'll example the employee section but there will be the same basic thing for customers, suppliers etc.

Basically directory stores information that is relevant to all types whether customer or employee or x, employees can have any number of addresses, email addresses etc.

Using MySQL 5.6.12, PHP/PDO (not that what I'm accessing it from really matters for the question)

The relevant table creation calls:

CREATE TABLE IF NOT EXISTS `directory` (
  `directory_id` int(11) NOT NULL AUTO_INCREMENT,
  `name_display` varchar(255) NOT NULL,
  `type` tinyint(2) unsigned NOT NULL COMMENT '0: human, 1: unincorporated, 2: incorporated',
  `visibility` varchar(255) NOT NULL,
  `notes` text NOT NULL,
  PRIMARY KEY (`directory_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;


CREATE TABLE IF NOT EXISTS `directory_addresses` (
  `directory_id` int(11) NOT NULL,
  `address_id` int(11) NOT NULL AUTO_INCREMENT,
  `type` int(11) NOT NULL,
  `country` int(11) NOT NULL,
  `region` int(11) NOT NULL,
  `city` varchar(255) NOT NULL,
  `postalcode` varchar(10) NOT NULL,
  `line1` varchar(255) NOT NULL,
  `line2` varchar(255) NOT NULL,
  `version` int(11) NOT NULL,
  `primary` tinyint(1) NOT NULL,
  `active` tinyint(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`address_id`),
  KEY `fk_type` (`type`),
  KEY `fk_country` (`country`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;


 CREATE TABLE IF NOT EXISTS `directory_email` (
  `directory_id` int(11) NOT NULL,
  `email_id` int(11) NOT NULL AUTO_INCREMENT,
  `primary` tinyint(1) NOT NULL,
  `email` varchar(255) NOT NULL,
  PRIMARY KEY (`email_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;


CREATE TABLE IF NOT EXISTS `directory_employee` (
  `directory_id` int(11) NOT NULL,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `employee_id` varchar(10) NOT NULL,
  `start_date` date NOT NULL,
  `end_date` date DEFAULT NULL,
  `department` int(11) NOT NULL,
  `notes` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;


CREATE TABLE IF NOT EXISTS `directory_employee_payrate` (
  `employee_id` int(11) NOT NULL,
  `payrate_id` int(11) NOT NULL AUTO_INCREMENT,
  `start_date` date NOT NULL,
  `end_date` date DEFAULT NULL,
  `rate` decimal(10,2) NOT NULL,
  PRIMARY KEY (`payrate_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin

Which for the employee list overview page results in a query of:

Select
    directory_employee.directory_id,
    directory_employee.employee_id,
    directory_employee.department,
    directory_employee.start_date,
    directory_employee.end_date,
    directory_employee.notes,
    directory_employee_payrate.rate,
    directory.name_display,
    directory_email.email,
    directory_phone.phone,
    directory_addresses.line1,
    directory_addresses.line2,
    directory_addresses.city,
    directory_addresses.region,
    directory_addresses.postalcode,
    directory_addresses.country
FROM directory_employee
LEFT JOIN directory_employee_payrate ON
    directory_employee_payrate.employee_id = directory_employee.id
    AND directory_employee_payrate.end_date IS NULL
LEFT JOIN directory ON
    directory.directory_id = directory_employee.directory_id
LEFT JOIN directory_addresses ON
    directory_addresses.directory_id = directory_employee.directory_id
    AND directory_addresses.primary = 1
LEFT JOIN directory_email ON
    directory_email.directory_id = directory_employee.directory_id
    AND directory_email.primary = 1
LEFT JOIN directory_phone ON
    directory_phone.directory_id = directory_employee.directory_id
    AND directory_phone.primary = 1

Which is admittedly somewhat LONG!

So my question(s) basically boils down to:

  • Does that look like a reasonable schema?
  • Is it over-normalized? (If so what would you recommend doing instead – serialized arrays?)
  • If that is a reasonable schema, are there any optimizations that would be a good
    idea to apply to that Select statement?
  • Any schema improvements that you'd recommend outside of the main point of this question? (I do have FK – I just haven't set them all up/copied the definition statements.)

Best Answer

For the most part, it's a good schema design for the internals of an accounting-type system. Normalization gives you a lot of flexibility in this sort of scenario, and what you've done seems appropriate.

That said, I did question the choice of the directory table. There are cases in which something like that can be appropriate, but in those cases, there is generally some sort of common thread - a lot of common fields, or a frequent need to aggregate data across all of the types (i.e., multiple types of customer that all place orders through a common system), for example. If I was to go only by the small excerpt you show here, I would say it was over-normalized. However, you also mention that you have similar structures for customers, suppliers, and the like, and given that it's an accounting system, I would guess that you would be linking all of this to a general ledger schema of some sort. If that analysis is correct, your directory table becomes a common link between all of the various types of payees in your system, which is necessary, and is therefore good design.

(I apologize for the rambling answer, but normalization is an art. I was hoping to relay my thought process in the answer.)

You also asked about other optimizations. The most important thing you can do to help the relational engine in any RDBMS is to include good indexes. It looks like you have a good start, maybe more - you do have primary keys, which I assume are accompanied by indexes. You also mention foreign keys, which can help at least some engines optimize the queries. My specialty is not MySQL so I can't speak directly to the technical points of optimizing for its engine, but in general, you will probably be fine. Just pay attention to your queries' performance, check your execution plans/explain plans, and add additional indexes as necessary.