Mysql – Splitting a table based on enum value

database-designMySQL

I am re-designing our company database, and am wondering whether or not to split this table up into two (or even three) different tables based on the type field. Prospects can become clients, and in rare cases, clients can revert to being prospects again (usually correction of a user error). Currently there are no suppliers who are clients, but theoretically there's nothing to stop that from happening in the future. As you can see, several fields are only needed in one case or another.

CREATE TABLE `companies` (
  /* book-keeping fields every data table should have */
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `created` TIMESTAMP NULL DEFAULT NULL, /* would love to specify DEFAULT CURRENT_TIMESTAMP here but MySQL won't let me */
  `modified` TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  /* fields for all company types */
  `type` ENUM('prospect','client','supplier') CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT 'prospect',
  `name` VARCHAR(25) DEFAULT NULL,
  `abbr_name` VARCHAR(8) DEFAULT NULL,
  `formal_name` VARCHAR(255) DEFAULT NULL,
  `web_site` VARCHAR(255) DEFAULT NULL,
  `email` VARCHAR(320) DEFAULT NULL,
  `logo` VARCHAR(255) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `key` ENUM('false','true') CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT 'false',
  `bust` ENUM('false','true') CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT 'false',
  `head_office` INT UNSIGNED DEFAULT NULL,
  /* only used for clients */
  `accounting_code` VARCHAR(5) CHARACTER SET ascii DEFAULT NULL,
  /* used for prospects, and clients obtained by marketing */
  `marketing_user` INT UNSIGNED DEFAULT NULL,
  `marketing_datasource` CHAR(2) CHARACTER SET ascii DEFAULT NULL,
  /* only used for prospects */
  `marketing_dead` ENUM('false','true') CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT 'false',
  /* only used for suppliers */
  `supplies` SET('materials','plant') CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  `account_holding_branch` INT UNSIGNED DEFAULT NULL,
  PRIMARY KEY (`id`),
  FOREIGN KEY (`head_office`) REFERENCES sites (`id`),
  FOREIGN KEY (`marketing_user`) REFERENCES users (`id`),
  FOREIGN KEY (`account_holding_branch`) REFERENCES sites (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

What do people recommend, and importantly, why?

A similar question, Splitting a table based on business logic provides one curt answer with two relevant reasons to help decide whether splitting is worth it:

  • Partition hot and cold data for cache efficiency
  • Many columns almost always null

As to data 'temperature', key clients and key suppliers are hotter than other clients or suppliers, and hotter than prospects. Any company which has gone bust or for which the marketing trail has died are pretty frigid.

As to NULL, it varies, but prospects almost always do not have a logo uploaded, and suppliers presently do not have data in any of the four accounting_code and marketing_* fields, though as I say there's no technical reason why a parts supplier could not also become a client of our business.

Best Answer

If you never or rarely need to access more than one type at a time, then consider splitting. However, personally I wouldn't do so. I've been involved in a number of projects where this has been done, and result in a lot of additional complexity with little benefit.

The companies table appears to be a good choice for the common data. Migrating data when a company changes type will increase complexity. To enable this to work well you will need a table to generate ids from, and rules to ensure the same id doesn't appear in more than one sub-table.

I would consider splitting type specific columns into separate tables. This will require different queries by type. Again this may make the application more complex. As there are only a few smallish columns, keeping them in the main table likely has relatively little overhead and simplify the overall design. If you have many or larger fields which are type specific and only of use when the type is known, then consider splitting the sub-type data into another table. It can share the same key as the relationship will be an optional identity relationship.

Another case where I would consider splitting a record is where there is a mix of tombstone data and highly transactional data. I've seen it work well with parts, and inventory data.

Having worked in a company where suppliers were also clients, revisiting how suppliers and clients are differentiated might be a worthwhile exercise. I would consider splitting suppliers into a separate table based on the following criteria.

  • Systems dealing with suppliers are separate from those dealing with customers.
  • There is little prospect of reusing access routines between the systems.
  • Volumetric data indicates that a least one system could get substantially better performance from splitting the tables.
  • Security/access requirements for the two type of companies are significantly different.

Reviewing what happens to prospect data when a prospect becomes a client may be a worthwhile exercise. If it is retained, it may have value in determining how clients are being generated. It will also make converting clients back to prospects after keying errors simpler.

Partitioning a table like a customer table is likely to be of little use. Hot clients will likely be cached at the database or o/s block buffer level. If you have the ID it is pretty well a direct look-up.

Transactional data is more suitable for partitioning. Order details can age out in hours depending on order processing times. Customer data will likely take years to age out. I don't see any data in the table which could be used to effectively partition the data base on 'hotness'. I have generated an archiving system for customers and their data. Determining the customers which could be archived was not trivial.