Mysql – Split data across databases/machines

data synchronizationdatabase-designMySQLremote

I have two questions about how to partition data across servers.

  1. Can I create and connect two or more databases, with the records correctly associated, using only a mapping table stored in a third (or fourth), separate database to link them? For example, suppose I wanted to store a family tree with info such as date of birth, mother's maiden name, citizenship/immigration status, ethnic/religious affiliation, and so on for each member, but I wanted to partition the info across databases (and machines) to reduce the amount of personally identifiable information (PII) exposure that might occur through a single breach. I want to link a mapping table (stored in its own separate database) with another small database that includes tables of DOB and ethnic/religious affiliation and a few other details, and with a third database that contains the rest of the sensitive information (e.g., mother's maiden name, citizenship/immigration status, and place of birth). Bottom line: if I insert or pull a record, I want to insert/acquire ALL information about a single entity, but as noted above, I don't want to store all of the info in one database. Is it possible to store each database independently, with nothing in common, using only a mapping table from a separate database to link them? If not, what is the minimum that I must include in each database to achieve what I describe?

  2. How would I determine the increase in time/space complexity due to the use of remote databases/multiple machines? The database system I'm contemplating is MySQL, but I'm assuming the answer would tend to be the same across the board.

EDIT: Because the point is to avoid storing all data in any single database/machine, database replication is not what I'm looking for. I want to actually partition the data across servers. Thanks so much for your help!

P.S. Please also note that this measure is not replacing encryption or any combination of security measures — all of which are also being applied. This is in addition to those protective measures. We have multiple recent examples of millions of people losing entire sets of PII through the breach of a single database.

Best Answer

What you describe sounds much like horizontal sharding of sorts. That, of course, can be done. To answer your second question, you determine the performance (time/space) impact of your design by building a prototype system and measuring what's important to you.

However, security by obscurity, which seems to be your ultimate goal, is rarely worth the effort.

There are better ways to protect sensitive data, which include encryption (e.g. storage level or transparent database encryption, which obviously should include encrypting backups); row- and column-level access control at the database level; thorough security policies; audit, including possibly wire-level analysis of database traffic; network hardening and intrusion detection; etc. All these methods work well only in combination; looking only at one aspect of your security landscape is counterproductive.