MySQL – Should innodb_file_per_table Be Enabled for Single Tenant SaaS?

configurationinnodbMySQL

We have a mySQL backed SaaS application where each customer has their own mySQL database. So think of it like a wordpress/magento etc. database for each client.

Each database has ~200 table, and we expect to have about 1,000 databases per server, so 200,000+ tables. We currently have innodb_file_per_table enabled.

Our question is regarding whether we should use innodb_file_per_table or not. We've seen some conflicting information on this:

  • Performance: With the large number of tables, is it better or worse to use innodb_file_per_table. We understand there can be issues with the number of file pointers required to have hundreds of thousands of tables open.
  • Stability: There is some discussion that using innodb_file_per_table is safer or improves crash recorvery
  • Concurrency: Some reports say that using innodb_file_per_table improves concurrency because there are individual tablespaces.
  • I presume nothing has come from innodb_file_per_database. Facebook talked about this back in 2009 (!) but I've not seen anything since. https://www.facebook.com/notes/mysql…ent_id=6285127

Welcome any thoughts on this.

Best Answer

Answering one by one:

  • Performance: yes, this is the main problem. Not only in resource for file descriptors, but also for the mysql open table cache and data dictionary, InnoDB checks on start and regenerating table statistics when opened. To be fair, these last ones (except InnoDB checks) will be found no matter the value of innodb_file_per_table, only because a huge number of tables.

    The largest problem, however -in my opinion-, was that in the case of a DROP TABLE with innodb_file_per_table = 1 older versions of 5.5 required a full buffer pool lock, basically stalling the whole server. Some patches applied in the latest versions of MySQL 5.5 and 5.6, plus the buffer pool partitioning (available in 5.5 and enabled by default in 5.6) made that a minor issue.

    On the bright side, you can in 5.6 set a manual path of every table, which may help in some cases, for example, to put the most problematic tables on a separate disk or an SSD, etc.

  • Stability: I wouldn't say that it is more stable, but it is true that if you have disk corruption and a single huge file, you can only pray. Having different files makes easier to "save" all but one or a few. Not convinced that this is a decisive point -InnodB is well regarded about its disk consistency (assuming there is no physical damage).

  • Concurrency: This is true, but only in specific cases. Ext filesystems had a limitation that if you were using O_DIRECT mode, you couldn't write in concurrency to the same file descriptor. That is completely true, but is a very specific case: you could use a different filesystem (XFS is very recommended when handling many files and does not have that problem), you can use the filesystem cache if you have enough ram and in some cases the high concurrency is not a problem if writes go to different files at the same time (thus the innodb_file_per_table=1 or several ibdata files).

  • I cannot see your link, I suppose it is this? Yes, I agree 100% that one thing that it is missing in InnoDB is more control about the tablespace management (funnily enough, it is done for the disk tables in the memory-only engine NDB). I have not seen such a thing implemented. You are supposed to do all optimizations at RAID/LVM/physical level.

I think that you miss one thing that is the number 1 reason for having innodb_file_per_table = 1 - the flexibility for backups, restores, management and moving tables around inidividually (something made even easier with the transportable tablespaces in MySQL 5.6). With it enabled, you will be able to recover disk space when tables are dropped and make defragmentation possible. When you have thousands of tables in a single file, physical backup and restore is impossible, and you have to use logical backups (mysqldump/LOAD DATA).

So what are the alternatives? I have found this problem before, and I have helped solving it on hosts with literally millions of tables in several ways:

  • If disk space is not a problem, you can have some tables in the common tablespace and some separately (innodb_file_per_table is a dynamic variable, although you need to rebuild the table for each one you want to move in or out). This solution is ok if your problem is having some tables that are very dynamic -being created and destroyed frequently- while other are static, but you may end up with a very large common tablespace. However, if using innodb_file_per_table = 0 works better with you, then end of discussion.
  • You can have several common tablespace files (ibdata1;ibdata2;ibdata3). This is ok if you want to separate them in several disks, or you do not want to deal with huge files, but it will not solve the "having many tables" problem.
  • Change your schema- For example, wordpress multisite installation puts many tables together, and if I remember correctly (I may be wrong), it only creates a new table per site, not a full database. If your tables are going to be very small, this is the way to go- modify the structure so that you have larger and less tables
  • Choose a different engine - Wordpress.com uses MyISAM at scale and they do not care about corruption because they have a good backup/recovery system. Maybe this is your case if most of your tables are read-only (or almost read-only) or you put a good caching system in-between.
  • What is the number 1 way to deal with this problem? What I have been doing on behalf of my clients most of the time is what Wordpress, Facebook and other do- sharding. Do not try to handle a million of users in a single powerful machine- use smaller ones and divide load (and databases and users) between several servers. You do not have to handle the sharding all by yourself, there are tools to help you do that, like the Upstream MySQL Fabric (for Java and Python) or the engine Spider.

    Please note that if the bottleneck is not on CPU/disk/memory, you can even use virtual machines in order to consolidate services.

In general, innodb_file_per_table is something that you want ON, for the flexibility reasons that I told you before. If you cannot have it on (test it first, maybe you would be surprised! -avoid premature optimization), you can disable it for all or some of the tables (but the change will be more painful than anything else. But be prepared for the consequences. Try to avoid that with some of the recommendations that I told you at the end.

Update: 5.7 labs releases have announced that may finally become obsolete and we will finally have better tablespace management thanks to a CREATE TABLESPACE syntax (search for "InnoDB: General Tablespaces"). With this, you will be free to create your own "innodb_file_per_database" or any other schema you want.