Mysql – managing a growing database in the long run (10 years from now)

MySQL

I'm close to finishing-up a WEB app (PHP + mysql website portal), which will start slowly, as in uploading few pictures and inserting few rows in main database table, but soon, and over few years, I expect it to grow steadily and even logaritmically at some point.

By my Excel calculations, at some point, the database table data + indexes (innoDB) will overgrow the size of hosted pictures by a factor of 10, but in the beginning it will be the other way round.

So basically my portal hosts pictures (medium size jpegs let's say) and uses MySql InnoDB database tables.

My question to you all admins and professionals is about my MySql Database.

I'm thinking in advance so I don't get caught unprepared :

When my database will grow more and even get 1.2TB or 2TB or more….how can I deal with that ? Can I say split it on 2 or more dedicated machines so searches will still be fast and getting data from this big database ? Or shall I do 1 server and pump it up with a lot of cheap 7200 rpm sata ? or what can be the solution to a growing database.

My first configuration for this web-app is a VPS – SSD hosting with a 4ghz processor, 3GB of ram and 2×60 GB SSD raid 1.

thanks for your answers,

Best Answer

Here's some generic suggestions...

  • Don't store the images themselves inside the database (it's unclear if that's your plan). This will cause you huge scaling problems. Store them in a filesystem, or a non-relational database if you must.
  • The database should be a separate server from the app.
  • Ideally you should have enough RAM to hold your working data set. Make sure you tune the database in advance (innodb_buffer_pool_size, innodb_log_file_size, etc).
  • Put your mysql data on the fastest disks you can get. SSD or 10k+ SATA/SAS. Make sure the RAID controller has a battery backup and write caching.
  • RAID-10 for your mysql data volume.
  • Put mysql's binlogs (and possibly relay logs, and innodb transaction logs) on a separate RAID array, RAID-1 should be fine.
  • Have a slave server for read-only queries, to take some load from the master. Your app will have to differentiate RW and RO queries. Set 'read_only=1' on that slave to prevent unfortunate accidents.

  • As your dataset grows, you may need to consider horizontal sharding across multiple servers. MySQL doesn't provide a mechanism for this, so your app will need to.

  • There's no reason you can't have a single 2TB database, but you will probably find performance unacceptable for data that doesn't fit in the buffer pool.
  • Don't go overboard on indexing. Additional indexes beyond the primary introduce additional overhead (not to say you should have none). There may be good opportunities to use a composite index (multiple columns) instead of 2 separate ones.

Hope this helps,