MySQL – Is this db/table structure sensible?

database-design, MySQL

I'm not very experienced in database design, so bear with me.

I've taken over a Ruby on Rails app with a connected MySQL/InnoDB database. As I'm updating the server to the latest Ruby/Rails version, I also wanted to look at the DB structure. My main concern is a specific database design decision that makes the app very hacky to use with Rails, and I wanted to get some external input on it.

We are tracking spatial data for our customers' various systems every x minutes, which results in a lot of data. That data is not kept in one big "positions" table but is split into one table per system per year. So, e.g., for a system with id=1 in the current year we have a "positions_1_2015" table.

So for, say, 1,000 systems over 3 years we already have 3,000 tables, which seems like a lot to me. When I want to change a column, I have to change every single table instead of just one. I also have to do some hacking to make this mechanism work with Rails' ActiveRecord logic.
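For illustration, here is the kind of per-table maintenance this forces on us. The statement below is just a sketch that generates one ALTER per positions table from information_schema; the altitude column is hypothetical, purely for the example.

-- Sketch: generate one ALTER statement per positions_*_* table.
-- The `altitude` column is hypothetical, for illustration only.
SELECT CONCAT('ALTER TABLE `', table_name,
              '` ADD COLUMN `altitude` double DEFAULT NULL;') AS stmt
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name LIKE 'positions\_%';
-- ...and each generated statement then still has to be executed, one per table.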

In total, we currently have about 300,000,000 positions from various customers. They are written once, read often, and accessed by multiple concurrent users.

So my question: is this table splitting a sensible design?

My clients' argument is that backing up/archiving all the data is easier, and that many small tables are more manageable than one single huge table. But above all, each client only accesses their own specific tables, so performance should be better when 100+ customers want to access their spatial data at the same time.

So is this true? Will performance be worse if I have 100+ concurrent users on a single big 300M-row table?

Thanks for your help!

Edit: SHOW CREATE TABLE output for one positions table

CREATE TABLE `positions_1000_2015` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `world_x` double DEFAULT NULL,
  `world_y` double DEFAULT NULL,
  `timestamp` datetime DEFAULT NULL,
  `created_at` datetime NOT NULL,
  `status` int(11) DEFAULT NULL,
  `coworker_id` int(11) DEFAULT NULL,
  `raw_data` varchar(255) CHARACTER SET latin1 COLLATE latin1_german1_ci DEFAULT NULL,
  `user_data` varchar(1024) DEFAULT NULL,
  `map_text` varchar(512) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `drive_number` int(11) DEFAULT NULL,
  `course` int(11) DEFAULT NULL,
  `speed` int(11) DEFAULT NULL,
  `max_speed` int(11) DEFAULT NULL,
  `distance` double DEFAULT NULL,
  `gsm_quality` int(11) DEFAULT NULL,
  `sat_count` int(11) DEFAULT NULL,
  `private` int(11) DEFAULT NULL,
  `user_created` tinyint(1) NOT NULL DEFAULT '0',
  `io_01` int(11) DEFAULT NULL,
  `io_02` int(11) DEFAULT NULL,
  `io_03` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `timestamp` (`timestamp`),
  KEY `world_x` (`world_x`),
  KEY `world_y` (`world_y`),
  KEY `status` (`status`),
  KEY `timestamp_status` (`timestamp`,`status`),
  KEY `status_timestamp` (`status`,`timestamp`),
  KEY `idx_raw_data` (`raw_data`),
  KEY `drive_number` (`drive_number`),
  KEY `created_at` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=1412811 DEFAULT CHARSET=utf8

Best Answer

OK - I'll try to answer this question; however, you should realise that there is no "right" answer - just assessments based on situation and experience.

This is a classic question about multi-tenancy - shared versus isolated. There's a very good article here from Microsoft.

AIUI, MySQL has a limit of 32,000 databases per server - but if you're anywhere near that limit, you should have obtained more server instances long before. In any case, there are also file limits associated with various OS configurations - but I don't have a clue about Amazon's limitations.
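If you want a quick sanity check of where you stand (a sketch; adjust the LIKE pattern to your actual naming scheme), information_schema can count the schemas and the per-system/per-year tables:

-- Count schemas on the server.
SELECT COUNT(*) AS schema_count FROM information_schema.schemata;

-- Count the positions_*_* tables (the pattern is an assumption).
SELECT COUNT(*) AS positions_tables
FROM information_schema.tables
WHERE table_name LIKE 'positions\_%';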

I'm more of a believer in isolated schemas, with a caveat about reference tables and the like - use a qualified schema.table.name for, say, cities or similar shared lookup data.
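For example (a sketch - the schema and column names here are hypothetical), a per-customer schema can still join against a single shared reference schema:

-- Hypothetical shared lookup: cities live once, in a common schema.
SELECT p.id, p.world_x, p.world_y, c.name
FROM customer_1.positions AS p
JOIN shared_ref.cities    AS c
  ON c.id = p.city_id;  -- city_id is a hypothetical column, for illustration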

Just a note: when I refer to a given post in my answer, please take the time to look at the entire thread - as I said, there are valid differences of opinion on this matter.

From my answer here, "You may have some data duplication issues with a multi-tenant approach, but that's a matter for you to assess in the light of your application/usage/servers/budget..."

One of the better answers that I have seen is here - however it requires monitoring, expertise and effort - you'll get nothing for nothing (except here where the advice is free! :-) )

My own recommendation is that you change your partitioning/sharding scheme to shard by customer (and possibly by year, if that suits you), but I would NOT have a database per monitored system - that strikes me as overkill/unproductive/not_necessary/confusing. That way, your number of schemas is just your number of clients (or no. of clients * years).
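As a sketch of that layout (assuming one schema per customer; the system_id column and the partition boundaries are illustrative, not prescriptive), all of a customer's systems would share one positions table, optionally range-partitioned by year:

-- One positions table per customer schema; the system becomes a column.
-- Note: MySQL requires the partitioning column to be part of every
-- unique key, hence the composite primary key below.
CREATE TABLE customer_1.positions (
  `id`        bigint NOT NULL AUTO_INCREMENT,
  `system_id` int NOT NULL,
  `world_x`   double DEFAULT NULL,
  `world_y`   double DEFAULT NULL,
  `timestamp` datetime NOT NULL,
  -- ...remaining columns as in the existing positions_*_* tables...
  PRIMARY KEY (`id`, `timestamp`),
  KEY `system_ts` (`system_id`, `timestamp`)
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(`timestamp`)) (
  PARTITION p2013 VALUES LESS THAN (2014),
  PARTITION p2014 VALUES LESS THAN (2015),
  PARTITION p2015 VALUES LESS THAN (2016),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

Archiving a year then becomes an ALTER TABLE ... DROP PARTITION (or an export of just that partition) rather than juggling thousands of tables.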

My understanding is that these data are a logging kind of scenario, so previous years' data would not be modified very frequently.

Under my scenario, any database modifications would only need to be applied no._of_clients (or no._of_clients * years) times, and I believe that this could simplify both your application and your job.

Again, I emphasise that there are alternative valid viewpoints that could be argued - this is my 0.02c based on my own experience and knowledge. Another good thread is here.

For GIS tasks, I would definitely prefer PostgreSQL, but it could be that MySQL 5.7 (and later) will be a big improvement, with the adoption of the Boost.Geometry library for its GIS functions.
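For instance (a sketch assuming MySQL 5.7+; the table and column names are illustrative), the separate world_x/world_y doubles could become a true POINT with a spatial index, which InnoDB supports as of 5.7:

-- Hypothetical layout: store positions as a POINT with a SPATIAL index.
CREATE TABLE positions_geo (
  `id` bigint NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `pt` point NOT NULL,
  SPATIAL INDEX (`pt`)
) ENGINE=InnoDB;

-- Example: find all points inside a bounding box.
SELECT `id`
FROM positions_geo
WHERE MBRContains(
        ST_GeomFromText('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'),
        `pt`);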