Mysql – Architectural Help Needed – MySQL or NoSQL for a table with 54 million records

ArchitectureMySQLperformance

We have a table with 54 million records. Here is the table structure.

CREATE TABLE `metaplay` (
  `track_id` int(11) NOT NULL DEFAULT '0',
  `user_id` int(11) DEFAULT NULL,
  `completed` int(11) DEFAULT NULL,
  `skipped` int(11) DEFAULT NULL,
  `created` int(11) DEFAULT NULL,
  `updated` int(11) DEFAULT NULL,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`),
  KEY `created` (`created`),
  KEY `updated` (`updated`),
  KEY `skipped` (`skipped`),
  KEY `track_id` (`track_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

All these data are numeric. At this point, we have ~300 inserts and ~100 updates per minute. We also pull out daily, weekly and monthly track playing records from this table. Now my question is which of the followings will be better for architectural design and best performance. I would appreciate if you can highlight hardware details as well.

  1. MySQL in SSD Cached Storage with 4GB Ram
  2. MySQL in SSD with 4GB Ram
  3. Any NoSQL Solution
  4. Anything else?

Also do you suggest any mysql specific tuning tips for a table like this?

Best Answer

Ah, what is the problem?

a table with 54 million records

A small table. Nice. I have one here with 8.5 billion rows.

we have !300 inserts and !100 updates per minute

Yeah. Small. I know. I have one here with around 500 million inserts per day. That is 347222.2222222222 per minute.

All that running on stock hardware. Seriously. THough not totally as low end as what you suggest.

But your metric is totally off. 54 million was large 25 years ago. Today it is small, unless someone tries to run a database server on a mini virtual machine.

For example a non-server with 5gb memory - ouch.

Generally - if you need analysis and this is not document style data - stick to MySql, avoid NoSql databases and learn the relational theory so that you know what you do there.

SSD is great, but I am not sure 5gb memory is decent - come on, that is less than a decent workstation has.

For daily / weekly / mnothly aggregations I would make DAILY aggregates, then use those as basis for larger aggregations. A month is only max. 31 days - so most of hte work is done once per day (in an off time). I strongly would reconsider using ebay-level cheap used hardware, though - and that is what the specs look like.