MySQL – Efficient schema to handle time-interval queries on a huge table

architecture, datetime, MySQL, rails, schema

We have a huge table (10+ million rows) that we aggregate using time-interval searches on a DATETIME column. Right now, building a single page of our app queries this table several times, resulting in high latency.

This table has one particular property: records are never updated after insertion.

I see two solutions for this scenario, but I'm not sure which is better, and also which is recommended by database experts.

  1. Optimize the query as much as possible, trying to fetch everything in a single round trip.
  2. Improve our database architecture to reduce the number of records, aggregating the old rows into auxiliary tables.
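Option 2 can be sketched roughly like this: old raw rows get rolled up into one summary row per (earner, day), which is what an auxiliary table would store. The names and data below are purely illustrative, not the app's real code.

```ruby
# Hypothetical sketch of option 2: roll raw earnings up into a daily
# summary, the shape an auxiliary "earnings_daily" table would keep.
require "date"

Earning = Struct.new(:earner_id, :amount, :created_at)

raw = [
  Earning.new(1, 100, DateTime.new(2015, 9, 1, 8)),
  Earning.new(1, 250, DateTime.new(2015, 9, 1, 14)),
  Earning.new(2,  75, DateTime.new(2015, 9, 2, 9)),
]

# One bucket per (earner, day) – the granularity of the aux table.
daily = Hash.new(0)
raw.each do |e|
  daily[[e.earner_id, e.created_at.to_date]] += e.amount
end
```

A month-long query then scans roughly 30 summary rows per earner instead of every raw earning in the range.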

Example

My data look similar to a currency market, like this one: https://bitcoinity.org/markets. How do they allow quick queries over different time intervals (minutes, hours, days, months, years…)?

Is there a well-known solution for schemas like this one?

Background info

  • Ruby on Rails App;
  • MySQL
  • Little (or no) query optimization so far;
  • First rows dated from 2011.

Schema Details

Earnings

CREATE TABLE `earnings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `earner_id` int(11) DEFAULT NULL,
  `sale_id` int(11) DEFAULT NULL,
  `amount` int(11) DEFAULT NULL,
  `created_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `index_earnings_on_sale_id` (`sale_id`),
  KEY `index_earnings_on_created_at` (`created_at`)
) ENGINE=InnoDB;

Sample Query

SELECT earner_id, COUNT(amount) AS total
FROM earnings
WHERE created_at BETWEEN '2015-09-01 07:00:00' AND '2015-10-01 06:59:59'
  AND sale_id IN (...)
GROUP BY earner_id;

Although this query is very simple, it runs many times for different timespans, e.g. by month or over the last 10 days. That's why, in my second idea (2.), I'm considering an auxiliary table to cache the sums for each timespan (see https://bitcoinity.org/markets for the desired data aggregation).
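The caching idea can be illustrated with a minimal sketch: memoize the grouped sums per requested timespan so repeated page builds don't re-scan the same range. The cache key, row shape, and helper name here are assumptions for illustration only.

```ruby
# Minimal sketch of caching sums per timespan.
# rows: array of [earner_id, amount, created_at]; from/to: Time bounds.
def totals_for(rows, from, to, cache)
  cache[[from, to]] ||= rows
    .select { |_, _, t| t >= from && t <= to }
    .group_by { |earner, _, _| earner }
    .transform_values { |rs| rs.sum { |_, amount, _| amount } }
end

cache = {}
rows = [
  [1, 100, Time.utc(2015, 9, 5)],
  [1, 50,  Time.utc(2015, 9, 20)],
  [2, 75,  Time.utc(2015, 10, 5)],
]
sept = totals_for(rows, Time.utc(2015, 9, 1), Time.utc(2015, 9, 30), cache)
```

The second call for the same `[from, to]` pair returns the cached hash without touching the rows again.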

Best Answer

How do they allow quick queries over different time intervals (minutes, hours, days, months, years...)?

The trick is not to do it at runtime at all. If you think this is done with a standard SQL "select when needed", you are mistaken.

  • Data is generally aggregated as it comes in, with the aggregated rows written to separate tables as needed (minute, hour, etc.).

  • Charts are prepared from in-memory copies of the data, particularly if you show "from now back over the last X bars" types of queries. It is simply not efficient to ask the database over and over. You log to the database for regeneration, but you keep the most-needed data in memory.
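The aggregate-on-ingest idea above can be sketched in a few lines: every incoming tick immediately updates pre-built minute and hour buckets, so chart queries never scan raw rows. The bucket key format is an assumption for illustration.

```ruby
# Minimal sketch of aggregating on ingest: each tick updates the
# minute and hour rollups the moment it arrives.
buckets = { minute: Hash.new(0), hour: Hash.new(0) }

def ingest(tick_time, amount, buckets)
  buckets[:minute][tick_time.strftime("%Y-%m-%d %H:%M")] += amount
  buckets[:hour][tick_time.strftime("%Y-%m-%d %H")]      += amount
end

ingest(Time.utc(2015, 9, 1, 7, 30), 100, buckets)
ingest(Time.utc(2015, 9, 1, 7, 30),  50, buckets)
ingest(Time.utc(2015, 9, 1, 8,  0),  25, buckets)
```

In a real system the buckets would be flushed periodically to the per-interval tables; reading a chart is then a lookup over a handful of pre-summed rows.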

You do not even need to store tick data in the database at all, just the aggregates. At least that is what I do: in the rare case I need tick data, I go back and parse a binary-encoded file.
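A hedged sketch of that binary-file idea: append each raw tick to a packed binary log and keep only aggregates in the database. The record layout chosen here (two little-endian 64-bit integers, epoch seconds plus amount) is an assumption for illustration, not the answerer's actual format.

```ruby
# Illustrative tick log: fixed-width binary records, replayed only
# in the rare case the raw data is needed again.
require "tempfile"

RECORD = "q<q<" # [epoch_seconds, amount] as little-endian int64 pairs

file = Tempfile.new("ticks")
file.binmode
[[1_441_090_800, 100], [1_441_094_400, 250]].each do |ts, amount|
  file.write([ts, amount].pack(RECORD))
end
file.rewind

# Replay: decode the whole log back into [timestamp, amount] pairs.
ticks = file.read.unpack("q<*").each_slice(2).to_a
```

Fixed-width records make the file trivially seekable by offset, which is why a flat binary log works well for append-only tick data.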

You can certainly use a standard SQL approach, but then expect to pay for it with significantly inferior performance. Time-series aggregation is a very specific scenario.