MySQL – Performance issues on a MySQL database with 150 million rows

mariadb, MySQL, performance, performance-tuning

Current setup:

  • A managed VPS server
  • InnoDB storage for big tables
  • 10.0.36-MariaDB
  • 18 GB RAM
  • 8 × 2.40 GHz CPU cores

I have a fairly large MariaDB table that is used for business-intelligence analytics data. The row count fluctuates between 150 and 180 million rows, depending on the data currently being imported.

The table is populated by a custom-written Java importer, and the data is read through a custom PHP API that queries it via the table's indexes. The data is separated by date, and the table currently holds around 2-3 years of data.

My problem is that whenever we spot an error in the data, we usually need to re-import large volumes going back a long time: tens of thousands of report files, each consumed by the Java importer and loaded one by one.

I've done some speed optimisation on the import process, and a single report doesn't take long unless it is around 300 MB, in which case it takes roughly 10-15 minutes, depending on how busy the database is at that moment.

The daily report import usually covers 200-400 files and takes about an hour in the early morning when no one is in the office. It is all run by cron jobs and custom scripts that download the reports from various sources.

The daily imports and data fetches are fairly fast. However, when someone requests, say, a year's worth of data for one metric, with every day visible in the graph, it can take 10-15 minutes or longer if several people try to do the same thing at once, or if something big is being imported.

The data fetch is not a single query; I issue a separate request for each day, which sped up the fetches by a factor of 10 or more, depending on the length of the time period.
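Roughly, each per-day request is shaped like the following; the table and column names here are placeholders rather than our real schema:

    -- One day's slice for a single metric (hypothetical schema).
    -- The half-open date range keeps an index on report_date usable.
    SELECT metric_name,
           SUM(metric_value) AS daily_total
    FROM   analytics_data
    WHERE  report_date >= '2018-01-01'
      AND  report_date <  '2018-01-02'
      AND  metric_name  = 'example_metric'
    GROUP  BY metric_name;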

Ideally I'd like to optimise the table structure, or any settings I can, so that imports run faster and longer-timeframe fetches return faster. Our current approach is to keep the analytics data in a separate dedicated database, along with a couple of helper tables.

What kind of optimisation could I do for this amount of data, given that we grow by around 5-8 million rows per month (and that could increase)?

Best Answer

  • INSERTing one row at a time can be sped up by gathering around 100 rows and inserting them in a single batch: INSERT INTO t (a,b,c) VALUES (1,2,3), (11,22,33), ... This may speed things up by a factor of 10. (A sketch follows this list.)
  • Check for errors after each batch. If a batch fails, write it out to a file for fixing and re-importing later; this keeps the error-handling pass manageable.
  • Assuming the data is only "new" data, it should be easy to summarize each day's data and write the result to a separate summary table. This may speed up fetching a year's data for graphing by 10x. (See the second sketch after this list.)
  • The daily 200-400 files could be handled by multiple processes in parallel; I suggest not splitting the work across more processes than you have CPU cores.
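A minimal sketch of the batched INSERT idea, using the illustrative t (a, b, c) table from the first bullet (not your real schema):

    -- One statement and one round trip for many rows instead of one per row.
    INSERT INTO t (a, b, c) VALUES
        (1, 2, 3),
        (11, 22, 33),
        (111, 222, 333);

    -- Grouping several such batches into one transaction also avoids a
    -- commit (and its disk flush) after every statement:
    START TRANSACTION;
    -- ... a few hundred multi-row INSERTs ...
    COMMIT;

If any row in a multi-row INSERT is invalid, the whole statement fails, which fits the suggestion above of dumping the failing batch to a file for later repair.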
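And a sketch of the daily summary table, assuming a detail table along the lines of analytics_data(report_date, metric_name, metric_value); all names here are placeholders for whatever your real schema uses:

    -- Hypothetical summary table: one row per day per metric.
    CREATE TABLE daily_summary (
        report_date DATE         NOT NULL,
        metric_name VARCHAR(64)  NOT NULL,
        metric_sum  DOUBLE       NOT NULL,
        row_count   INT UNSIGNED NOT NULL,
        PRIMARY KEY (report_date, metric_name)
    ) ENGINE=InnoDB;

    -- After each day's import, roll up that day's detail rows once:
    INSERT INTO daily_summary (report_date, metric_name, metric_sum, row_count)
    SELECT report_date, metric_name, SUM(metric_value), COUNT(*)
    FROM   analytics_data
    WHERE  report_date = '2018-01-01'
    GROUP  BY report_date, metric_name
    ON DUPLICATE KEY UPDATE
        metric_sum = VALUES(metric_sum),
        row_count  = VALUES(row_count);

    -- A year's graph then reads ~365 summary rows per metric instead of
    -- millions of detail rows:
    SELECT report_date, metric_sum
    FROM   daily_summary
    WHERE  metric_name  = 'example_metric'
      AND  report_date >= '2017-01-01'
      AND  report_date <  '2018-01-01'
    ORDER  BY report_date;

The ON DUPLICATE KEY UPDATE clause makes the roll-up safe to re-run after a day's data has been re-imported.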

Putting all those together might shrink the elapsed time from hours to minutes.

If you would like to discuss things further, please provide the output of SHOW CREATE TABLE, and perhaps some more detail on the data and how it is processed. What is the value of innodb_buffer_pool_size?
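For reference, both can be pulled from the mysql client like this (analytics_data is a placeholder for your real table name):

    -- Table definition, including indexes:
    SHOW CREATE TABLE analytics_data\G

    -- Buffer pool size in bytes; a common rule of thumb on a dedicated
    -- database server is to give it roughly 70% of RAM.
    SELECT @@innodb_buffer_pool_size;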