Mysql – Perl – MySQL/MariaDB – slow with no identifiable bottleneck

mariadb mysql performance perl

I am running a Perl script (using DBI) which reads raw files from a hard disk and updates a MySQL database (which is on a separate SSD). Performance is rather slow (1000 files processed in 30-60 seconds), but I cannot find the bottleneck.

CPU, network, disk and memory are all barely used. I am running Windows 7 64-bit on an i7 machine (8 cores). MariaDB is version 10.0.10. My database is around 78G in size with 5M entries, and all tables are properly indexed.

Perfmon confirms this, showing total CPU 6%, network 0%, disk 140 I/O/s, memory 10%. None of the CPU cores is used more than 3-4%.

I have experimented with changing all these mysql variables with no success:
innodb_use_global_flush_log_at_trx_commit
innodb_buffer_pool_instances
global.max_connections
innodb_thread_sleep_delay
global.innodb_io_capacity
global.innodb_sync_spin_loops
innodb_flush_log_at_trx_commit
wait_timeout

Mysqltuner did not report anything of interest, except for:
Data in InnoDB tables: 78G (Tables: 12)
Total fragmented tables: 10
Query cache is disabled
Thread cache is disabled
InnoDB data size/buffer pool: 78.1G/2.0G

Perl profiling showed that the majority of the time is taken up by DBI::st::execute (invoking SQL).

I have also tried disabling the firewall and the virus scanner, with no difference.

Best Answer

Conclusion and a workaround

After exhausting all options on Windows, I decided to switch to Linux, mostly because I was frustrated with the inability to profile and debug in detail.

I moved the whole setup to Ubuntu 14.04. I first tried XAMPP but gave up because of conflicts between XAMPP, MySQL and MySQL Workbench. Then I moved to vanilla MySQL (5.5, I think) and vanilla Apache 2.

However, I was still left with the same problem – no visible bottleneck and resources still underutilized. I suspected throttling in TCP sockets (used between Perl code and MySQL), but further profiling proved this not to be the case.

Then I turned my attention to the Perl DBI driver DBD::mysql, thinking that it might be doing some throttling. I did some tests where I replaced the DBI calls in Perl with system calls (system("mysql -e 'INSERT INTO blah blah …'")). Performance did not change, which absolved DBI as the culprit.

I need to add one important detail now: I was in fact always running a number of my Perl scripts in parallel. Given that the CPU has 8 cores, this is of course necessary to utilize all of them. Further debugging showed that almost all of my Perl processes, which were supposed to be working furiously, were sleeping most of the time. Ubuntu System Monitor showed them waiting on the wait channels wait_answer_interruptible or unix_stream_recvmsg. The CPU History graph in System Monitor showed all Perl processes jumping to 100% CPU utilization and then dropping to ~0% in unison. I suspected that the MySQL server was not configured for multithreading, but htop showed 17 active mysqld threads, confirming that this was fine.

I suspected that all MySQL threads were waiting on the same semaphore and were locked out most of the time. I dreaded delving into the dark bowels of MySQL to figure out what goes on inside. Instead, I decided to replace MySQL with MariaDB, even though MariaDB seemed to have had the same issue originally when I was running it on Windows.

Lo and behold, this finally worked. My Perl scripts were screaming.

One last problem remained: I had a very rudimentary method of parallelising the Perl scripts. I would just run 10 or 20 of them with their respective loads and hope that they would utilize all the resources.

This has obvious drawbacks: if too many processes are spawned, the OS may spend too much time switching between them (not a serious issue with only 20 processes, but it becomes one with e.g. 1000). If too few are spawned (e.g. fewer than 8, one per core), the CPU will certainly not be fully utilized. If too many processes exhaust RAM, Linux will start swapping to disk, and as soon as that happens everything grinds to a halt.

I searched but could not find a Perl library/script/code which would spawn new processes only when CPU, memory and disk are underutilized. Hence I created my own: raspawn.pl (resource aware spawn), which I placed on GitHub. raspawn.pl spawns a number of processes while trying to keep resource utilization just below the maximum. It constantly checks CPU, memory and disk utilization and starts a new process only if all are less than ~90% utilized.
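The idea behind such a spawner can be sketched roughly as follows. This is not the raspawn.pl code, just a minimal Python illustration of the same principle; the names can_spawn, run_all and load_sampler are made up for this sketch, and the load-average sampler is only a crude stand-in for real CPU, memory and disk measurements.

```python
import os
import subprocess
import time
from typing import Callable, Dict, List, Sequence

THRESHOLD = 0.90  # the "~90% utilized" limit mentioned in the text


def can_spawn(usage: Dict[str, float], threshold: float = THRESHOLD) -> bool:
    """Return True only if every sampled resource is below the threshold."""
    return all(value < threshold for value in usage.values())


def load_sampler() -> Dict[str, float]:
    """Crude CPU proxy (assumption): 1-minute load average per core, Unix only.

    A real tool would also sample memory and disk utilization.
    """
    cores = os.cpu_count() or 1
    return {"cpu": min(1.0, os.getloadavg()[0] / cores)}


def run_all(
    jobs: Sequence[Sequence[str]],
    sample: Callable[[], Dict[str, float]],
    poll_seconds: float = 1.0,
) -> List[int]:
    """Start each job once resources allow; return exit codes in job order."""
    procs: List[subprocess.Popen] = []
    for cmd in jobs:
        # Hold back while our own children are still running and the machine
        # is busy. If none of our children is running, start the job anyway
        # so that unrelated load cannot stall the queue forever.
        while any(p.poll() is None for p in procs) and not can_spawn(sample()):
            time.sleep(poll_seconds)
        procs.append(subprocess.Popen(list(cmd)))
    return [p.wait() for p in procs]
```

With a hypothetical loader script, a call might look like run_all([["perl", "load.pl", "part1"], ["perl", "load.pl", "part2"]], load_sampler). Passing the sampler in as a function keeps the spawn loop independent of how utilization is actually measured on a given system.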

Finally, this worked. I can now process my whole load in around 7 days, instead of many months...

Related Solutions

Mysql – Capabilities of InnoDB INSERT Performance

You need to tune your InnoDB settings in the following areas:

Make InnoDB access all your cores
Increase innodb_buffer_pool_size to 12G
Increase innodb_buffer_pool_instances to 2 (first run numactl --hardware to determine the number of physical CPUs; whatever number it reports, use that number. I learned this recently from Jeremy Cole's blog)
Increase the log file size (innodb_log_file_size) to 2047M
Support separate tablespace files for individual InnoDB tables (enable innodb_file_per_table)
Support either high performance or high durability (ACID compliance)
- High Performance : innodb_flush_log_at_trx_commit set to 0 or 2
- High Durability : innodb_flush_log_at_trx_commit set to 1 (Default)
- Increase innodb_log_buffer_size in conjunction with the number of transactions per second (perhaps 32M)
- Your current setting for innodb_flush_log_at_trx_commit is good
- Your current setting for innodb_flush_method is good
Increase innodb_read_io_threads to 64
Increase innodb_write_io_threads to 64
Increase innodb_io_capacity to 10000
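To make the list concrete, here is a small Python sketch that renders the numeric suggestions above as a [mysqld] section for my.cnf. The values are simply the ones quoted in this answer for the asker's server, not general-purpose defaults, and the helper name render_mysqld_section is made up for this example. innodb_flush_log_at_trx_commit and innodb_flush_method are left out because the answer says the existing settings are fine.

```python
# Settings taken from the list above; they were suggested for one specific
# server and should be treated as a starting point, not as defaults.
SETTINGS = {
    "innodb_buffer_pool_size": "12G",
    "innodb_buffer_pool_instances": 2,
    "innodb_log_file_size": "2047M",
    "innodb_file_per_table": 1,
    "innodb_log_buffer_size": "32M",
    "innodb_read_io_threads": 64,
    "innodb_write_io_threads": 64,
    "innodb_io_capacity": 10000,
}


def render_mysqld_section(settings):
    """Render a dict of server variables as a my.cnf [mysqld] section."""
    lines = ["[mysqld]"]
    lines += [f"{name} = {value}" for name, value in settings.items()]
    return "\n".join(lines)


print(render_mysqld_section(SETTINGS))
```

Most of these variables are not dynamic, so the server has to be restarted for them to take effect. Changing innodb_log_file_size in particular needs extra care on older versions, which is what the linked post on safely changing that variable covers.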

Here are my past posts on tuning the InnoDB storage engine

Is the CPU performance relevant for a database server? (Apr 26, 2012)
Why does InnoDB store all databases in one file? (Mar 25, 2012)
Multi cores and MySQL Performance (Sep 20, 2011)
Possible to make MySQL use more than one core? (Sep 12, 2011)
About single threaded versus multithreaded databases performance (May 26, 2011)
How to safely change MySQL innodb variable 'innodb_log_file_size'? (Feb 16, 2011)
How do you tune MySQL for a heavy InnoDB workload? (Feb 12, 2011)
Howto: Clean a mysql InnoDB storage engine? (October 29, 2010)

Mysql – How to debug a db memory-leak causing mysql to go beyond its own limits

...even surpassing its theoretical maximum possible allocation.

[OK] Maximum possible memory usage: 7.3G (46% of installed RAM)

There is not actually a way to calculate maximum possible memory usage for MySQL, because there is no cap on the memory it can request from the system.

The calculation done by mysqltuner.pl is only an estimate, based on a formula that doesn't take into account all possible variables, because if all possible variables were taken into account, the answer would always be "infinite." It's unfortunate that it's labeled this way.
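As a rough illustration of why this figure is only an estimate: as far as I know, the calculation is approximately "global buffers plus max_connections times per-thread buffers". The Python sketch below works through that arithmetic. The buffer sizes are hypothetical values chosen for illustration (only max_connections = 200 comes from this question), and the exact set of variables mysqltuner.pl adds up may differ between versions.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Hypothetical buffer sizes for illustration; not taken from the question.
global_buffers = {
    "innodb_buffer_pool_size": 4 * GIB,
    "key_buffer_size": 256 * MIB,
    "query_cache_size": 64 * MIB,
    "innodb_log_buffer_size": 8 * MIB,
}
per_thread_buffers = {
    "sort_buffer_size": 2 * MIB,
    "read_buffer_size": 1 * MIB,
    "read_rnd_buffer_size": 1 * MIB,
    "join_buffer_size": 1 * MIB,
    "thread_stack": 256 * 1024,
}
max_connections = 200  # the value mentioned in this question


def estimate_max_memory(global_buffers, per_thread_buffers, max_connections):
    """Global buffers plus one set of per-thread buffers per connection."""
    per_thread = sum(per_thread_buffers.values())
    return sum(global_buffers.values()) + max_connections * per_thread


estimate = estimate_max_memory(global_buffers, per_thread_buffers, max_connections)
print(f"{estimate / GIB:.2f} GiB")  # prints "5.35 GiB"
```

The formula assumes each connection allocates each per-thread buffer exactly once. A single query can allocate some of these buffers several times (for example one join buffer per join), and in-memory temporary tables are not counted at all, which is why real usage can exceed the reported "maximum".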

Here is my theory on what's contributing to your excessive memory usage:

thread_cache_size       = 128

Given that your max_connections is set to 200, the value of 128 for thread_cache_size seems far too high. Here's what makes me think this might be contributing to your problem:

When a thread is no longer needed, the memory allocated to it is released and returned to the system unless the thread goes back into the thread cache. In that case, the memory remains allocated.

(Source: http://dev.mysql.com/doc/refman/5.6/en/memory-use.html)

If your workload causes even an occasional client thread to require a large amount of memory, those threads may be holding onto that memory, then going back to the pool and sitting around, continuing to hold on to memory they don't technically "need" any more, on the premise that holding on to the memory is less costly than releasing it if you're likely to need it again.

I think it's worth a try to do the following, after first making a note of how much memory MySQL is using at the moment.

Note how many threads are currently cached:

mysql> show status like 'Threads_cached';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| Threads_cached | 9     |
+----------------+-------+
1 row in set (0.00 sec)

Next, disable the thread cache.

mysql> SET GLOBAL thread_cache_size = 0;

This disables the thread cache, but the cached threads will stay in the pool until they're used one more time. Disconnect from the server, then reconnect and repeat.

mysql> show status like 'Threads_cached';

Continue disconnecting, reconnecting, and checking until the counter reaches 0.

Then, see how much memory MySQL is holding.

You may see a decrease, possibly significant, and then again you may not. I tested this on one of my systems, which had 9 threads in the cache. Once those threads had all been cleared out of the cache, the total memory held by MySQL did decrease... not by much, but it does illustrate that threads in the cache do release at least some memory when they are destroyed.

If you see a significant decrease, you may have found your problem. If you don't, then there's one more thing that needs to happen, and how quickly it can happen depends on your environment.

If the theory holds that the other threads -- the ones currently servicing active client connections -- have significant memory allocated to them, either because of recent work in their current client session or because of work requiring a lot of memory that was done by another connection prior to them languishing in the pool, then you won't see all of the potential reduction in memory consumption until those threads are allowed to die and be destroyed. Presumably your application doesn't hold them forever, but how long it will take to know for sure whether there's a difference will depend on whether you have the option of cycling your application (dropping and reconnecting the client threads) or if you'll have to just wait for them to be dropped and reconnected over time on their own.

But... it seems like a worthwhile test. You should not see a substantial performance penalty by setting thread_cache_size to 0. Fortunately, thread_cache_size is a dynamic variable, so you can freely change it with the server running.
