OBSERVATION #1
You have max_connect_errors to 1000000 (One Million). If you ever reach that many consecutive connection failures, you would simply run FLUSH HOSTS;
to clear up blocked connections. Given the high value of that setting, you will never need to run FLUSH HOSTS;
OBSERVATION #2
I am glad you mentioned your TIME_WAITs
. I wrote this post in ServerFault back on Feb 01, 2012
: MySQL lowering wait_timeout value to lower number of open connections. This is a bit of a hack but it is a necessary evil in this instance.
In short, just run this on every PXC node
SEC_TO_TIMEWAIT=1
echo ${SEC_TO_TIMEWAIT} > /proc/sys/net/ipv4/tcp_tw_recycle
echo ${SEC_TO_TIMEWAIT} > /proc/sys/net/ipv4/tcp_tw_reuse
This will make TIME_WAITs timeout in 1 second. You have to run these lines every time a node reboots.
By its Nature in a Galera Cluster a DML statement is expected to run a bit slower than a DML statement on a normal MySQL node because the response time of the DML statements includes not only the commit time on the local node but also sending the the write-set (ws) to the Group and receiving back the GTID of the ws.
So the total response time is: query time + 2 x round-trip to the group.
A simple insert will run possibly within 1 ms. And the round-trip to the group is possibly about 400 - 600 us. So 5 instead of 3 minutes I am not surprised to see...
Now, what can we improve?
a) Reduce query time:
Write performance is influenced by: innodb_buffer_pool_size (big enough to cache all data in memory), innodb_log_file_size (as bigger as better) and innodb_flush_log_at_trx_commit (0 or 2 are faster than 1). With your setting above you have not followed the Codership recommendation (innodb_flush_log_at_trx_commit=0).
b) Reduce number of round trips. When you batch your DML statements into transactions you have bigger but less write-sets which should cause less round trips. So your time spent in the network should become smaller.
c) Make your network faster. Use smallest possible network latency: dedicated network. 1 or 10 Gbit. Not much hardware in between (firewall, switches, routers, etc).
d) Parallelize your inserts?
Regards,
Oli
PS: There are still some places free in our Galera Cluster trainings in 2 weeks!
Best Answer
shutdown the node while maintaining quorum. so that there will be no downtime for application. manually move those all gcache files to some other location(don't delete them for safer side).Restart the server. You will get new gcache file.