MySQL – Galera Takes Too Much Time for Pre-commit Stage

galera, MySQL, percona, performance

Our system has 3 Percona XtraDB Cluster nodes on the same network (we are using the Azure cloud). Our database has around 100 tables totaling ~120 GB; the biggest table contains about 100M rows. Ping time between the nodes seems acceptable to me (~0.5 ms or less).
I ran a profiling test on each of my servers:

set profiling=1;
insert into links(id) values(1);
show profile for query 1;

But the Galera pre-commit stage takes too much time (2.17 s) even for this simple query. The result is below:

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000023 |
| checking permissions           | 0.000004 |
| Opening tables                 | 0.000009 |
| init                           | 0.000007 |
| System lock                    | 0.000528 |
| updating                       | 0.002088 |
| end                            | 0.000004 |
| Waiting for query cache lock   | 0.000002 |
| end                            | 0.000006 |
| query end                      | 0.000005 |
| wsrep in pre-commit stage      | 2.172553 |
| ha_commit_one_phase(538401194) | 0.000006 |
| innobase_commit_low():trx_comm | 0.000029 |
| ha_commit_one_phase(538401194) | 0.005416 |
| wsrep in pre-commit stage      | 0.000024 |
| closing tables                 | 0.000007 |
| freeing items                  | 0.000147 |
| logging slow query             | 0.000003 |
| logging slow query             | 0.000047 |
| cleaning up                    | 0.000007 |
+--------------------------------+----------+

The error log also contains many warnings about failing to report the last committed transaction:

2017-03-30 06:12:18 14342 [Warning] WSREP: Failed to report last committed 539015569, -4 (Interrupted system call)
2017-03-30 06:12:47 14342 [Warning] WSREP: Failed to report last committed 539018167, -4 (Interrupted system call)

Does anyone know what causes this, or how to reduce the time spent in the pre-commit stage?

Thank you very much.


Update 1: here is the wsrep status. wsrep_local_cert_failures, wsrep_local_send_queue_avg, and wsrep_local_recv_queue_avg seem to be too high.
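
The counters below come from the standard wsrep status query (the exact command is not shown in the original post; this is simply the usual way to collect them):

SHOW GLOBAL STATUS LIKE 'wsrep%';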

+------------------------------+----------------------------------------------------+
| Variable_name                | Value                                              |
+------------------------------+----------------------------------------------------+
| wsrep_local_state_uuid       | 8fadd81c-b664-11e6-9509-56a24df2cfb6               |
| wsrep_protocol_version       | 7                                                  |
| wsrep_last_committed         | 571341783                                          |
| wsrep_replicated             | 297503                                             |
| wsrep_replicated_bytes       | 218414523                                          |
| wsrep_repl_keys              | 2214789                                            |
| wsrep_repl_keys_bytes        | 24565584                                           |
| wsrep_repl_data_bytes        | 169717325                                          |
| wsrep_repl_other_bytes       | 0                                                  |
| wsrep_received               | 191252                                             |
| wsrep_received_bytes         | 112705971                                          |
| wsrep_local_commits          | 297345                                             |
| wsrep_local_cert_failures    | 53                                                 |
| wsrep_local_replays          | 18                                                 |
| wsrep_local_send_queue       | 96                                                 |
| wsrep_local_send_queue_max   | 155                                                |
| wsrep_local_send_queue_min   | 0                                                  |
| wsrep_local_send_queue_avg   | 15.290253                                          |
| wsrep_local_recv_queue       | 0                                                  |
| wsrep_local_recv_queue_max   | 10660                                              |
| wsrep_local_recv_queue_min   | 0                                                  |
| wsrep_local_recv_queue_avg   | 2860.841210                                        |
| wsrep_local_cached_downto    | 571192180                                          |
| wsrep_flow_control_paused_ns | 1903747659010                                      |
| wsrep_flow_control_paused    | 0.309630                                           |
| wsrep_flow_control_sent      | 1332                                               |
| wsrep_flow_control_recv      | 12696                                              |
| wsrep_cert_deps_distance     | 165.892497                                         |
| wsrep_apply_oooe             | 0.970206                                           |
| wsrep_apply_oool             | 0.000442                                           |
| wsrep_apply_window           | 16.287397                                          |
| wsrep_commit_oooe            | 0.000000                                           |
| wsrep_commit_oool            | 0.000442                                           |
| wsrep_commit_window          | 15.281639                                          |
| wsrep_local_state            | 4                                                  |
| wsrep_local_state_comment    | Synced                                             |
| wsrep_cert_index_size        | 1937                                               |
| wsrep_cert_bucket_count      | 7528                                               |
| wsrep_gcache_pool_size       | 134219009                                          |
| wsrep_causal_reads           | 0                                                  |
| wsrep_cert_interval          | 40.522988                                          |
| wsrep_incoming_addresses     | 10.0.0.13:3306,10.0.0.79:3306,10.0.0.52:3306       |
| wsrep_desync_count           | 0                                                  |
| wsrep_evs_delayed            |                                                    |
| wsrep_evs_evict_list         |                                                    |
| wsrep_evs_repl_latency       | 0.000288255/0.000769232/0.00570055/0.000398325/764 |
| wsrep_evs_state              | OPERATIONAL                                        |
| wsrep_gcomm_uuid             | 88432ffa-1911-11e7-b1d5-da9dc1aa2008               |
| wsrep_cluster_conf_id        | 38                                                 |
| wsrep_cluster_size           | 3                                                  |
| wsrep_cluster_state_uuid     | 8fadd81c-b664-11e6-9509-56a24df2cfb6               |
| wsrep_cluster_status         | Primary                                            |
| wsrep_connected              | ON                                                 |
| wsrep_local_bf_aborts        | 2                                                  |
| wsrep_local_index            | 1                                                  |
| wsrep_provider_name          | Galera                                             |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>                  |
| wsrep_provider_version       | 3.20(r7e383f7)                                     |
| wsrep_ready                  | ON                                                 |
+------------------------------+----------------------------------------------------+

Update 2: adding my.cnf

[mysqld]
skip-name-resolve
bind-address = 0.0.0.0
datadir=/mnt/mysql/data-main
user=mysql

# Path to Galera library
wsrep_provider=/usr/lib/libgalera_smm.so

#wsrep_cluster_address=gcomm://10.0.0.13,10.0.0.52

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node #1 address
wsrep_node_address=10.0.0.6

# SST method
wsrep_sst_method=xtrabackup-v2

# Cluster name
wsrep_cluster_name=percona_xtradb

# Authentication for SST method
#wsrep_sst_auth="sstuser:rewjsf&*HKJnzPYuWbue"


# SAFETY #
max-allowed-packet             = 16M
max-connect-errors             = 1000000

# CACHES AND LIMITS #
tmp-table-size                 = 512M
max-heap-table-size            = 512M
query-cache-type               = 1
query-cache-size               = 512M
query_cache_limit          = 64M
max-connections                = 5000
thread-cache-size              = 150
open-files-limit               = 65535
table-definition-cache         = 4096
table-open-cache               = 1000
wait_timeout               = 600
connect_timeout = 60
interactive_timeout            = 3600
net_read_timeout=300
innodb_lock_wait_timeout = 120
tmpdir=/mnt/mysql/tmp
wsrep_retry_autocommit=2
wsrep_slave_threads=4

# INNODB #
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
#innodb-log-file-size           = 128M
innodb-log-file-size           = 1G
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 38G


long_query_time=1
slow_query_log=1
slow_query_log_file="/mnt/mysql/log/log-slow-queries.log"
log-queries-not-using-indexes=1

Best Answer

About your error log, from the Percona blog:

Reporting the last committed transaction is just a part of the certification index purge process. In case it fails for some reason (it occasionally does), the cert index purge may be a little delayed. But it does not mean the transaction was not applied successfully. This is a warning after all.

So that warning does not really matter.

But about your primary question:

Do you route all write queries to a single node?

I recommend doing so if you have a heavy write load, to avoid deadlocks and waiting, because Galera requires all transactions to commit in order.
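
For example (a sketch only, not taken from your setup), you could put HAProxy in front of the cluster and mark all but one node as a backup, so every write lands on the same node. The node addresses below are the ones from your wsrep_incoming_addresses; the listen port, timeouts, and health checks are my assumptions:

# haproxy.cfg (sketch): route all application traffic to a single Galera node
listen galera_single_writer
    bind 0.0.0.0:3307
    mode tcp
    timeout connect 5s
    timeout client  8h
    timeout server  8h
    server node1 10.0.0.13:3306 check
    server node2 10.0.0.79:3306 check backup   # used only if node1 goes down
    server node3 10.0.0.52:3306 check backup

With a single active writer, conflicting transactions are serialized locally by InnoDB instead of failing certification across nodes, which avoids the local certification failures and replays visible in your wsrep status.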

Update 1

From the FromDual blog:

If you heavily rely on Query Cache you should first fix the queries and redesign your application. There is experimental Query Cache support since Galera v3.1. But the Galera documentation clearly states: Do not use query cache.

So maybe this is what causes the problem. Please disable the query cache on all of your servers and check again.
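
A minimal sketch of how to do that (run on each node; the my.cnf change keeps it disabled after a restart):

-- existing connections keep their session value until they reconnect
SET GLOBAL query_cache_type = 0;
SET GLOBAL query_cache_size = 0;

And in my.cnf, replacing the current query-cache settings:

query-cache-type               = 0
query-cache-size               = 0

Setting query_cache_size to 0 also frees the 512M currently allocated to the cache, and after a restart with query_cache_type=0 MySQL does not take the query cache mutex at all.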