MySQL – Binlog Limitations

Tags: binlog, mysql, mysqlbinlog

I have a MySQL instance running on AWS, with around 5000 inserts per second. Any idea what the performance impact will be if I use binlog (row) and a binlog tailer?

Check this link

From my understanding, a binlog tailer polls the MySQL binlog periodically in order to make a 'realtime' data connection possible. The binlog tailer runs in NodeJS.

The point is, I have to use MySQL and I want to use Meteor to get the data in realtime to my clients. Hence my idea of using this binlog tailer.

Because the MySQL db is going to be filled quite intensively (5000 inserts per second), I want to know at which point the binlog/binlog tailer runs into severe performance issues.

Grts, Tom

Best Answer

I have developed software with similar functionality: the ability to use the MySQL replication stream (the binary log, or binlog) to capture events in near real time in response to inserts/updates/deletes in the database.

Here are some observations I've made regarding performance. Luckily, the potential hot spots are largely independent of each other.

Since I was unfamiliar with the Node package you cited and have only given its code a cursory review, I will assume that it is not actually "tailing" the binlog by polling, but is instead emulating a slave/replica server: it connects to the master and requests the replication stream.
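
To make that concrete, here is a minimal sketch of the pseudo-slave approach, assuming the `zongji` npm package (one commonly used Node binlog client; the package you cited may be different, and the host and credentials below are placeholders). The account it connects with typically needs the REPLICATION SLAVE, REPLICATION CLIENT, and SELECT privileges.

```typescript
// Sketch only: zongji registers itself with the master as a replica and
// receives pushed binlog events; it does not poll.
const ZongJi = require('zongji'); // the package ships without TypeScript types

const zongji = new ZongJi({
  host: '127.0.0.1',    // master (or an intermediate replica, see below)
  user: 'repl_reader',  // placeholder account with replication privileges
  password: 'secret',
});

zongji.on('binlog', (evt: any) => {
  // With binlog_format = ROW, write/update/delete events carry the row images.
  evt.dump();
});

// Subscribe only to the row-level events we care about.
zongji.start({
  includeEvents: ['tablemap', 'writerows', 'updaterows', 'deleterows'],
});
```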

The first potential bottleneck is the master's ability to write the amount of binlog data required (I/O throughput is the primary constraint). If your master is already logging in ROW format, then this problem is already solved. If not, switch your binlog format and observe the impact. I prefer ROW format anyway, because it is very useful for data recovery when queries go bad or the application does something to the data it shouldn't have: it is possible (using 3rd-party tools) to capture what happened and reverse it, since in the default configuration, when a delete occurs (for example), the deleted data is actually written to the binary log.
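
A quick way to verify the current format before changing anything, sketched here with the `mysql2` driver (an assumption on my part; any client works). Note the commented SET GLOBAL is illustrative: it needs the SUPER (or SYSTEM_VARIABLES_ADMIN) privilege, only affects new sessions, and the durable change belongs in my.cnf.

```typescript
import { createConnection } from 'mysql2/promise';

async function checkBinlogFormat(): Promise<void> {
  const conn = await createConnection({
    host: '127.0.0.1', // placeholder
    user: 'admin',     // placeholder
    password: 'secret',
  });

  // Expect Value: 'ROW' if the master is already writing row-format binlogs.
  const [rows] = await conn.query("SHOW VARIABLES LIKE 'binlog_format'");
  console.log(rows);

  // Illustrative only: affects new sessions; persist it in my.cnf instead.
  // await conn.query("SET GLOBAL binlog_format = 'ROW'");

  await conn.end();
}

checkBinlogFormat().catch(console.error);
```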

The next point of resource consumption is the slave connection made by such a tool to the master, where the master pushes the data. A common misconception is that a slave "polls" the master. In fact, the slave initiates the connection, but the master pushes the data. This is actually a load on the master that has very little impact on performance when the number of slaves connected is small (say, 5 or fewer). This load can be eliminated from the master entirely by connecting the "binlog tailer" not to the master, but to an existing slave of the master, with log_slave_updates configured.
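
If you do point the tailer at an existing replica instead of the master, that replica has to be writing its own binlog. A small hedged check (again assuming `mysql2`, with placeholder credentials; both variables are set in my.cnf and require a restart to change):

```typescript
import { createConnection } from 'mysql2/promise';

// Returns true if the replica both writes a binlog (log_bin) and re-logs the
// events it applies from its master (log_slave_updates), so a tailer can
// read the full change stream from it instead of from the master.
async function replicaCanFeedTailer(host: string): Promise<boolean> {
  const conn = await createConnection({ host, user: 'admin', password: 'secret' });
  const [rows] = await conn.query(
    "SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'log_slave_updates')"
  );
  await conn.end();
  return (rows as any[]).every((row) => row.Value === 'ON');
}
```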

Transport of the data from the master to the pseudo-slave can consume significant network bandwidth, so your external utility should ideally support the MySQL client/server compression protocol to reduce it. Enabling this capability can achieve compression ratios on the order of 10:1, depending on the payload.
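
For comparison, a real replica requests the compressed transport via the slave_compressed_protocol variable; whether your Node tailer can do the same depends on its driver implementing the compressed client/server protocol, so check that package's documentation. A sketch of the replica-side setting (mysql2 again, placeholder credentials):

```typescript
import { createConnection } from 'mysql2/promise';

async function enableCompressedReplication(replicaHost: string): Promise<void> {
  const conn = await createConnection({
    host: replicaHost,
    user: 'admin',     // placeholder
    password: 'secret',
  });

  // Dynamic variable; it takes effect the next time the replica's I/O thread
  // (re)connects to its master.
  await conn.query('SET GLOBAL slave_compressed_protocol = 1');

  const [rows] = await conn.query(
    "SHOW VARIABLES LIKE 'slave_compressed_protocol'"
  );
  console.log(rows); // expect Value: 'ON'

  await conn.end();
}
```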

The final pain point is the external utility itself. The MySQL binlog format is a very tightly packed binary format (hence "binary log") that must be parsed and decoded. The efficiency with which the external utility can unpack and manipulate this data stream determines how close to real time the detected events can be emitted: inefficient code will cause your decoded event stream to lag further and further behind the master, though this factor has no performance impact on the master server itself.
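
One cheap way to see whether your consumer is keeping up is to compare the timestamp in each binlog event header against the wall clock. The sketch below assumes the event object exposes that timestamp in milliseconds (zongji events carry one, but the field name and units may differ in other packages, so verify before relying on it):

```typescript
// Logs a warning whenever the decoded event stream has fallen noticeably
// behind the master. Binlog header timestamps have one-second resolution,
// so small values here are just noise.
function reportLag(evt: { timestamp: number }): void {
  const lagMs = Date.now() - evt.timestamp;
  if (lagMs > 5_000) {
    console.warn(`binlog consumer is ~${Math.round(lagMs / 1000)}s behind the master`);
  }
}

// e.g. zongji.on('binlog', (evt: any) => { reportLag(evt); /* decode rows... */ });
```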

In short, if your master can handle the workload of generating row-format binlogs for the volume of traffic you expect, the rest of the potential issues are still potential issues, but they should have no meaningful performance impact on the master server itself.