Does all of the rows affected by the insert statement from db1 gets
transfered into db2 immediately and are waiting for the commit line to
execute in order to make the insert official?
Transafer is made immediatly, but other sessions will not see inserted data until commit (depends on transaction isolation level)
Or, are the rows affected by the insert statement are still inside db1
and are just waiting to be transfered into db2 once the "commit"
command executes?
Data are not waiting for commit, just storage engine wait for commit to make changes persisted.
I wanted to know this facts because I will be copying data from a
database through the internet and I'm worried that I might lose some
data in the process... Any help would be greaty appreciated.
Every TCP based connection guarantee data integrity.
You should use replication instead of this approach
Yes, this O_DSYNC
(the option) is fundamentally useless because MySQL/InnoDB engineers saw that it was dangerous in certain cases (corruption), and disabled it in many cases, making it a synonym to O_SYNC
. it also does an fsync()
for the same reason.
The main thing you want to tune is O_DIRECT
vs. default (flushed), and maybe ALL_O_DIRECT
on Percona.
In your case, I suppose you have good latency to the filesystem (where O_DIRECT
shines). Sometimes, having the transaction logs on the filesystem cache can be hurtful (although such a difference seem a bit too much for me - did you compare it with O_DIRECT
, available on all versions of MySQL, which only avoids data file caching?).
Also be careful, because sysbench may not be a good representative of your real-world load. Response times can vary a lot depending on flush_log_at_trx_commit
, I assume you have that as 1.
According to Dimitri, from Oracle-MySQL, the solution upstream comes in 5.7. I like it for now in Percona, as you can choose.
Best Answer
This is very complicated, at least in PostgreSQL.
PostgreSQL doesn't sync every WAL write, unless your "wal_sync_method" is "open_datasync" or "open_sync" (in which case, it syncs every write out of the WAL buffer, not every write into them)
It writes out and syncs the accumulated WAL records at each transaction commit, unless the committed transaction touched only UNLOGGED or TEMP tables, or synchronous_commit is turned off.
It also syncs at the end of each WAL file (16MB by default). It does this in the foreground and while holding some locks, and so can be quite a bottleneck in very large transactions.
The "wal_writer" process also wakes up occasionally and writes out and syncs the WAL. This is mostly to put a limit on the amount of unsynced WAL there can be when asynchronous commit is being used.
If someone wants to write out a dirty buffer from shared_buffers, but the WAL record which covers the dirtying of that buffer has not yet been synced, then a call will be made to write out and sync up to that WAL record. For data safety, a dirty buffer cannot be turned over the kernel until the WAL covering it has been written and synced. Also, hint bits cannot be set on a buffer until the commit record of the transaction it is "hinting at" has been synced. Sometimes it will force sync so it can set the hint bit, sometimes it will just skip setting the hint.
That is just the WAL. The syncing of the data files are an entirely different matter which I haven't covered.