Have I got a stunning revelation for you on this problem. I saw this type problem before. I have a MySQL client at my employer's web hosting company with this situation :
The client, running MySQL 5.1.37, has two DB servers in Circular Replication (call it M1 and M2)
He performed a LOAD DATA LOCAL INFILE
with a 50GB CSV file on M1
Running SHOW SLAVE STATUS\G
several times on M2, Log Files and Positions were not moving.
Then, I noticed something very disturbing. When I ran SHOW SLAVE STATUS\G
several more times on M2, I noticed this pattern in the output:
Relay_Log_Space
was growing
Seconds_Behind_Master
was still 0
- Still, none of the Log Files or positions were changing
I was stunned because I did not know what was happening. A few hours later, I ran SHOW PROCESSLIST;
. To my surprise, there it was : LOAD DATA LOCAL INFILE
in the SQL Thread, but the file name was a temp file in the /tmp folder. All of a sudden, replication was more than 6 hours behind. Then, is dawned on me what was actually happening.
I learned that day that for mysqld to replicate LOAD DATA LOCAL INFILE
, this is happens under the hood:
When a Master server executes LOAD DATA LOCAL INFILE
- mysqld on the Master will...
- complete the
LOAD DATA LOCAL INFILE
command
- record the
LOAD DATA LOCAL INFILE
command its binary logs
- dump the enitre data file it just imported into its binary logs
- mysqld on the Slave will do the following:
- I/O Thread sees
LOAD DATA LOCAL INFILE
in the relay logs
- I/O Thread create the import file need for
LOAD DATA LOCAL INFILE
- I/O Thread copies the contents of all relay logs making up the import file, storing it in /tmp
- SQL Thread updates
Seconds_Behind_Master
based on when the Master completed LOAD DATA LOCAL INFILE
command
- SQL Thread executes
LOAD DATA LOCAL INFILE
using the mainifested temp file
In order to verfiy this is indeed happening:
- On the Master
- create a MyISAM table in the test database called mycsv
- Run
LOAD DATA LOCAL INFILE
to import a 5 line file CSV file into test.mycsv
- On the Slave
- See if test.mycsv exists with the imported data in it
- See if any file exists in /tmp that has the exact same size as the original import file on the Master
If that is what happened, test that again against a 10 GB file.
While it is replicating, goto into the OS in /var/lib/mysql and do ls -l
and look for the relay logs. You should see 10 consecutive relay logs 1GB. That will show it for sure because under normal circumstances, every relay log context switch skips by 3 files not 1. Seeing multiple 1GB relay logs reveal that the data for the LOAD DATA LOCAL INFILE
recently executed on the Master is currently being shipped over.
I hope this solves the mystery you have been experiencing.
Perhaps I am misreading this, but I am having difficulty understanding how your requirement is any different from built-in MySQL replication.
With MySQL replication, the master database writes changes to a binary log file. The slaves connect to the master, read the binary logs, write them to their local relay log, and play the statements back.
Please understand that realtime copy and backup are not the same thing. A real-time slave can help you recover from certain types of problems, but cannot resolve data-corruption issues (oops, I dropped that table!). For that, you need point-in-time backups. With a backup and a copy of the binary logs, you can recover from almost any type of failure by replaying the binary logs up to the point where the problem occurred.
Best Answer
Enable the slow query log and
log_queries_not_using_indexes
on the slave server and should be able to confirm what your gut is telling you. You can take the captured queries, runEXPLAIN
on them, and have some better documentation of your suspicions.To more directly answer the question, it's absolutely valid to declare indexes on a slave that are not declared on the master. I would suggest that it's a common practice, and one of many reasons why replicas are useful for so many things. Connect to the slave with an account with sufficient privilege, and alter the table directly on the slave.
The only exceptions to this -- which should be intuitively obvious, since the row data needs to be consistent on master and replica -- is that you cannot safely declare a
UNIQUE
constraint or a foreign key constraint on a slave when it doesn't exist on the master. This won't work, which is fine, since it doesn't make sense at any rate.Remember also that indexes in MySQL have names, which are automatically generated if they are not supplied when the index is added. A good practice might be to explicitly name these indexes on the replica so that future indexes added to the master cannot possibly cause an index name collision, which would break replication. My local convention is to prefix the index name with "ix_repl_" which would of course never be used on the master.
Note also that proper duplicate indexes (where exactly the same columns, and no others, are included in more than one index) are deprecated in MySQL 5.6 and disallowed by default in MySQL 5.7, making it apparently possible in later versions to cause replication to stop if you subsequently declare an identical index (even with a different name) on the master. This wouldn't be a critical issue (as long as you're monitoring replication -- you are, right?) since replication would be safe to restart by simply removing the now-redundant index on the replica before restarting the slave SQL thread. The failed event would be retried, and would now be valid, since there's no conflicting index, and the replacement index would be built on the replica.
Side note: remember that MySQL replication requires the version of MySQL on the replica servers to be the same as, or newer than, the version on the master... so when upgrading (or migrating to MariaDB) you will almost certainly want to upgrade the replicas first, and then the master. The reason for this is that a newer replica will understand the capabilities and quirks of an older master, but a newer master may introduce behaviors in the replication stream that an older replica server won't be able to interpret. There are limited exceptions to this rule, but it's most definitely the rule.