Well, it's really a matter of deciding which architecture you want to go with.
SimpleDB is a non-relational "NoSQL" data storage system built around collections of items accessible via key/value pairs, whereas RDS is essentially a managed version of MySQL, which is a fully relational SQL database.
Both systems will work for tracking profiles, but the implementation patterns are very different. Neither is right or wrong, so it's your call on what you want to do.
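For illustration, here is a rough sketch of the two access patterns for a profile record, using the AWS command-line tools for SimpleDB and the mysql client for RDS; the domain, table, endpoint, and attribute names here are hypothetical:
# SimpleDB: schemaless items in a domain, addressed by item name
aws sdb put-attributes --domain-name profiles --item-name user123 \
    --attributes Name=email,Value=user@example.com Name=plan,Value=free
aws sdb get-attributes --domain-name profiles --item-name user123
# RDS (MySQL): rows in a schema you define up front, queried with SQL
mysql -h myprofiles.rds.amazonaws.com -u... -p... -e \
    "SELECT email, plan FROM profiles WHERE user_id = 'user123';"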
If all your databases use InnoDB only, I have some good news.
You should be able to dump all the databases in parallel from a slave.
In fact, you can force all the databases to the same point-in-time.
The first thing to remember about a Slave is that it is not required to have binary logging enabled if it is not a Master for other Slaves.
You cannot use the --master-data
option for parallel dumps because each dump would have a different position written at line 22 of each dump file. It is better to record the Master's last log file and position that the Slave executed, using SHOW SLAVE STATUS\G
. That way, all the databases have the same point-in-time position.
You can collect all the databases and script the parallel dump of them all.
DBLIST=/tmp/ListOfDatabasesToParallelDump.txt
SSS=/tmp/ShowSlaveStatusDisplay.txt
BACKUP_BASE=/backups
BACKUP_DATE=`date +"%Y%m%d_%H%M%S"`
BACKUP_HOME=${BACKUP_BASE}/${BACKUP_DATE}
mkdir ${BACKUP_HOME}
cd ${BACKUP_HOME}
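# Pause replication so every database shares the same point-in-time,
# then record the master coordinates the slave has executed so far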
mysql -h... -u... -p... -e"STOP SLAVE;"
mysql -h... -u... -p... -e"SHOW SLAVE STATUS\G" > ${SSS}
LOGFIL=`cat ${SSS} | grep "Relay_Master_Log_File" | awk '{print $2}'`
LOGPOS=`cat ${SSS} | grep "Exec_Master_Log_Pos" | awk '{print $2}'`
echo "Master was at ${LOGFIL} Position ${LOGPOS} for this Backup" > Master_Log_FilePos.txt
mysql -h... -u... -p... -AN -e"SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('information_schema','mysql','performance_schema')" > ${DBLIST}
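# Dump every database in parallel (the trailing & backgrounds each mysqldump);
# wait blocks until all of them finish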
for DB in `cat ${DBLIST}`
do
mysqldump -h... -u... -p... --hex-blob --routines --triggers ${DB} | gzip > ${DB}.sql.gz &
done
wait
mysql -h... -u... -p... -e"START SLAVE;"
If there are simply too many databases, dump them 10 or 20 at a time as follows:
DBLIST=/tmp/ListOfDatabasesToParallelDump.txt
SSS=/tmp/ShowSlaveStatusDisplay.txt
BACKUP_BASE=/backups
BACKUP_DATE=`date +"%Y%m%d_%H%M%S"`
BACKUP_HOME=${BACKUP_BASE}/${BACKUP_DATE}
mkdir ${BACKUP_HOME}
cd ${BACKUP_HOME}
mysql -h... -u... -p... -e"STOP SLAVE;"
mysql -h... -u... -p... -e"SHOW SLAVE STATUS\G" > ${SSS}
LOGFIL=`cat ${SSS} | grep "Relay_Master_Log_File" | awk '{print $2}'`
LOGPOS=`cat ${SSS} | grep "Exec_Master_Log_Pos" | awk '{print $2}'`
echo "Master was at ${LOGFIL} Position ${LOGPOS} for this Backup" > Master_Log_FilePos.txt
mysql -h... -u... -p... -AN -e"SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('information_schema','mysql','performance_schema')" > ${DBLIST}
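# Same as above, but launch at most COMMIT_LIMIT dumps at a time;
# each wait clears the current batch before the next one starts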
COMMIT_LIMIT=20
COMMIT_COUNT=0
for DB in `cat ${DBLIST}`
do
mysqldump -h... -u... -p... --hex-blob --routines --triggers ${DB} | gzip > ${DB}.sql.gz &
(( COMMIT_COUNT++ ))
if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]
then
COMMIT_COUNT=0
wait
fi
done
wait
if [ ${COMMIT_COUNT} -gt 0 ]
then
wait
fi
mysql -h... -u... -p... -e"START SLAVE;"
If you need to recover a single table, you can dump tables in parallel, 20 at a time, in size order.
Try this:
TBLIST=/tmp/ListOfTablesToParallelDump.txt
SSS=/tmp/ShowSlaveStatusDisplay.txt
BACKUP_BASE=/backups
BACKUP_DATE=`date +"%Y%m%d_%H%M%S"`
BACKUP_HOME=${BACKUP_BASE}/${BACKUP_DATE}
mkdir ${BACKUP_HOME}
cd ${BACKUP_HOME}
mysql -h... -u... -p... -e"STOP SLAVE;"
mysql -h... -u... -p... -e"SHOW SLAVE STATUS\G" > ${SSS}
LOGFIL=`cat ${SSS} | grep "Relay_Master_Log_File" | awk '{print $2}'`
LOGPOS=`cat ${SSS} | grep "Exec_Master_Log_Pos" | awk '{print $2}'`
echo "Master was at ${LOGFIL} Position ${LOGPOS} for this Backup" > Master_Log_FilePos.txt
mysql -h... -u... -p... -AN -e"SELECT CONCAT(table_schema,'.',table_name) FROM information_schema.tables WHERE table_schema NOT IN ('information_schema','mysql','performance_schema') ORDER BY data_length" > ${DBLIST}
COMMIT_LIMIT=20
COMMIT_COUNT=0
for DBTB in `cat ${TBLIST}`
do
# Split db.table into its components and name each dump file after both
DB=`echo "${DBTB}" | sed 's/\./ /g' | awk '{print $1}'`
TB=`echo "${DBTB}" | sed 's/\./ /g' | awk '{print $2}'`
DUMPFILE=DB-${DB}-TBL-${TB}.sql.gz
mysqldump -h... -u... -p... --hex-blob --routines --triggers ${DB} ${TB} | gzip > ${DUMPFILE} &
(( COMMIT_COUNT++ ))
if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]
then
COMMIT_COUNT=0
wait
fi
done
wait
if [ ${COMMIT_COUNT} -gt 0 ]
then
wait
fi
mysql -h... -u... -p... -e"START SLAVE;"
Now that you have scripts to dump databases or individual tables, you can load that data at your discretion. If you need to get SQL executed from the binary logs on the Master, you can use mysqlbinlog
, give it the position or datetime, and output the SQL to other text files. You just have to perform due diligence to find the amount of data you need from whatever timestamps the binary logs have. Just remember that every binary log's timestamp in the OS represents the last time it was written.
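As a minimal sketch (the credentials, file names, and cutoff datetime below are placeholders), you could reload a dumped database and then replay the Master's binary-log traffic from the recorded position up to a point in time:
# Reload one of the compressed dumps
gunzip < mydb.sql.gz | mysql -h... -u... -p... mydb
# Coordinates recorded in Master_Log_FilePos.txt at backup time
LOGFIL=mysql-bin.000123
LOGPOS=987654
# Extract the SQL executed after the backup, up to the chosen cutoff
mysqlbinlog --start-position=${LOGPOS} --stop-datetime="2013-01-15 12:00:00" \
    /var/lib/mysql/${LOGFIL} > PostBackupChanges.sql
# Inspect PostBackupChanges.sql, then apply it
mysql -h... -u... -p... < PostBackupChanges.sql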
MyISAM tables are not guaranteed to be consistent on disk, and, effectively, Amazon recommends manually doing the equivalent of a
FLUSH TABLES WITH READ LOCK
(you cannot do that in RDS, so you have to lock and flush each MyISAM table individually). So you are right that backups may not be taken correctly on your setup. Creating a backup from a slave is a very common way of doing backups, because in that case you do not care about blocking or affecting the performance of the slaves. But you are not really solving the problem, because MyISAM is not supported for read replicas, and RDS basically uses the same snapshotting mechanism, which is potentially inconsistent for non-transactional tables.
If you really need hot & consistent backups, you need InnoDB. Otherwise, lock, flush & snapshot.
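As a minimal sketch of the lock-and-flush approach (the database and table names here are hypothetical), run something like this in a single session that stays open while the snapshot is taken:
mysql -h myinstance.rds.amazonaws.com -u... -p...
mysql> USE mydb;
mysql> LOCK TABLES profiles READ, sessions READ;
mysql> FLUSH TABLES profiles, sessions;
mysql> -- take the snapshot now, then release the locks:
mysql> UNLOCK TABLES;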
If you want to guarantee the consistency of a backup, though, setting up a replica and checking that it works is one of the typical easy ways to partially verify it. Creating a replica and leaving it running for a while without replication breaking means that, at least, it has all the records and tables touched by your latest transactions (though with no overall guarantee).
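A minimal sketch of that check, assuming a reachable replica host: confirm both replication threads are running, there is no error, and the lag is sane before trusting the replica.
mysql -h replica-host -u... -p... -e "SHOW SLAVE STATUS\G" \
    | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Last_Errno"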