I have a website for an event. I set up a cron job to back up the database every hour. The backup command is
mysqldump -u"$DB_USER" -p"$DB_PASS" --database "$DB_NAME" >"${OUTPUT_DIR}/${DB_NAME}.$(date +%Y%m%d-%H%M%S).sql"
# crontab: 30 * * * * /path/to/backup_db.sh
I have 200+ files like this in the backup directory (pulled from the server):
$ ls -1 "$OUTPUT_DIR"
ibug.20190301-133002.sql
ibug.20190301-143001.sql
ibug.20190301-153001.sql
ibug.20190301-163001.sql
ibug.20190301-173001.sql
...
The files aren't very big individually, averaging less than 2 MB each, but together they add up to more than 400 MB.
The top few lines of a dump are:
-- MySQL dump 10.14 Distrib 5.5.60-MariaDB, for Linux (x86_64)
--
-- Host: localhost Database: ibug
-- ------------------------------------------------------
-- Server version 5.5.60-MariaDB
I want to perform a series of SELECT queries on each dump (data analysis). The queries range from simple SELECT COUNT(*) ones to complex ones with multiple JOINs and multiple subqueries (all of them are SELECTs).
While it's possible, I'd rather not write a script that imports them one by one and runs the queries, and I'm wondering whether there is a better approach.
My local machine is running 10.1.38-MariaDB-0ubuntu0.18.04.1 on Ubuntu 18.04 (not exactly MySQL, but MariaDB).
I believe I have enough skill in Python and Bash (and regular expressions) to write a moderately complex script to batch or automate the job, though I'm not sure whether that's relevant.
Best Answer
Importing a small quantity of data like 400 MB shouldn't take long. Other questions have suggestions on how to do this quickly.
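For example, one common way to speed up a bulk load (a sketch only; whether you can safely skip these checks depends on your data, and TARGET_DB and DUMP_FILE are placeholders, not names from your setup) is to relax integrity checks for the session that does the import:

# TARGET_DB is the database to load into, DUMP_FILE is one of the dumps.
mysql -u"$DB_USER" -p"$DB_PASS" \
    --init-command="SET foreign_key_checks=0, unique_checks=0" \
    "$TARGET_DB" < "$DUMP_FILE"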
Because these are SQL dumps of a single database, each one can be imported under a different database name without a problem. You could use one database name per dump date, import the data, and run the queries in parallel.
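A minimal sketch of that idea in Bash (assuming the dumps contain no CREATE DATABASE or USE statements; if they do, those lines would need stripping first). The names analysis_*, queries.sql, results/ and the parallelism of 4 are placeholders, not part of your setup:

#!/bin/bash
# Load each dump into its own database, named after the dump's timestamp,
# so several dumps can be imported and queried side by side.
analyse_one() {
    dump="$1"
    stamp="$(basename "$dump" .sql)"        # e.g. ibug.20190301-133002
    db="analysis_${stamp//[.-]/_}"          # e.g. analysis_ibug_20190301_133002
    mysql -u"$DB_USER" -p"$DB_PASS" -e "CREATE DATABASE IF NOT EXISTS $db"
    mysql -u"$DB_USER" -p"$DB_PASS" "$db" < "$dump"
    mysql -u"$DB_USER" -p"$DB_PASS" "$db" < queries.sql > "results/${stamp}.txt"
}
export -f analyse_one
export DB_USER DB_PASS
mkdir -p results
find "$OUTPUT_DIR" -name '*.sql' -print0 |
    xargs -0 -P 4 -I{} bash -c 'analyse_one "$1"' _ {}

After the run, results/ holds one output file per dump, and the throwaway analysis_* databases can be dropped once the analysis is done.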