I have the following in a shell script:
for file in "$local_dir"/myfile.log.*
do
    file_name=$(basename "$file")
    server_name=$(echo "$file_name" | cut -f 3 -d '.')
    mv "$file" "$local_dir/in_progress1.log"
    mysql -hxxx -P3306 -uxxx -pxxx -e "set @server_name='${server_name}'; source ${sql_script};"
    rm "$local_dir/in_progress1.log"
done
It basically gets all files in a directory that match the criteria, extracts a server name from each filename, and passes it across to a MySQL script for processing.
What I am wondering is if I have 10 files that take 60 seconds each to complete, and after 5 minutes I then start a second instance of the shell script:
- a) will the second script still see the files that haven't been processed?
- b) will it cause problems for the first instance if it deletes files?
or will I be able to run them in parallel without issue?
Best Answer
One would assume that "60 seconds" (and even "5 minutes") is just an estimate, and that there is a risk the first batch is still in progress when the second batch starts. If you want to separate the batches (and if an occasional overlap causes no problem beyond the log files), a better approach would be to make a batch number part of the in-progress file-naming convention.
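To see why an overlap is dangerous with the current script: both instances funnel every file through the single name `in_progress1.log`, so an interleaving like the following loses data (a simulated sketch with made-up directory and file contents, not from the original script):

```shell
#!/bin/sh
# Simulate the race between two script instances that share in_progress1.log.
# (Hypothetical directory and contents, for illustration only.)
dir=$(mktemp -d)
echo "data A" > "$dir/myfile.log.serverA"
echo "data B" > "$dir/myfile.log.serverB"

mv "$dir/myfile.log.serverA" "$dir/in_progress1.log"  # instance 1 claims a file
mv "$dir/myfile.log.serverB" "$dir/in_progress1.log"  # instance 2 overwrites it mid-run:
                                                      # instance 1's data is silently lost
rm "$dir/in_progress1.log"                            # instance 1's cleanup also deletes
                                                      # the file instance 2 is working on
```

So the answer to (b) is yes: with a shared in-progress name, one instance can both clobber and delete the other's in-flight file.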
Something like this: assign a unique batch number before the for-loop; then, at the start of the loop, check that your pattern matched an actual file; and use the batch number in the in-progress filename, and so forth. That reduces the risk of collision.
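The answer's code snippets did not survive in this copy, so the sketch below is one way to realize the idea. The function name `process_logs`, the use of the shell PID `$$` as the batch number, and the abstraction of the `mysql` call into a per-file command are my assumptions, not the original answer's:

```shell
#!/bin/sh
# Sketch: each instance claims files under its own batch-numbered
# in-progress name, so concurrent runs cannot step on each other.
process_logs() {
    local_dir=$1; shift          # remaining args: command run once per file
    batch=$$                     # shell PID doubles as a unique batch number

    for file in "$local_dir"/myfile.log.*; do
        # Skip if the glob matched nothing, or if another instance
        # already moved this file away.
        [ -e "$file" ] || continue

        server_name=$(basename "$file" | cut -f 3 -d '.')

        # Batch-specific in-progress name: no collision between instances.
        in_progress="$local_dir/in_progress${batch}.log"
        mv "$file" "$in_progress" 2>/dev/null || continue

        # Hand off to the per-file command (the mysql invocation
        # from the question would go here), passing the server name.
        "$@" "$server_name"

        rm "$in_progress"
    done
}
```

With this, a second instance started five minutes in simply picks up whatever `myfile.log.*` files remain, skips any the first instance has already claimed, and never touches the first instance's in-progress file.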