I have a web application that accesses a remote storage server running Linux to fetch some files. The problem is that the remote storage currently holds 3 million files, so accessing them the normal way is a bit tricky.
So I needed a script to make the storage a little easier to use. The script reorganizes the files into multiple folders depending on their creation date and, especially, their names. I wrote the script and it worked just fine: it did what it was meant to do, but it was too slow, taking 12 hours to complete the work (12:13:48 to be precise).
I think the slowness comes from the multiple cut and rev calls I make.
Example:
I get the file names with an ls command that I loop over with for. For each file I get the parent directory and, depending on the parent directory, I can extract the correct year:
case "$parent" in
    ( "Type1" )
        year=$(echo "$fichier" | rev | cut -d '_' -f 2 | rev) ;;
    ( "Type2" )
        year=$(echo "$fichier" | rev | cut -d '_' -f 2 | rev) ;;
    ( "Type3" )
        year=$(echo "$fichier" | rev | cut -d '_' -f 1 | rev | cut -c 1-4) ;;
    ( "Type4" )
        year=$(echo "$fichier" | rev | cut -d '_' -f 1 | rev | cut -c 1-4) ;;
    ( "Type5" )
        year=$(echo "$fichier" | rev | cut -d '_' -f 1 | rev | cut -c 1-4) ;;
esac
For Type1 files:
the file ==> MY_AMAZING_FILE_THAT_IMADEIN_YEAR_TY.pdf
I need to get the year, so I perform a reverse cut:
year=$(echo "$file" | rev | cut -d '_' -f 2 | rev)
For Type2 files:
the file ==> MY_AMAZING_FILE_THAT_IMADE_IN_YEAR_WITH_TY.pdf
etc.
And then I can mv the file freely: mv $file /some/path/destination/$year/$parent
And this is only the simplest example; some files are much more complex. So to get one piece of information I need four operations: 1 echo, 2 rev and 1 cut.
While the script is running I get speeds of 50 to 100 files/sec. I got this figure by running wc -l output.txt on the script's output file.
Is there anything I can do to make it faster, or is there another way to cut the file names? I know that I can use sed, awk or string operations, but I did not really understand how.
Best Answer
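To get the YEAR portion of the filename MY_AMAZING_FILE_THAT_IMADEIN_YEAR_TY.pdf without using external utilities, the shell's own parameter expansions are enough: removing the shortest suffix matching _* and then the longest prefix matching *_ leaves YEAR.

After update to the question: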
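A minimal sketch of pure-shell year extraction for the two filename layouts in the question, using only POSIX parameter expansions. The second filename and all variable names are hypothetical, and the sketch assumes that Type3 to Type5 names end in a date whose first four characters are the year:

```sh
#!/bin/sh
# Layout 1 (Type1/Type2 in the question): the year is the
# second-to-last "_"-separated field.
name='MY_AMAZING_FILE_THAT_IMADEIN_YEAR_TY.pdf'
stem=${name%_*}      # drop the last field  -> MY_AMAZING_FILE_THAT_IMADEIN_YEAR
year1=${stem##*_}    # keep text after the last remaining "_" -> YEAR

# Layout 2 (Type3 to Type5): the year is the first four characters
# of the last field. This filename is a made-up example.
name='SOME_REPORT_20190315.pdf'
last=${name##*_}               # last field            -> 20190315.pdf
year2=${last%"${last#????}"}   # strip all but 4 chars -> 2019

printf '%s %s\n' "$year1" "$year2"
```

Each expansion is evaluated inside the shell itself, so no echo, rev or cut process is started per file.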
Moving PDF files from under topdir to a directory /some/path/destination/<year>/<parent>, where <year> is the year found in the filename of the file and <parent> is the basename of the original directory that the file was found in:

movefiles.sh is a shell script in the current directory:
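A hedged sketch of what such a movefiles.sh could look like in POSIX sh. It reuses the Type1 to Type5 rules from the question's case statement; the function name move_by_year, the variable names, and the find invocation shown in the header comment are assumptions for illustration, not the answer's original code:

```sh
#!/bin/sh
# movefiles.sh (sketch): move each file given as an argument to
# "$destdir/<year>/<parent>/".
# Assumed invocation, batching many files per run of the script:
#   find topdir -type f -name '*.pdf' -exec ./movefiles.sh {} +

destdir=/some/path/destination   # destination root from the question

move_by_year() {
    for name in "$@"; do
        # Basename of the directory the file was found in.
        parent=${name%/*}
        parent=${parent##*/}
        base=${name##*/}

        case $parent in
            Type1|Type2)
                # Year is the second-to-last "_" field.
                stem=${base%_*}
                year=${stem##*_} ;;
            Type3|Type4|Type5)
                # Year is the first four characters of the last field.
                last=${base##*_}
                year=${last%"${last#????}"} ;;
            *)
                printf 'Unknown parent "%s" for "%s", skipping\n' \
                    "$parent" "$name" >&2
                continue ;;
        esac

        mkdir -p -- "$destdir/$year/$parent" &&
            mv -- "$name" "$destdir/$year/$parent/"
    done
}

move_by_year "$@"
```

With this layout the only external processes per file are mkdir and mv, and find starts the script once per large batch of pathnames rather than once per file, which is where most of the expected speed-up would come from.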