Shell – Move huge number of files into date structured directory order

files, rename, shell-script

I have about 1 million files in this directory: /home/username/images/

Each file is named something like 012345678910(Place)_0_20120414185957_28841.jpg, with the timestamp part of the filename changing for each picture.

The code below is meant to sort/move the files into this date structure: /home/username/sorted/2012/04/14/18/name_of_file.jpg

For a small sample of files it works fine, but for the huge directory my PuTTY terminal gets disconnected after outputting

Directory $newdir does not exist.  Creating same.

I had other code which always died with the error "Argument list too long".

Here is the code:

#!/bin/bash
ALLFILES=(images/*)
for ((i=0; i<${#ALLFILES[*]}; i+=30000));
do
    set $(echo "${ALLFILES[@]:i:30000}" | awk -F_ '{print $1, $2, $3, $4, $5}')
    fullyear=$3
    year=$(echo $fullyear |cut -c1-4)
    month=$(echo $fullyear |cut -c5-6)
    day=$(echo $fullyear |cut -c7-8)
    hour=$(echo $fullyear |cut -c9-10)
    newdir=$(echo /home/username/sorted/$year/$month/$day/$hour/)
    if ! [ -d $newdir ]; then
        echo Directory $newdir does not exist.  Creating same.
        mkdir -p $newdir;
    fi
    mv "${ALLFILES[@]:i:30000}" $newdir;
done

Any ideas why the connection will not hold while performing the large loop?

Best Answer

Try running it in a screen session, so that the job keeps going if the connection drops. Alternatively, try a different construction. I believe find + sed will work better than pure bash:

find images/ -name "*.jpg" | sed 's%^[^_]*_[^_]*_\([0-9][0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\).*%mkdir -p "/home/username/sorted/\1/\2/\3/\4" \&\& mv "&" "/home/username/sorted/\1/\2/\3/\4/"%'

The command above only prints the commands that sed generates; nothing is executed. Adding the e flag after the last % (a GNU sed extension) makes sed execute each generated command:

find images/ -name "*.jpg" | sed 's%^[^_]*_[^_]*_\([0-9][0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\).*%mkdir -p "/home/username/sorted/\1/\2/\3/\4" \&\& mv "&" "/home/username/sorted/\1/\2/\3/\4/"%e'
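Before switching to the e form, it may help to push a single path through the same expression and read the command it produces. The sketch below does that with the sample file name from the question; preview_move is a hypothetical helper name, not part of the original answer. Because the generated command is handed to a shell, file names containing characters such as " or $ would need extra care, and a directory component containing _ would throw off the [^_]* anchoring.

```shell
#!/bin/bash
# Sketch: preview the command the sed expression generates for one path.
# The e flag is absent, so nothing is executed; the command is only printed.
# preview_move is a hypothetical helper name used for illustration.
preview_move() {
    printf '%s\n' "$1" | sed 's%^[^_]*_[^_]*_\([0-9][0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\).*%mkdir -p "/home/username/sorted/\1/\2/\3/\4" \&\& mv "&" "/home/username/sorted/\1/\2/\3/\4/"%'
}

preview_move 'images/012345678910(Place)_0_20120414185957_28841.jpg'
# prints:
# mkdir -p "/home/username/sorted/2012/04/14/18" && mv "images/012345678910(Place)_0_20120414185957_28841.jpg" "/home/username/sorted/2012/04/14/18/"
```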

P.S. In bash you don't need

day=$(echo $fullyear |cut -c7-8)

Bash can do this itself, without echo | cut:

day=${fullyear:6:2}
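Taking the substring expansion a step further, the whole job can be sketched as a per-file loop in pure bash. This is a minimal sketch, not the answer's method: sort_by_timestamp is a hypothetical function name, and it assumes every file name has the timestamp as its third _-separated field, as in the question's sample. A for loop over a glob runs inside the shell, so it should not hit the "Argument list too long" limit, which applies when arguments are passed to an external command. Each file also gets its own target directory, whereas the loop in the question appears to take one date per 30000-file slice.

```shell
#!/bin/bash
# Sketch: move files named PREFIX_N_YYYYMMDDhhmmss_ID.jpg into
# DEST/YYYY/MM/DD/hh/. The function name and both arguments are
# hypothetical; adjust them to the real locations.
sort_by_timestamp() {
    local src=$1 dest=$2 f name rest stamp dir
    for f in "$src"/*.jpg; do
        [ -e "$f" ] || continue        # the glob matched nothing
        name=${f##*/}                  # strip the directory part
        rest=${name#*_}                # drop the first field
        rest=${rest#*_}                # drop the second field
        stamp=${rest%%_*}              # third field: YYYYMMDDhhmmss
        # Skip names whose third field does not start with 10 digits.
        [[ $stamp =~ ^[0-9]{10} ]] || continue
        dir=$dest/${stamp:0:4}/${stamp:4:2}/${stamp:6:2}/${stamp:8:2}
        mkdir -p -- "$dir" && mv -- "$f" "$dir/"
    done
}
```

It would be called as sort_by_timestamp /home/username/images /home/username/sorted, ideally inside screen as suggested above. With a million files the glob expansion still takes some time and memory, so trying it on a small copy first seems prudent.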

Related Solutions

Files – How to Find Newest File with Multiple Filetype Restrictions

Something like this should work:

find . \( -iname "*.mp3" -o -iname "*.jpg" \) -printf '%TY%Tm%Td %TT %p\n' | sort -r

This should find the files whose names end in mp3 or jpg (case-insensitively), print each one's modification time and path, then sort the list in reverse order so that the newest file comes first.

It seems to show both file-types when you run it effectively as two commands:

( find . -iname "*.mp3" -printf '%TY%Tm%Td %TT %p\n' ; find . -iname "*.jpg" -printf '%TY%Tm%Td %TT %p\n' ) | sort -r
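To get only the newest file rather than the whole sorted list, the output can be cut down to its first line. The sketch below is one possible variant, not the answer's own command: newest_media is a hypothetical function name, and it swaps the formatted date for GNU find's %T@ (modification time in seconds since the epoch) so that sort can compare plain numbers. It assumes GNU find and file names without embedded newlines.

```shell
#!/bin/bash
# Sketch: print the path of the most recently modified .mp3 or .jpg file
# under a directory (default: the current one). Requires GNU find for
# -printf. newest_media is a hypothetical name used for illustration.
newest_media() {
    find "${1:-.}" -type f \( -iname '*.mp3' -o -iname '*.jpg' \) \
        -printf '%T@ %p\n' |
        sort -rn |          # newest (largest timestamp) first
        head -n 1 |         # keep only the newest entry
        cut -d' ' -f2-      # drop the timestamp, keep the path
}
```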

Shell Script – How to Rename Files by Reversing Number Order

With zsh:

autoload zmv # best in ~/.zshrc

typeset -A c=()
zmv -n '(*)_<->.txt(#qnOn)' '$1_$((++c[${(b)1}])).txt-renamed' &&
  : zmv '(*)-renamed' '$1'

(remove the -n (dry-run) and :, if happy (and remember to re-initialize c=() before running again without dry run)).

<->: is like <1-12> to match decimal numbers in a range, but here with no bound specified, so matches any sequence of one or more decimal digits. Could also be written [0-9]## where ## is zsh's equivalent of ERE +.
(#q...) is the explicit syntax for specifying glob qualifiers.
n: sorts numerically
On: sorts by name in reverse. So with n above, that sorts the list of matching files numerically in reverse.
For the replacement, $1 contains what's captured in (*), so the part before _<digits>.txt.
We append $((++c[${(b)1}])), where $c is the associative array declared earlier.
${(b)1} is $1 with glob characters escaped (without it, it wouldn't work properly if $1 contained ]).
We do it in two stages (appending a -renamed suffix, which is stripped in the second stage) to avoid overwriting files in the process.

On your sample, that gives:

mv -- data2_2.txt data2_1.txt-renamed
mv -- data2_1.txt data2_2.txt-renamed
mv -- data1_3.txt data1_1.txt-renamed
mv -- data1_2.txt data1_2.txt-renamed
mv -- data1_1.txt data1_3.txt-renamed

mv -- data1_1.txt-renamed data1_1.txt
mv -- data1_2.txt-renamed data1_2.txt
mv -- data1_3.txt-renamed data1_3.txt
mv -- data2_1.txt-renamed data2_1.txt
mv -- data2_2.txt-renamed data2_2.txt

Note that technically, it doesn't reverse the order, or only does it in the case where the numbers are incrementing by one and start at 1 like in your sample. It will turn all of [1, 2, 3], [4, 5, 6], [0, 10, 20] to [3, 2, 1].

To reverse the list, it would be a bit more involved. It could be something like:

all_files=(*_<->.txt(n))
prefixes=(${all_files%_*})

for prefix (${(u)prefixes}) {
  files=(${(M)all_files:#${prefix}_<->.txt})
  new_files=(${(Oa)^files}-renamed)
  for old new (${files:^new_files})
    echo mv -i -- $old $new
}

(remove echo when happy).

And run the zmv '(*)-renamed' '$1' again as the second phase.

On a different sample with an additional [0, 3, 10, 20] list as a third example, that gives:

mv -i -- data1_1.txt data1_3.txt-renamed
mv -i -- data1_2.txt data1_2.txt-renamed
mv -i -- data1_3.txt data1_1.txt-renamed
mv -i -- data2_1.txt data2_2.txt-renamed
mv -i -- data2_2.txt data2_1.txt-renamed
mv -i -- data3_0.txt data3_20.txt-renamed
mv -i -- data3_3.txt data3_10.txt-renamed
mv -i -- data3_10.txt data3_3.txt-renamed
mv -i -- data3_20.txt data3_0.txt-renamed

Those solutions make no assumption about what characters (or non-characters) the file names may contain, and won't rename files unless they end in _<digits>.txt. The zmv-based approach guards against overwriting files that already had a -renamed suffix beforehand; the second approach does not (though -i will cause mv to prompt you before that happens). Alternatively, instead of adding a -renamed suffix, you could move the renamed files into a renamed directory.
