Find Largest File Recursively – Bash Script Techniques

bash recursive shell-script

I am trying to find the largest file in a directory, recursively. If the directory contains a subdirectory, the function needs to descend into it and check whether the largest file is there. Once the largest file is found, the output shows its relative path, followed by the file's name and size.

Example:

dude@shell2 (~...assignment/solutions) % bash maxfile.sh ~/test
class/asn
dude.h.gch: 9481628

This is what I have:

#!/bin/sh
clear

recursiveS() {
    for d in *; do
        if [ -d $d ]; then
            (cd $d; echo $(pwd)/$line; du -a; recursiveS;)
        fi
    done
}
recursiveS

I have been stuck for a while now. I cannot implement this by pipelining a number of existing Unix tools. Any ideas would be nice!

Best Answer

Use find (GNU find is assumed here) to print each file's size next to its name, sort the lines numerically, and print the last one, which is the largest file.

find . -type f -printf "%s\t%p\n" | sort -n | tail -1

That assumes file paths don't contain newline characters.
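If newlines in file paths are a concern, the same pipeline can be sketched with NUL-terminated records instead of lines. This assumes GNU find, sort and tail (the -printf and -z options are GNU extensions), and largest_file is a hypothetical function name used only for illustration.

```shell
#!/bin/bash
# Sketch: largest regular file under a directory, using NUL-terminated
# records so that newlines in file paths cannot split a record.
# Assumes GNU find, sort and tail; largest_file is a hypothetical name.
largest_file() {
  # "${1:-.}" is the directory to search (default: current directory).
  find "${1:-.}" -type f -printf '%s\t%p\0' |
    sort -z -n |      # numeric sort on the leading size field
    tail -z -n 1 |    # keep only the last (largest) record
    tr '\0' '\n'      # turn the trailing NUL into a newline for display
}
```

Calling largest_file ~/test would print the size, a tab, and the path of the largest file.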

Using a loop in bash with the GNU implementation of stat:

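shopt -s globstar   # make ** match files in all subdirectories
max_s=0
max_f=
for f in **; do
  # only regular files, not symbolic links to them
  if [[ -f "$f" && ! -L "$f" ]]; then
    size=$( stat -c %s -- "$f" )
    if (( size > max_s )); then
      max_s=$size
      max_f=$f
    fi
  fi
done
echo "$max_s $max_f"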

This will be significantly slower than the find solution. It also assumes that file names don't end in newline characters, skips hidden files, and does not descend into hidden directories.

If there's a file called - in the current directory, the size of the file open on stdin will be considered.

Beware that versions of bash prior to 4.3 followed symbolic links when descending the directory tree.
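Since the question asks for a function that descends into subdirectories itself, here is a minimal sketch of that approach, offered as one possible shape rather than the definitive solution. The names find_largest, print_largest, max_size and max_file are hypothetical. It assumes GNU stat, skips symbolic links, and, like the globstar loop, ignores hidden files and directories.

```shell
#!/bin/bash
# Sketch of a recursive walk without find or pipelines.
# find_largest, print_largest, max_size and max_file are hypothetical names.
max_size=-1
max_file=

find_largest() {
  local dir=$1 entry size
  for entry in "$dir"/*; do
    if [ -L "$entry" ]; then
      continue                      # skip symlinks so the walk cannot loop
    elif [ -d "$entry" ]; then
      find_largest "$entry"         # descend into the subdirectory
    elif [ -f "$entry" ]; then
      size=$(stat -c %s -- "$entry") || continue   # GNU stat assumed
      if [ "$size" -gt "$max_size" ]; then
        max_size=$size
        max_file=$entry
      fi
    fi
  done
}

print_largest() {
  max_size=-1
  max_file=
  find_largest "${1:-.}"
  [ -n "$max_file" ] || return 1    # no regular file found
  # First line: directory of the largest file; second line: "name: size".
  printf '%s\n%s: %s\n' "${max_file%/*}" "${max_file##*/}" "$max_size"
}
```

Running print_largest ~/test prints the containing directory on one line and the file name with its size on the next, which is close to the layout shown in the question.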

Related Solutions

File deletion under special conditions, recursively if possible

I couldn't make this any simpler. It works, but it assumes that no file name in the target directory contains a newline. First test the command with this version, which only prints the files that would be deleted:

find . -type f \( -name "*.cut" -o -name "*.cut.bak" \) -exec bash -c '[ -f "$(<<< "{}" sed "s/\(.*\/[^.]*\).*/\1/").rec" -o -f "$(<<< "{}" sed "s/\(.*\/[^.]*\).*/\1/").mpg" ] && echo "{}"' \;

If the files listed are those expected to be deleted, you can go ahead and run this:

find . -type f \( -name "*.cut" -o -name "*.cut.bak" \) -exec bash -c '[ -f "$(<<< "{}" sed "s/\(.*\/[^.]*\).*/\1/").rec" -o -f "$(<<< "{}" sed "s/\(.*\/[^.]*\).*/\1/").mpg" ] && rm "{}"' \;

Test on a directory hierarchy created ad hoc:

~/tmp$ tree
.
└── dir1
    ├── file1.cut
    ├── file1.cut.bak
    ├── file1.rec
    ├── file2.cut
    ├── file2.cut.bak
    ├── file2.mpg
    ├── file3.cut
    ├── file3.cut.bak
    └── subdir1
        ├── file1.cut
        ├── file1.cut.bak
        ├── file1.rec
        ├── file2.cut
        ├── file2.cut.bak
        ├── file2.mpg
        ├── file3.cut
        └── file3.cut.bak

2 directories, 16 files
~/tmp$ find . -type f \( -name "*.cut" -o -name "*.cut.bak" \) -exec bash -c 'if [ -f "$(<<< "{}" sed "s/\(.*\/[^.]*\).*/\1/").rec" -o -f "$(<<< "{}" sed "s/\(.*\/[^.]*\).*/\1/").mpg" ]; then rm "{}"; fi' \;
~/tmp$ tree
.
└── dir1
    ├── file1.rec
    ├── file2.mpg
    ├── file3.cut
    ├── file3.cut.bak
    └── subdir1
        ├── file1.rec
        ├── file2.mpg
        ├── file3.cut
        └── file3.cut.bak

2 directories, 8 files

As you can see, all files with the extension .cut or .cut.bak for which a file with the same name and the extension .rec or .mpg exists are deleted recursively. file1.cut and file1.cut.bak are deleted because of file1.rec, and file2.cut and file2.cut.bak are deleted because of file2.mpg. file3.cut and file3.cut.bak are not deleted because there is no file3.rec or file3.mpg in the same directory.
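One caveat with the commands above: embedding {} inside the bash -c string means a file name containing quotes or $( ) can be interpreted as shell code. A possible variant, sketched here under the same matching rule, passes the names as positional arguments and uses parameter expansion instead of sed. The function name list_cut_candidates is hypothetical, and it only prints the candidates; replacing the printf with rm -- "$f" would delete them.

```shell
#!/bin/bash
# Sketch: print .cut / .cut.bak files that have a sibling .rec or .mpg file.
# list_cut_candidates is a hypothetical name. File names reach the inner
# shell as arguments, so they are never parsed as code.
list_cut_candidates() {
  find "${1:-.}" -type f \( -name '*.cut' -o -name '*.cut.bak' \) -exec sh -c '
    for f do
      dir=${f%/*}             # directory part of the path
      base=${f##*/}           # file name without the directory
      stem=$dir/${base%%.*}   # name up to the first dot, like the sed expression
      if [ -f "$stem.rec" ] || [ -f "$stem.mpg" ]; then
        printf "%s\n" "$f"
      fi
    done' sh {} +
}
```

On the test hierarchy above, list_cut_candidates . would list the file1 and file2 .cut and .cut.bak files in both directories and leave the file3 ones out.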

Bash – Using bash variable substitution instead of cut/awk

You can remove the shortest leading substring that matches */

tmp="${filename#*/}"

and then remove the longest trailing substring that matches /*

echo "${tmp%%/*}"
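Taken together, the two expansions pick out the second slash-separated component of a path. A small sketch, where second_component and the sample path are made up for illustration:

```shell
#!/bin/bash
# Sketch: extract the second slash-separated field with parameter expansion.
# second_component is a hypothetical helper name.
second_component() {
  local tmp
  tmp="${1#*/}"                 # drop the shortest prefix ending in "/"
  printf '%s\n' "${tmp%%/*}"    # drop everything from the next "/" onwards
}

second_component 'home/alice/notes.txt'   # prints "alice"
```

No external process such as cut or awk is started, which matters mostly when the extraction runs inside a loop.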
