Bash – find the single largest file

bash disk-usage files find shell

We host a 4 TB share. How can we efficiently find the largest file on it?

Usually we use:

du -ak | sort -k1 -bn | tail -1

But it is slow to scan a share this large and then sort the whole listing.

Any suggestions for finding only the single largest file in the share?

Also, du -ak returns the size of the current directory as well (like ". 123455"). How do I avoid that?

Best Answer

I don't know of any way to determine the largest file other than scanning the directory tree in question and collecting the file sizes. If you know a size threshold that the largest file is sure to exceed, you can instruct find to dismiss files below that threshold.

$ find . -type f -size +50M ....

This dismisses any files that are not larger than 50MB. If you know that large files always end up in a specific location, you can point find at that area instead of scanning the entire disk.

NOTE: This is the method I typically employ, since random files shouldn't normally be turning up in directories other than /var-type ones.

As for du, you can tell it to output sizes in a human-readable format using the -h switch. The sort command knows how to sort these as well, again using its -h switch.

Example

$ find /home/saml/apps -type f -size +50M -print0 | \
    du -h --files0-from=- | sort -h | tail -1
1.4G    /home/saml/apps/MeVisLabSDK2.2.1_gcc-64.bin

The find above returns the list of files larger than 50MB, using a null (\0) character as the separator. The du command reads this list and knows to split it on nulls because of the --files0-from=- switch. The output is then sorted by its human-readable sizes, and tail -1 keeps only the last (largest) entry.

Without the tail -1:

$ find /home/saml/apps -type f -size +50M -print0 | \
    du -h --files0-from=- | sort -h
55M     /home/saml/apps/MeVisLabSDK/Packages/MeVis/ThirdParty/lib/libQtXmlPatternsMLAB.so.4.6.2.debug
55M     /home/saml/apps/MeVisLabSDK/Packages/MeVis/ThirdParty/Sources/Qt4/qt/lib/libQtXmlPatternsMLAB.so.4.6.2.debug
56M     /home/saml/apps/MeVisLabSDK/Packages/FMEwork/ThirdParty/lib/libitkvnl-4.0_d.a
66M     /home/saml/apps/MeVisLabSDK/Packages/FMEwork/Release/lib/libMLDcmtkAccessories_d.so
79M     /home/saml/apps/MeVisLabSDK/Packages/FMEwork/Release/lib/libMLDcmtkMLConverters_d.so
94M     /home/saml/apps/MeVisLabSDK/Packages/MeVis/ThirdParty/lib/libQtGuiMLAB.so.4.6.2.debug
94M     /home/saml/apps/MeVisLabSDK/Packages/MeVis/ThirdParty/Sources/Qt4/qt/lib/libQtGuiMLAB.so.4.6.2.debug
112M    /home/saml/apps/ParaView-3.14.1-Linux-64bit.tar.gz
204M    /home/saml/apps/Slicer-4.1.1-linux-amd64.tar.gz
283M    /home/saml/apps/MeVisLabSDK/Packages/FMEwork/Release/lib/libMLDcmtkIODWrappers_d.so
1.4G    /home/saml/apps/MeVisLabSDK2.2.1_gcc-64.bin
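
If only the single largest file is wanted, the sort step can probably be skipped altogether: the tree still has to be scanned once, but a running maximum can be kept while the list streams by. The following is only a rough sketch, not something from the examples above. It assumes GNU find (for -printf) and that no path contains a newline, and the function name largest_file is made up for illustration. Because find is restricted to -type f, the directory totals that du -ak prints (such as the "." entry) never show up.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): print the size in
# bytes and the path of the largest regular file under a directory.
# Assumes GNU find (-printf) and paths that contain no newline characters.
largest_file() {
    # %s = file size in bytes, %p = path; one "size<TAB>path" record per line.
    find "${1:-.}" -type f -printf '%s\t%p\n' |
        # Keep only the record with the largest size seen so far (one pass,
        # no sorting), then print that record once the input ends.
        awk -F'\t' '
            NR == 1 || $1 + 0 > max { max = $1 + 0; line = $0 }
            END { if (NR > 0) print line }
        '
}
```

Calling largest_file /home/saml/apps would print one line with the size in bytes, a tab, and the path. Note that %s is the file's length in bytes, whereas du reports disk usage, so the two can differ for sparse files. The -size +50M filter from the earlier examples can still be added to the find call if a threshold is known.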