rm – Deleting Billions of Files from a Directory While Seeing the Progress


I have a 30 TB directory containing billions of files, nominally all JPEGs. I am deleting each folder of files like this:

sudo rm -rf bolands-mills-mhcptz

This command just runs and gives no indication of whether it is working.

I want to see files as they are being deleted, or at least the current status of the command.

Best Answer

You can use rm -v to have rm print one line per file deleted. This way you can see that rm is indeed working. But if you have billions of files, all you will see is that rm is still running. You will have no idea how many files have already been deleted and how many are left.

The tool pv can help you with a progress estimation.

http://www.ivarch.com/programs/pv.shtml

Here is how you would invoke rm with pv, along with example output:

$ rm -rv dirname | pv -l -s 1000 > logfile
562  0:00:07 [79,8 /s] [====================>                 ] 56% ETA 0:00:05

In this contrived example I told pv that there are 1000 files. The output from pv shows that 562 have already been deleted, the elapsed time is 7 seconds, and the estimated time to completion is 5 seconds.

Some explanation:

  • pv -l makes pv count lines (newlines) instead of bytes.
  • pv -s number tells pv what the total is so that it can give you an estimation.
  • The redirect to logfile at the end is for clean output. Otherwise the status line from pv gets mixed up with the output from rm -v. Bonus: you will have a logfile of what was deleted. But beware: with billions of files the logfile will get huge. You can redirect to /dev/null instead if you don't need a log.
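
To get a feel for these options before deleting anything, pv can be tried on harmless input. The following is a minimal sketch, assuming seq and pv are installed. pv_line_demo is a made-up helper name, and the default of 1000 lines only mirrors the contrived example above.

```shell
# Harmless dry run: nothing is deleted here.
# pv_line_demo is a hypothetical helper name, not part of pv.
pv_line_demo() {
    n=${1:-1000}
    # seq emits n lines; pv counts them (-l) against the expected total (-s)
    # and draws its status line on stderr; wc -l confirms what passed through.
    seq 1 "$n" | pv -l -s "$n" | wc -l
}
```

Running pv_line_demo 5000000 should keep the status line on screen long enough to read it. With small totals pv may finish before it draws anything useful.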

To get the number of files you can use this command:

$ find dirname | wc -l

This can also take a long time if there are billions of files. You can use pv here as well to see how many it has counted so far:

$ find dirname | pv -l | wc -l
278k 0:00:04 [56,8k/s] [     <=>                                              ]
278044

Here it says that it took 4 seconds to count 278k files. The exact count at the end (278044) is the output from wc -l.
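
The counting and deleting steps can be combined so that the count is passed straight to pv -s. This is only a sketch: count_entries and delete_with_progress are hypothetical helper names. It assumes that rm -v prints exactly one line per removed file or directory (as GNU rm does) and that no file name contains a newline.

```shell
# Sketch only: both function names are made up for illustration.

count_entries() {
    # One line per file or directory, including the top-level directory.
    # tr strips the padding that some wc implementations add.
    find "$1" | wc -l | tr -d '[:space:]'
}

delete_with_progress() {
    dir=$1
    total=$(count_entries "$dir")
    # rm -v reports directories as well as files, so its line count
    # should match what find reported above.
    rm -rv -- "$dir" | pv -l -s "$total" > /dev/null
}
```

Calling delete_with_progress dirname would then show a progress bar with an ETA. The cost is that the tree is walked twice, once by find and once by rm.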

If you don't want to wait for the counting then you can either guess the number of files or use pv without estimation:

$ rm -rv dirname | pv -l > logfile

This way you get no estimated time to completion, but at least you can see how many files have been deleted so far. Redirect to /dev/null if you don't need the logfile.
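
If pv is not installed, a similar running count can be approximated with awk. This is a swapped-in technique, not something pv provides. delete_with_counter and its step parameter are hypothetical, and the sketch assumes that writing to /dev/stderr works in your awk, as it does on typical Linux systems.

```shell
# Sketch only: delete_with_counter is a made-up helper name.
delete_with_counter() {
    dir=$1
    step=${2:-100000}   # report every 100000 entries unless told otherwise
    # awk counts the lines printed by rm -v and reports to stderr.
    # %.0f is used because some awk versions mishandle %d for very large counts.
    rm -rv -- "$dir" | awk -v step="$step" '
        NR % step == 0 { printf "%.0f entries deleted\n", NR > "/dev/stderr" }
        END            { printf "%.0f entries deleted in total\n", NR > "/dev/stderr" }'
}
```

For example, delete_with_counter dirname 1000000 would print a line after every million removed entries and a total at the end.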


Nitpick:

  • Do you really need sudo?
  • Usually rm -r is enough to delete recursively. There is no need for -f.