Bash – Delete all files in directories except those whose paths are listed in a file

bash files find scripting shell-script

Given the following files:

data/A/a.txt
data/B/b.pdf
...
data/P/whatever.log
...
data/Z/z.jpg

I would like to delete all files in the data/A/, data/B/, …, data/Z/ directories except those files that are situated under one of the directories listed in the file data/dont_clean.txt. For example, if we have data/P listed in data/dont_clean.txt then nothing should be touched under data/P/, etc.

Something like:

find data/ -mindepth 2 -maxdepth 2 -type f -not -path {listed in data/dont_clean} -delete

Of course it is not a valid command.

I have also tried variants of

find data/ -mindepth 2 -maxdepth 2 -type f -exec grep data/dont_clean.txt '{}' \;

but each attempt was either an invalid command or produced output I could not explain.

I am using bash on Ubuntu 12.10.

Best Answer

This code is only roughly tested, but it might lay out an approach for you to take. Assume you have a file, ignore.txt, like this:

1/
2/

Sample data

And sample directories with files in them, created like this:

$ mkdir -p dirs/{1..5}
$ touch dirs/{1..5}/afile

Resulting in this:

$ tree dirs/
dirs/
|-- 1
|   `-- afile
|-- 2
|   `-- afile
|-- 3
|   `-- afile
|-- 4
|   `-- afile
`-- 5
    `-- afile

Example run

Now if we run this command against this tree:

$ find dirs/ -type f -print0 | grep -zFvf ./ignore.txt
dirs/5/afiledirs/4/afiledirs/3/afile

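We can see that we're only getting back the files in directories not listed in ignore.txt. (The paths run together because -print0 and -z separate them with NUL bytes, which the terminal does not display.)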
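If you want to eyeball the list before deleting anything, the NUL separators can be turned into newlines for display. A minimal sketch that rebuilds the sample tree in a scratch directory; the mktemp scratch directory and the variable names are my own assumptions, and -z needs GNU grep:

```shell
# Scratch directory so nothing real is touched (assumption, not in the original answer).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample tree and the ignore list from above.
for d in 1 2 3 4 5; do
    mkdir -p "dirs/$d"
    touch "dirs/$d/afile"
done
printf '%s\n' 1/ 2/ > ignore.txt

# Same filter as above; tr converts NUL to newline for display only.
preview=$(find dirs/ -type f -print0 | grep -zFvf ./ignore.txt | tr '\0' '\n' | LC_ALL=C sort)
printf '%s\n' "$preview"
```

Only the display copy should go through tr. The pipeline that feeds xargs -0 needs to keep its NUL separators so that file names with spaces or newlines survive.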

So we can append xargs and rm to the pipeline to remove the non-excluded files.

$ find dirs/ -type f -print0 | grep -zFvf ./ignore.txt | xargs -0 rm -f

Checking, we can see that it worked:

$ tree dirs/
dirs/
|-- 1
|   `-- afile
|-- 2
|   `-- afile
|-- 3
|-- 4
`-- 5

Problems to be worked out

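One big problem with this approach is that the strings in ignore.txt are matched anywhere in the path, so they might match other portions of the directory structure. For example, the entry 1/ would also match dirs/11/afile. So some care is needed to make sure that the strings in this file are unique in the way that you expect.

Some delimiters could be put around the strings, or they could be anchored to the beginning or the end of the path, to protect against this. Note that -F treats every entry as a fixed string, so anchoring with ^ or $ means dropping -F and treating the entries as regular expressions.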
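Here is a rough sketch of the anchoring idea, not something from the tested commands above. It shows the false match first, then rewrites each entry into a regex anchored at the start of the path. The directory 11, the file anchored.txt and the variable names are made up for illustration, and it assumes the entries contain no regex metacharacters:

```shell
# Scratch directory so nothing real is touched (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for d in 1 11 3; do
    mkdir -p "dirs/$d"
    touch "dirs/$d/afile"
done
printf '%s\n' 1/ > ignore.txt

# Unanchored fixed strings: "1/" also occurs inside "dirs/11/afile",
# so dirs/11 is protected even though it is not listed.
loose=$(find dirs/ -type f -print0 | grep -zFvf ./ignore.txt | tr '\0' '\n' | LC_ALL=C sort)

# Anchored: prefix every entry with ^dirs/ and drop -F so ^ is honoured.
sed 's|^|^dirs/|' ignore.txt > anchored.txt
strict=$(find dirs/ -type f -print0 | grep -zvf ./anchored.txt | tr '\0' '\n' | LC_ALL=C sort)

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

With the anchored patterns only dirs/1 is protected, so dirs/11/afile shows up in the deletion list as it should. If your directory names can contain characters such as . or [, they would need escaping before being used as regexes.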

Details

The above commands are doing the following:

  1. finding all the files under the directory dirs
  2. filtering out any files that are under a directory present in the ignore.txt file
  3. passing the filtered list via xargs to the rm -f command
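As an alternative to the grep filter, the same three steps can probably be folded into a single find call by turning each protected directory into a "! -path" test. This is a different technique from the one above and only a sketch, using the layout from the question. The sample file names and the variable names are illustrative, and it assumes the listed directories contain no glob characters, since -path treats its argument as a pattern:

```shell
# Scratch copy of the question's layout (assumption: names are illustrative).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p data/A data/B data/P
touch data/A/a.txt data/B/b.pdf data/P/whatever.log
printf '%s\n' data/P > data/dont_clean.txt

# Build one "! -path DIR/*" test per protected directory.
set --
while IFS= read -r dir || [ -n "$dir" ]; do
    [ -n "$dir" ] || continue            # skip blank lines
    set -- "$@" ! -path "${dir%/}/*"     # tolerate a trailing slash in the list
done < data/dont_clean.txt

# Dry run: -print only lists the files that would be removed.
doomed=$(find data/ -mindepth 2 -maxdepth 2 -type f "$@" -print | LC_ALL=C sort)
printf '%s\n' "$doomed"
```

Once the printed list looks right, replacing -print with -delete (and dropping the sort) does the actual removal. Because the patterns are full path prefixes ending in /, an entry like data/P cannot accidentally protect data/PP.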