My question is a bit different from some older questions that simply ask about "deleting all but the most recent `n` files in a directory".
I have a directory that contains different 'groups' of files, where each group of files shares some arbitrary prefix and each group has at least one file. I do not know these prefixes in advance, and I do not know how many groups there are.
EDIT: actually, I do know something about the file names: they all follow the pattern `prefix-some_digits-some_digits.tar.bz2`. The only thing that matters here is the `prefix` part, and we can assume that no `prefix` contains a digit or a dash.
I want to do the following in a `bash` script:
- Go through the given directory, identify all existing 'groups', and for each group of files, delete all but the most recent `n` files of that group only.
- If there are fewer than `n` files for a group, do nothing for that group, i.e. do not delete any file for that group.
What is a robust and safe way of doing the above in `bash`? Could you please explain the commands step-by-step?
Best Answer
The script:
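A minimal sketch of a script that follows the steps in the explanation, not necessarily the exact code this answer was written around. It assumes that "most recent" means newest modification time and that the file names follow the pattern from the question. The function name `prune_groups` and the `DRY_RUN` switch are hypothetical names introduced for illustration; `PREFIXES`, `ALL_FILES`, `KEEP` and `NUMKEEP` mirror the names used in the explanation.

```bash
#!/usr/bin/env bash
# Sketch only: keep the NUMKEEP newest files (by modification time) per prefix.
# prune_groups and DRY_RUN are hypothetical names; mtime ordering is an assumption.

prune_groups() {
    local dir=$1 NUMKEEP=$2
    local pattern='^([^0-9-]+)-[0-9]+-[0-9]+\.tar\.bz2$'
    local path p PREFIX known file kept i j tmp
    local -a PREFIXES=() ALL_FILES=() KEEP=()

    if [[ ! -d $dir || ! $NUMKEEP =~ ^[1-9][0-9]*$ ]]; then
        echo "usage: prune_groups DIR NUMKEEP" >&2
        return 2
    fi

    # Collect the unique prefixes of all file names that match the pattern.
    for path in "$dir"/*.tar.bz2; do
        [[ -f $path && ${path##*/} =~ $pattern ]] || continue
        PREFIX=${BASH_REMATCH[1]}
        known=0
        for p in "${PREFIXES[@]}"; do
            [[ $p == "$PREFIX" ]] && { known=1; break; }
        done
        (( known )) || PREFIXES+=("$PREFIX")
    done

    for PREFIX in "${PREFIXES[@]}"; do
        # Gather all files that belong to the current prefix.
        ALL_FILES=()
        for path in "$dir/$PREFIX"-*.tar.bz2; do
            [[ -f $path && ${path##*/} =~ $pattern ]] || continue
            [[ ${BASH_REMATCH[1]} == "$PREFIX" ]] && ALL_FILES+=("$path")
        done

        # With NUMKEEP files or fewer there is nothing to delete.
        (( ${#ALL_FILES[@]} <= NUMKEEP )) && continue

        # Insertion sort, newest first, using the -nt (newer than) file test.
        for (( i = 1; i < ${#ALL_FILES[@]}; i++ )); do
            tmp=${ALL_FILES[i]}
            for (( j = i; j > 0; j-- )); do
                [[ $tmp -nt ${ALL_FILES[j-1]} ]] || break
                ALL_FILES[j]=${ALL_FILES[j-1]}
            done
            ALL_FILES[j]=$tmp
        done

        # The first NUMKEEP entries are the files to keep.
        KEEP=("${ALL_FILES[@]:0:$NUMKEEP}")

        # Remove every file of the group that is not in the KEEP list.
        for file in "${ALL_FILES[@]}"; do
            kept=0
            for p in "${KEEP[@]}"; do
                [[ $p == "$file" ]] && { kept=1; break; }
            done
            (( kept )) && continue
            if [[ -n ${DRY_RUN:-} ]]; then
                printf 'would remove %s\n' "$file"
            else
                rm -- "$file"
            fi
        done
    done
}
```

After sourcing the function, a cautious first run could be `DRY_RUN=1 prune_groups /path/to/backups 3` (the path is a placeholder), which only prints what would be removed. Sorting with `-nt` instead of parsing `ls -t` output keeps the sketch safe for prefixes that contain spaces or other unusual characters.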
Explanation:
- Find all files matching the `something-something-something.tar.bz2` regex, cut off only the first part up to the first dash, and make the result unique. This gives the list of `PREFIXES`.
- Iterate through all `PREFIXES`:
  - Find `ALL_FILES` with the current `PREFIX`.
  - Check whether the number of `ALL_FILES` is less than the number of files to be kept. If so, we can stop here: there is nothing to remove.
  - Determine the `KEEP` files, which are the most recent `NUMKEEP` files.
  - Iterate through `ALL_FILES` and check whether the given file is not in the `KEEP` file list. If so, remove it.

Example result when running it: