Bash – Remove all files except for last file of each month

Consider a directory of MySQL backup files whose names include a timestamp:

-rw-rw-r-- 1 ubuntu ubuntu 35856184 Nov 16 16:00 db_2013-11-16_1600.sql
-rw-rw-r-- 1 ubuntu ubuntu 35856915 Nov 16 17:00 db_2013-11-16_1700.sql
-rw-rw-r-- 1 ubuntu ubuntu 35857565 Nov 16 18:00 db_2013-11-16_1800.sql
-rw-rw-r-- 1 ubuntu ubuntu 35858254 Nov 16 19:00 db_2013-11-16_1900.sql
-rw-rw-r-- 1 ubuntu ubuntu 35860276 Nov 16 20:00 db_2013-11-16_2000.sql
-rw-rw-r-- 1 ubuntu ubuntu 35861583 Nov 16 21:00 db_2013-11-16_2100.sql
-rw-rw-r-- 1 ubuntu ubuntu 35863630 Nov 16 22:00 db_2013-11-16_2200.sql
-rw-rw-r-- 1 ubuntu ubuntu 35864868 Nov 16 23:00 db_2013-11-16_2300.sql
-rw-rw-r-- 1 ubuntu ubuntu 35866095 Nov 17 00:00 db_2013-11-17_0000.sql
-rw-rw-r-- 1 ubuntu ubuntu 35887731 Nov 17 01:00 db_2013-11-17_0100.sql
-rw-rw-r-- 1 ubuntu ubuntu 35888871 Nov 17 02:00 db_2013-11-17_0200.sql
-rw-rw-r-- 1 ubuntu ubuntu 35888871 Nov 17 03:00 db_2013-11-17_0300.sql
-rw-rw-r-- 1 ubuntu ubuntu 35889319 Nov 17 04:00 db_2013-11-17_0400.sql

These backups actually go back to September 2012! I need to delete all of them except the last backup of each month. That is, these files should be kept:

db_2012-09-30_2300.sql
db_2012-10-31_2300.sql
db_2012-11-30_2300.sql
db_2012-12-31_2300.sql
db_2013-01-31_2300.sql
db_2013-02-28_2300.sql
db_2013-03-31_2300.sql
db_2013-04-30_2300.sql
db_2013-05-31_2300.sql
db_2013-06-30_2300.sql
db_2013-07-31_2300.sql
db_2013-08-31_2300.sql
db_2013-09-30_2300.sql
db_2013-10-31_2300.sql
db_2013-11-20_0700.sql # Because this month has not finished yet!

I could write a Bash or Python script that builds a list of files for each month, drops the last item from each list, and then deletes the remaining files one by one. Alternatively, the script could move the last file of each month to a temporary directory, delete everything else, and then move those files back.
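A minimal sketch of the first idea, assuming bash 4 or later (for associative arrays) and that every backup is named db_YYYY-MM-DD_HHMM.sql, so that the sorted glob order is also chronological order. Rather than building explicit per-month lists, it remembers the last file seen for each month and then deletes every other matching file. The function name prune_monthly is hypothetical; try it on a copy of the directory first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the newest db_YYYY-MM-DD_HHMM.sql file per month.
# Requires bash 4+ (associative arrays). Relies on the glob being sorted,
# which for this fixed-width naming scheme is chronological order.
prune_monthly() {
  local dir=$1 f month
  local re='^db_([0-9]{4}-[0-9]{2})-[0-9]{2}_[0-9]{4}\.sql$'
  local -A newest=()      # month (YYYY-MM) -> newest file seen for that month
  local -a files

  shopt -s nullglob
  files=("$dir"/db_*.sql) # sorted, oldest first
  shopt -u nullglob

  # Pass 1: later files overwrite earlier ones, so the newest one per month remains.
  for f in "${files[@]}"; do
    [[ ${f##*/} =~ $re ]] || continue
    month=${BASH_REMATCH[1]}
    newest[$month]=$f
  done

  # Pass 2: delete every matching file that is not the newest of its month.
  for f in "${files[@]}"; do
    [[ ${f##*/} =~ $re ]] || continue
    month=${BASH_REMATCH[1]}
    if [[ $f != "${newest[$month]}" ]]; then
      rm -- "$f"
    fi
  done
}
```

Usage: prune_monthly /path/to/backups. Files that do not match the naming pattern are left alone, and the newest file of the current, unfinished month is kept as well.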

However, I wonder whether there is a simple way to tell rm (perhaps combined with find, awk and sort) to skip the last file of each month. Is there such a magic spell?

I do recognise that life would be easier if I could just keep the first file of each month (taken only one hour after the last file of the previous month), but that is not acceptable to others in the organization, who fail to see that it provides essentially the same protection.

Best Answer

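Assuming the file zz contains the list of file names, one per line, the following prints every name except the last one of each month. To use it on the real backups, feed it the actual file names instead of zz, as in the example below.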

grep -vxF -f <(sort -r zz | uniq -w11) zz
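To see why this works, here is the same pipeline run step by step on a small, made-up list, with grep's -x flag so that only whole-line matches count. sort -r puts the newest name first, and uniq -w11 compares only the first 11 characters (db_YYYY-MM-), so it keeps the first, i.e. newest, name of each month; grep -v then prints everything else. This assumes GNU coreutils, since -w is a GNU extension to uniq.

```shell
#!/usr/bin/env bash
# Step-by-step run of the pipeline on a small, made-up list (zz).
zz=$(mktemp)
printf '%s\n' \
  db_2013-10-30_2300.sql \
  db_2013-10-31_2200.sql \
  db_2013-10-31_2300.sql \
  db_2013-11-16_1600.sql \
  db_2013-11-17_0400.sql > "$zz"

# Steps 1 and 2: newest first, then keep the first line of each
# "db_YYYY-MM-" group, i.e. the newest file of each month.
keep=$(sort -r "$zz" | uniq -w11)
echo "keep:"; echo "$keep"

# Step 3: print each line of zz that is not an exact (-x) fixed-string (-F)
# match for a kept name; these are the files that would be deleted.
delete=$(grep -vxF -f <(sort -r "$zz" | uniq -w11) "$zz")
echo "delete:"; echo "$delete"

rm -f "$zz"
```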

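For example, to delete the backups in the current directory (printf '%s\n' prints one name per line, whereas echo would put every name on a single line):

printf '%s\n' *.sql | grep -vxF -f <(printf '%s\n' *.sql | sort -r | uniq -w11) | xargs -r rm --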

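As written, it breaks on file names that contain spaces, and it is fragile with respect to file name length: uniq -w11 compares only the first 11 characters (db_YYYY-MM-), so every name must start with exactly that three-character prefix.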
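One way around both limitations, sketched under the assumption that every name ends in YYYY-MM-DD_HHMM.sql and that all backups share the same prefix (of any length): take the month from the date embedded in the name with awk instead of from a fixed-width prefix, and delete with a read loop instead of xargs, so names containing spaces survive. Names containing newlines are still not handled. The function name list_old_backups is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads file names on stdin (one per line) and prints
# every name except the newest one of each month.
# Assumes names end in YYYY-MM-DD_HHMM.sql and share a common prefix, so
# reverse lexical order is newest first.
list_old_backups() {
  sort -r | awk '
    match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9][0-9][0-9]\.sql$/) {
      month = substr($0, RSTART, 7)   # "YYYY-MM" part of the embedded date
      if (seen[month]++) print        # skip the first (newest) name per month
    }'
}

# Preview what would be deleted:
#   printf '%s\n' *.sql | list_old_backups
# Then delete, one name per line, without word splitting:
#   printf '%s\n' *.sql | list_old_backups | while IFS= read -r f; do rm -- "$f"; done
```

Names that do not match the date pattern are never printed, so they are never deleted.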