Bash – Archive files by month

aix bash

TASK

I need to write a bash script that archives files by month. That is, files with a modification date in 2016-12 should be packed into 2016-12_archive.tar.gz, files with a modification date in 2017-01 should be packed into 2017-01_archive.tar.gz, and so on.

Example:

FILE NAME   MODIFICATION DATE
file1.log       2016-12-30       ----> 2016-12_archive.tar.gz
file2.log       2016-12-31       ----> 2016-12_archive.tar.gz
file3.log       2017-01-01       ----> 2017-01_archive.tar.gz
file4.log       2017-01-02       ----> 2017-01_archive.tar.gz
file5.log       2017-01-15       ----> 2017-01_archive.tar.gz

My difficulties

The main problems I have:

  1. How to get the file modification date via bash?

  2. How to process all the files in a given directory (so that each one ends up in the appropriate archive)?

My attempts to solve the problems

  1. I have found two ways to find out the file modification date:
    date -r $file +%F and find dir -name filename -printf '%TY-%Tm-%Td\n'.
    Neither works on my machine (the OS is AIX, and I don't have root access).
    I also can't work out which timestamp 'ls -lc' shows (it doesn't seem to be the modification date).

  2. I have only one idea: get the modification dates of all files in YYYY-MM format, then build a list of their unique values. Then, for each item in this list, find all files with the matching modification date.

Consolidated attempts

Using istat to get modification date:

$ istat filename
Inode 86741 on device 10/8 File
Protection: rw-r-----
Owner: 6361(user2) Group: 621(norgroup)
Link count: 1 Length 116 bytes

Last updated: 16 февраля 2017 г., 14:25:11 MSK
Last modified: 16 февраля 2017 г., 14:25:11 MSK
Last accessed: 16 февраля 2017 г., 16:08:46 MSK

This is how I can get the Last modified value for each file:

for logFile in *.log; do
   mdfDate=$(istat "$logFile" | grep "Last modified")
   echo "$logFile $mdfDate"
done

Output:

file1.log Last modified: 30 декабря 2016 г., 14:25:11 MSK
file2.log Last modified: 31 декабря 2016 г., 14:26:11 MSK
file3.log Last modified: 01 января 2017 г., 14:27:11 MSK
file4.log Last modified: 02 января 2017 г., 14:28:11 MSK
file5.log Last modified: 15 января 2017 г., 14:29:11 MSK

So the next step is to extract the date in Unix format.

For some reason cut doesn't work correctly. Awk seems too heavy and sophisticated for this. Maybe sed?
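One way to pull a YYYY-MM value out of the istat line without cut, awk, or sed is to let the shell split the line and map the month name with a case statement. The sketch below is only an illustration: month_from_istat_line is a hypothetical helper name, and it assumes the Russian-locale layout shown in the output above (day, genitive month name, year). A different locale would need a different month table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an istat "Last modified" line into YYYY-MM.
# Assumes the Russian-locale layout from the question, e.g.
#   Last modified: 30 декабря 2016 г., 14:25:11 MSK
month_from_istat_line() {
    local line=$1 label1 label2 day monthName year rest mm
    # Split on whitespace: "Last" "modified:" day month year remainder
    read -r label1 label2 day monthName year rest <<<"$line"
    # Map the genitive Russian month name to its two-digit number
    case $monthName in
        января)   mm=01 ;;
        февраля)  mm=02 ;;
        марта)    mm=03 ;;
        апреля)   mm=04 ;;
        мая)      mm=05 ;;
        июня)     mm=06 ;;
        июля)     mm=07 ;;
        августа)  mm=08 ;;
        сентября) mm=09 ;;
        октября)  mm=10 ;;
        ноября)   mm=11 ;;
        декабря)  mm=12 ;;
        *) echo "unknown month name: $monthName" >&2; return 1 ;;
    esac
    printf '%s-%s\n' "$year" "$mm"
}
```

Inside the loop above it could be called as month_from_istat_line "$(istat "$logFile" | grep 'Last modified')".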

Best Answer

If you had access to GNU date, this would be much easier. As it is, it really would be simpler to just use a more sophisticated language. For example, Perl:

#!/usr/bin/perl -w
use strict;
use POSIX qw(strftime);

my $targetDir = $ARGV[0] || ".";
my %tarFiles;
open(my $input, '-|', "find \"$targetDir\" -type f -name '*.log'")
    or die "Cannot run find: $!";
while (<$input>) {
    # Remove the trailing newline
    chomp;
    # Get the file name
    my $file = $_;
    # Open it as a file handle for stat()
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    # Get the file's stats
    my @stats = stat($fh);
    close($fh);
    # Modification time (seconds since the epoch)
    my $mtime = $stats[9];
    # Convert to YYYY-MM and build the tar file name
    my $tarfile = strftime "%Y-%m_archive.tar.gz", localtime($mtime);
    # Add to the list of files for this tar file
    push @{$tarFiles{$tarfile}}, qq("$file");
}
close($input);

for my $tarFile (keys(%tarFiles)) {
    # Build the command that creates the tar file
    my $tarCom = "tar cvzf $tarFile @{$tarFiles{$tarFile}}";
    print "COMMAND: $tarCom\n";

    # Uncomment this line to run the command
    # system("$tarCom")
}

Save the script as makeTars.pl (or whatever you like) somewhere in your $PATH, make it executable (chmod +x /path/to/makeTars.pl) and then run like this:

makeTars.pl /path/to/target/dir

For example:

$ ls -l
total 0
-rw-r--r-- 1 terdon terdon 0 Dec 30 00:00  file1.log
-rw-r--r-- 1 terdon terdon 0 Dec 31 00:00  file2.log
-rw-r--r-- 1 terdon terdon 0 Jan  1  2016  file3.log
-rw-r--r-- 1 terdon terdon 0 Jan  2  2016  file4.log
-rw-r--r-- 1 terdon terdon 0 Jan  3  2016  file5.log
-rw-r--r-- 1 terdon terdon 0 Jan  3  2016 'file5 with spaces.log'
$ makeTars.pl .
COMMAND: tar cvzf 2017-02_archive.tar.gz "."
COMMAND: tar cvzf 2016-12_archive.tar.gz "./file2.log" "./file1.log"
COMMAND: tar cvzf 2016-01_archive.tar.gz "./file5 with spaces.log" "./file5.log" "./file4.log" "./file3.log"

Once you're satisfied that it will do what you want, uncomment the last line (system("$tarCom")) to make it actually create the tar files.

NOTE that this will break if your file names contain newlines, but I hope that will not be a problem with log files.
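
For comparison, here is roughly what a shell-only version might look like on a system that does have GNU date, which the asker apparently lacks on AIX. It is a sketch, not a tested AIX solution: archive_by_month is a hypothetical function name, it assumes bash with arrays plus tar and gzip, and it follows the two-pass idea from the question (collect the unique YYYY-MM values, then pack the files for each month). The archive is piped through gzip instead of relying on tar's z flag, on the assumption that not every tar supports it.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU date (for `date -r FILE`), bash arrays, tar and gzip.
# archive_by_month is a hypothetical name, not part of the Perl script above.
archive_by_month() {
    local dir=${1:-.} file month
    local -a months=() files=()

    # Pass 1: record the YYYY-MM modification month of every .log file
    for file in "$dir"/*.log; do
        [ -f "$file" ] || continue      # skips the literal pattern if nothing matches
        months+=("$(date -r "$file" +%Y-%m)")
    done
    [ "${#months[@]}" -gt 0 ] || return 0

    # Pass 2: for each unique month, collect its files and pack them
    while IFS= read -r month; do
        files=()
        for file in "$dir"/*.log; do
            [ -f "$file" ] || continue
            if [ "$(date -r "$file" +%Y-%m)" = "$month" ]; then
                files+=("$file")
            fi
        done
        # Write the archive to stdout and compress it with gzip
        tar -cf - "${files[@]}" | gzip > "${month}_archive.tar.gz"
    done < <(printf '%s\n' "${months[@]}" | sort -u)
}
```

Calling archive_by_month /path/to/target/dir writes the YYYY-MM_archive.tar.gz files into the current directory. Because the file names are kept in an array and always quoted, names with spaces should be handled without extra quoting tricks.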
