Batch rename PDF files by content

command-line rename

I have a big pile of PDFs that I would like to batch rename by content (they are all searchable), and I'd like to do it from the command line.

They are all payslips so they have a constant form, and I'd like to rename by date.

Currently they are named: payslip100 .. payslip308

The string for renaming would be the date component in
Payment Date: 15/4/2016

I have installed pdfgrep using Homebrew and am searching with
pdfgrep -HC 15 "Payment Date:" paySlip.pdf

which returns paySlip.pdf:Payment Date: 8/7/2016


I have attached my final working code in a reply.

Best Answer

After some effort I have arrived at a working result. sed syntax is quite confusing, and I am happy to leave it in a state where it works without knowing quite why.

#!/bin/bash
for file in *.pdf
do
    # extract the matching text, e.g. "Payment Date:   8/7/2016"
    # (no -H here, so the file name is not included in the output)
    date=$(pdfgrep -C 15 "Payment Date:" "$file")
    echo "$date"

    # replace / with - because / cannot appear in a file name
    date2=$(echo "$date" | sed 's,/,-,g')
    echo "$date2"

    # build the new name from the date: YYYY-mm-payslip-dd-mm-YYYY.pdf
    new=$(echo "$date2" | sed 's,Payment Date: *\(.*\)-\(.*\)-\(.*\),\3-\2-payslip-\1-\2-\3.pdf,')

    # dry run: this only prints the mv command; remove "echo" to actually rename
    echo mv "$file" "$new"
done

I used pdfgrep, which I installed using Homebrew (as suggested in another answer that I can't find right now).

I needed to replace the "/" characters in the date because they cannot appear in a file name. sed does not require the forward slash as its delimiter; other characters can be used instead, hence "," rather than "/" in the sed commands. See https://stackoverflow.com/questions/17379293/replace-forward-slash-with-double-backslash-enclosed-in-double-quotes
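To illustrate the delimiter point, here is a minimal sketch comparing the default "/" delimiter with ",". The sample string is an assumption based on the pdfgrep output shown in the question, and the variable names with_slash and with_comma are made up for this example.

```shell
# Sample value, assumed to look like the pdfgrep output in the question
date='Payment Date: 8/7/2016'

# Default "/" delimiter: the "/" being searched for has to be escaped as \/
with_slash=$(printf '%s\n' "$date" | sed 's/\//-/g')

# "," as delimiter: the "/" can be written as-is
with_comma=$(printf '%s\n' "$date" | sed 's,/,-,g')

echo "$with_slash"
echo "$with_comma"
```

Both commands should print the same line, Payment Date: 8-7-2016; the comma version is just easier to read.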

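I found that the number of spaces after "Payment Date:" in $date varied, which is why the pattern has a space followed by * after Payment Date: (this matches any number of spaces, including none).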
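A small sketch of how the "space followed by *" part copes with varying whitespace. strip_label is a hypothetical helper written only for this illustration, and the two input strings are assumed samples.

```shell
# strip_label is a hypothetical helper, not part of the script above.
# It removes "Payment Date:" plus any number of following spaces.
strip_label() {
    printf '%s\n' "$1" | sed 's,Payment Date: *,,'
}

one=$(strip_label 'Payment Date: 8-7-2016')        # one space after the colon
many=$(strip_label 'Payment Date:     8-7-2016')   # several spaces after the colon

echo "$one"
echo "$many"
```

Both calls should leave only 8-7-2016, regardless of how many spaces followed the label.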

I added the year and month to the start of the file name for organising purposes.
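The reordering is done with sed capture groups: \1, \2 and \3 hold the day, month and year, and the replacement puts \3-\2 first. A minimal sketch on a sample date, where to_name is a hypothetical helper and the input is assumed to already have "-" in place of "/":

```shell
# to_name is a hypothetical helper for illustration only.
# Input is assumed to be d-m-yyyy (slashes already replaced by dashes).
to_name() {
    # \1 = day, \2 = month, \3 = year
    printf '%s\n' "$1" | sed 's,\(.*\)-\(.*\)-\(.*\),\3-\2-payslip-\1-\2-\3.pdf,'
}

name=$(to_name '8-7-2016')
echo "$name"   # prints 2016-7-payslip-8-7-2016.pdf
```

Note that single-digit months are not zero-padded here, so a plain alphabetical sort would place 2016-10 before 2016-7.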