Knowing that "How to convert from text to .pdf" is already well answered here [link] and here [link], I am looking for something more specific:

Using Claws-Mail [website] and a plug-in [RSSyl] to read RSS feeds, I have collected a lot of text files that I want to convert into .pdf files.

Problem: The files inside the folders are numbered [1, 2, …, 456]. Every feed has its own folder, but inside I have 'just' numbered files. Every file contains a header [followed by the message's content]:

Date: Tue,  5 Feb 2013 19:59:53 GMT
From: N/A
Subject: Civilized Discourse Construction Kit
Message-ID: <>
Content-Type: text/html; charset=UTF-8

<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<base href="">
<p>URL: <a href=""></a></p>
<!-- RSSyl text start -->

Question: Is there a way to convert each file into a .pdf file and rename it based on the name given under Subject? Even better would be converting and renaming this way:

""_"date"_"file name" with each piece of information taken from the header data. As there are a few hundred files, I am looking for a way to batch-process them.

The files are HTML-formatted, but have no .htm[l] suffix.

If you have a relatively simple file tree with only one level of directories, where each directory contains a list of files but no subdirectories, you should be able to do something like this (you can paste it directly into your terminal and hit Enter):

for dir in *; do    ## For each directory
 if [ "$(ls -A "$dir")" ]; then  ## If the dir is not empty
   for file in "$dir"/*; do      ## For each file in $dir
    i=0;                         ## initialize a counter
    ## Get the subject (first match only, leading blanks removed)
    sub=$(grep -m 1 ^Subject: "$file" | cut -d ':' -f 2- | sed 's/^ *//');
    ## get the date, and format it to MMDDYY_Hour:Min:Sec
    date=$(date -d "$(grep -m 1 ^Date: "$file" | cut -d ':' -f 2-)" +%m%d%y_%H:%M:%S);
    ## the pdf's name will be <directory's name> _ <date> _ <subject>
    name="$dir"_"$date"_"$sub";
    ## if a file of this name exists
    while [ -e "$dir/$name".pdf ]; do
      let i++;                       ## increment the counter
      name="$dir"_"$date"_"$sub"$i;  ## append it to the pdf's name
    done;
    wkhtmltopdf "$file" "$dir"/"$name".pdf; ## convert html to pdf
   done
 fi
done


  • This solution requires wkhtmltopdf:

    Simple shell utility to convert html to pdf using the webkit rendering engine, and qt.

    On Debian-based systems you can install it with

    sudo apt-get install wkhtmltopdf
  • It assumes there are no files in the top-level directory and only the desired HTML files in all subdirectories.

  • It can deal with file and directory names that contain spaces, new lines and other unorthodox characters.

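  • Given a file dir1/foo with the contents of the example you have posted, it will create a file called dir1/dir1_020513_20:59:53_Civilized Discourse Construction Kit10.pdf. The time is printed in the local timezone (here one hour ahead of GMT), and the trailing 10 is the counter, which is appended only when a PDF of that name already exists.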
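The name-building step can also be pulled out into a small function, which makes it easier to try on a single file before converting a few hundred. This is only a sketch under some assumptions: pdf_name is a hypothetical helper name, GNU date is available, the header ends at the first blank line, and the date is printed in UTC (date -u) rather than in the local timezone, so the example header would give 19:59:53 instead of 20:59:53. It also replaces any "/" in the subject, since that character cannot appear in a file name.

```shell
#!/usr/bin/env bash
## Sketch: print "<directory>_<date>_<subject>" for one message file.
## pdf_name is a hypothetical helper, not part of wkhtmltopdf or RSSyl.
pdf_name() {
  local file=$1 dir header sub raw_date stamp
  ## Name of the directory that holds the file
  dir=$(basename "$(dirname "$file")")
  ## Keep only the header: everything up to the first blank line
  header=$(sed '/^$/q' "$file")
  ## First Subject: and Date: lines, without the field name and leading blanks
  sub=$(printf '%s\n' "$header" | sed -n 's/^Subject:[[:space:]]*//p' | head -n 1)
  raw_date=$(printf '%s\n' "$header" | sed -n 's/^Date:[[:space:]]*//p' | head -n 1)
  ## GNU date; -u prints the timestamp in UTC (assumption, see above)
  stamp=$(date -u -d "$raw_date" +%m%d%y_%H:%M:%S) || return 1
  ## A "/" in the subject would be taken as a path separator
  sub=${sub//\//-}
  printf '%s_%s_%s\n' "$dir" "$stamp" "$sub"
}
```

Calling pdf_name dir1/1 prints the proposed name without converting anything. Inside a loop over the files, name=$(pdf_name "$file") could stand in for the separate subject and date lines.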

