Use OpenOffice from command line to convert HTML to RTF

bash, conversion, cygwin, openoffice

I'm trying to build a bash script in Cygwin that will convert HTML files to RTF. In OS X this is trivial with textutil, but that doesn't exist for regular Linux or Cygwin. Instead I'm trying to use OpenOffice from the command line.

I've read elsewhere that OpenOffice can run headlessly with a program normally installed as /usr/bin/ooffice, but in Cygwin under Windows this doesn't work: the OpenOffice installer doesn't build native Cygwin symlinks and might not even install the Windows equivalent of ooffice.

How can I use OpenOffice from the command line in Cygwin to convert HTML files to RTF files?

Best Answer

There is a really handy command-line tool called unoconv (a Python script that drives OpenOffice/LibreOffice) that converts files between any of the formats OpenOffice/LibreOffice supports. You can read up about it on its site, and be sure to check out the man page. Many distros have packages for it that you can install easily, including, I believe, Cygwin.

Once you have it installed, usage in your case means specifying the output format and the input HTML file, like this:

unoconv -f rtf file.html

This writes file.rtf next to the input file; add -o to choose a different output name.

All done :)

Of course this could be scripted to handle multiple files as well. In bash or zsh, you could run something like this to convert a whole folder of HTML files:

for file in *.html; do
    # Strip only the trailing .html so "html" elsewhere in the name is left alone
    unoconv -f rtf -o "${file%.html}.rtf" "$file"
done
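
If you want something reusable, the loop can be wrapped in a couple of shell functions. This is only a rough sketch, assuming unoconv is on your PATH and accepts -f and -o as described in its man page. The names rtf_name, convert_html_dir, and the CONVERTER override variable are made up for illustration and are not part of unoconv.

```shell
#!/usr/bin/env bash
# Sketch: batch-convert HTML files to RTF.
# CONVERTER is a hypothetical override; it defaults to unoconv (assumed to be on PATH).
CONVERTER="${CONVERTER:-unoconv}"

# Derive the output name by replacing only the trailing .html extension.
rtf_name() {
    printf '%s\n' "${1%.html}.rtf"
}

# Convert every .html file in the given directory (default: current directory).
# Returns non-zero if any single conversion failed.
convert_html_dir() {
    local dir="${1:-.}" file status=0
    for file in "$dir"/*.html; do
        [ -e "$file" ] || continue    # the glob matched nothing
        "$CONVERTER" -f rtf -o "$(rtf_name "$file")" "$file" || {
            echo "conversion failed: $file" >&2
            status=1
        }
    done
    return "$status"
}
```

After sourcing the file, running convert_html_dir path/to/folder should leave an .rtf file beside each .html file. File names with spaces are handled because every expansion is quoted.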