LibreOffice – How to Specify Encoding with --convert-to CSV

character encoding, conversion, libreoffice, unicode

Excel files can be converted to CSV using:

$ libreoffice --convert-to csv --headless --outdir dir file.xlsx

Everything appears to work just fine. The encoding, though, is set to something wonky. Instead of the UTF-8 em dash (—) that I get if I do a "save as" manually from LibreOffice Calc, it gives me a \227 (�). Running file on the CSV gives me "Non-ISO extended-ASCII text, with very long lines". So, two questions:

  1. What on earth is happening here?
  2. How do I tell libreoffice to convert to UTF-8?

The specific file that I'm trying to convert is here.

Best Answer

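Apparently LibreOffice writes the CSV in a legacy single-byte encoding by default (ISO-8859-1 or a close relative; the byte \227, i.e. 0x97, is the em dash in Windows-1252), which is causing the problem. In response to this bug report, a new parameter --infilter has been added. The following command produces U+2014 em dash: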

$ libreoffice --convert-to csv --infilter=CSV:44,34,76,1 --headless --outdir dir file.xlsx
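
For readers wondering what 44,34,76,1 means: as far as I know these are LibreOffice's CSV filter tokens, in the order field separator, text delimiter, character set, first line. A minimal sketch that builds the same option string piece by piece; the variable names are made up for illustration, and the command is only printed, not run:

```shell
# Build the filter option string token by token (values taken from the command above).
# Variable names are hypothetical; only the numbers come from the answer.
field_sep=44    # ASCII code for "," (field separator)
text_delim=34   # ASCII code for '"' (text delimiter)
charset=76      # LibreOffice's numeric id for UTF-8
first_line=1    # first line to process
opts="CSV:${field_sep},${text_delim},${charset},${first_line}"

# Print the resulting command rather than running it,
# so the sketch works even without LibreOffice installed.
echo "libreoffice --convert-to csv --infilter=${opts} --headless --outdir dir file.xlsx"
```

Changing field_sep to 59 would, under the same assumption, ask for semicolons instead of commas.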

I tested the --infilter command with LibreOffice (LO) 5.0.3.2. From the bug report, it looks like the earliest version containing this option is LO 4.4.
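
If you are stuck on an older version, or already have a CSV with the \227 byte in it, re-encoding with iconv is a possible workaround. This is only a sketch: it assumes the bad output is Windows-1252 (which fits the \227 byte from the question), that your iconv knows the name WINDOWS-1252, and it uses a one-line stand-in file rather than real LibreOffice output:

```shell
# Work in a scratch directory so nothing in the current directory is touched.
tmp=$(mktemp -d)

# Stand-in for the badly encoded CSV: "a", byte 0x97 (octal 227), "b", newline.
printf 'a\227b\n' > "$tmp/legacy.csv"

# Re-encode from Windows-1252 (assumed source encoding) to UTF-8.
iconv -f WINDOWS-1252 -t UTF-8 "$tmp/legacy.csv" > "$tmp/utf8.csv"

# Dump the bytes; the em dash U+2014 should now appear as e2 80 94.
od -An -tx1 "$tmp/utf8.csv"
```

The same od check works on the file produced by the --infilter command: if the em dash shows up as e2 80 94, the output is UTF-8.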

See also: https://ask.libreoffice.org/en/question/13008/how-do-i-specify-an-input-character-coding-for-a-convert-to-command-line-usage/
