Excel files can be converted to CSV using:
$ libreoffice --convert-to csv --headless --outdir dir file.xlsx
Everything appears to work just fine. The encoding, though, is set to something wonky. Instead of a UTF-8 mdash (—) that I get if I do a "save as" manually from LibreOffice Calc, it gives me a \227 (�). Using file on the CSV gives me "Non-ISO extended-ASCII text, with very long lines". So, two questions:
- What on earth is happening here?
- How do I tell libreoffice to convert to UTF-8?
The specific file that I'm trying to convert is here.
Best Answer
Apparently LibreOffice tries to use ISO-8859-1 by default, which is causing the problem. In response to this bug report, a new parameter, --infilter, has been added. The following command produces a U+2014 em dash:
I tested this with LO 5.0.3.2. From the bug report, it looks like the earliest version containing this option is LO 4.4.
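As for what is happening: octal \227 is the single byte 0x97. Windows-1252 maps that byte to U+2014 (em dash), while strict ISO-8859-1 maps it to a C1 control character, so the output looks like a Windows-1252-style single-byte encoding. If you already have such a CSV, a rough workaround is to re-encode it afterwards. The sketch below is not the LibreOffice option from the bug report; it assumes the file really is Windows-1252, that iconv is installed, and it uses a made-up sample.csv in a temporary directory as a stand-in for the converted file.

```shell
# Hypothetical stand-in for the converted CSV: the text "a", the byte 0x97, "b".
tmp=$(mktemp -d)
printf 'a \227 b\n' > "$tmp/sample.csv"

# Re-encode to UTF-8, assuming the source encoding is Windows-1252.
iconv -f WINDOWS-1252 -t UTF-8 "$tmp/sample.csv" > "$tmp/sample.utf8.csv"

# Dump the bytes: the em dash should now appear as the three bytes e2 80 94.
od -An -tx1 "$tmp/sample.utf8.csv"
```

Running file on the re-encoded CSV should then report UTF-8 text instead of "Non-ISO extended-ASCII text".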
See also: https://ask.libreoffice.org/en/question/13008/how-do-i-specify-an-input-character-coding-for-a-convert-to-command-line-usage/