How to use the fmt command with non-Latin characters

text formatting

I need to use fmt to format some text output in Greek, but it does not behave as it does with Latin characters. Consider, for example, the two 15-character sentences below.

With Latin characters:

 $ echo "Have a nice day" | fmt -w 16
 Have a nice day

but, strangely, with non-Latin characters:

 $ echo "Ηαωε α νιψε δαυ" | fmt -w 16
 Ηαωε α
 νιψε δαυ

In fact, for the above string, the smallest width that prints the sentence without a line break is -w 28:

 $ echo "Ηαωε α νιψε δαυ" | fmt -w 28
 Ηαωε α νιψε δαυ
 $ echo "Ηαωε α νιψε δαυ" | fmt -w 27
 Ηαωε α νιψε
 δαυ

Can somebody explain why this happens and how to fix it, if possible?

Best Answer

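To answer your question: it does not work because the fmt you are running evidently measures line width in bytes rather than in characters. In UTF-8 each Greek letter takes two bytes, so your 15-character sentence (12 letters and 3 spaces) is 27 bytes long, which matches the -w 28 threshold you observed. As the Wikipedia article on fmt puts it: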

Unlike par, fmt has no Unicode support, ...

https://en.wikipedia.org/wiki/Fmt
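
The byte-versus-character mismatch can be checked directly. The following is a minimal sketch, assuming the text is UTF-8 encoded; the variable names are only illustrative. It counts the bytes in the string, then counts code points by dropping UTF-8 continuation bytes, so the result does not depend on the current locale.

```shell
text='Ηαωε α νιψε δαυ'

# Bytes in the UTF-8 encoding: this is what a byte-oriented fmt measures.
bytes=$(printf '%s' "$text" | wc -c)

# Code points: delete UTF-8 continuation bytes (0x80-0xBF), count the rest.
chars=$(printf '%s' "$text" | LC_ALL=C tr -d '\200-\277' | wc -c)

# $((...)) strips the padding some wc implementations add.
echo "bytes=$((bytes)) chars=$((chars))"
```

For the Greek sentence this should report 27 bytes and 15 characters, while "Have a nice day" gives 15 for both.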

Additional notes

As for the second part of your question, how to fix it: unfortunately, I could not find a ready-made solution.

There is a fairly recent technical report on how to wrap Unicode text (Heninger, Unicode Line Breaking Algorithm, 2015-06-01, http://www.unicode.org/reports/tr14/), but it appears to be a specification only, with no actual implementation and no mention of software or how-to examples. You could try asking the author via the email address listed there.

Since the Wikipedia article on fmt referred to par, and it was available via apt-get, I decided to try it on your posted text.

But I was unsuccessful; it still doesn't wrap the way you wish:

$ echo "Ηαωε α νιψε δαυ" | par 16gr
Ηαωε α
νιψε δαυ

The man page is difficult to follow; even its author cautions that it is not well written for the end user. If you are determined, you could try your luck reading it.
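
If no Unicode-aware formatter is at hand, a possible workaround is to wrap by character count in the shell itself. The following is only a rough sketch, not a replacement for fmt: `ulen` and `uwrap` are hypothetical helper names, the input is assumed to be UTF-8, every code point is treated as one column wide (so combining marks and double-width characters are miscounted), and the width is an inclusive maximum rather than fmt's exact rule.

```shell
# Hypothetical helper: number of code points in $1, independent of locale.
ulen() {
    n=$(printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c)
    echo $((n))
}

# Hypothetical helper: greedy word wrap of stdin, at most $1 characters per line.
uwrap() {
    width=$1
    line=''
    len=0
    set -f                      # no filename expansion while splitting words
    for word in $(cat); do      # unquoted: split stdin on whitespace
        wlen=$(ulen "$word")
        if [ -z "$line" ]; then
            line=$word
            len=$wlen
        elif [ $((len + 1 + wlen)) -le "$width" ]; then
            line="$line $word"
            len=$((len + 1 + wlen))
        else
            printf '%s\n' "$line"
            line=$word
            len=$wlen
        fi
    done
    set +f
    if [ -n "$line" ]; then
        printf '%s\n' "$line"
    fi
}

echo "Ηαωε α νιψε δαυ" | uwrap 15
```

With a width of 15 the sentence stays on one line; with `uwrap 14` the last word moves to a second line. It starts one process pipeline per word, so it is only practical for short texts.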
