Sorting with Perl respecting Locale Settings


The following data needs to be sorted according to the locale's sort order:

wird
sin
wär
pêche
war
Über
Uber
péché
peach

There was no problem using sort:

$ sort < data
peach
pêche
péché
sin
Uber
Über
war
wär
wird

which respects the locale, and

$ LC_ALL=C sort < data
Uber
peach
péché
pêche
sin
war
wird
wär
Über

without locale.

Then I tried to do the same with Perl, but failed:

$ perl -e 'local $/ = undef; print sort <>;' < data
Über
pêche
war
péché
sin
Uber
peach
wär
wird

The result is neither the first output of sort nor the second.

Running Ubuntu 12.04 LTS

Best Answer

The problem is local $/ = undef. It puts perl into slurp mode, so <> returns the entire input as a single string. The list passed to sort therefore contains only one element, and there is nothing to sort. I would expect the output to be identical to your input data (I am also on Ubuntu 12.04 LTS, with perl 5.14.2):

$ perl -le 'local $/ = undef; print ++$i for <>' < data
1

$ perl -le 'print ++$i for <>' < data
1
2
3
4
5
6
7
8
9

If you remove local $/ = undef, perl's sort produces the same output as the shell sort with LC_ALL=C:

$ perl -e 'print sort <>' < data
Uber
peach
péché
pêche
sin
war
wird
wär
Über

Note

Without use locale, perl ignores your current locale settings and compares strings by code point. With use locale in effect, the string comparison operators ("lt", "le", "cmp", "ge", and "gt") follow LC_COLLATE (unless LC_ALL is set, which overrides it), and sort is affected as well because it uses cmp by default.

You can print the current LC_COLLATE value:

$ perl -MPOSIX=setlocale,LC_COLLATE -le 'print setlocale(LC_COLLATE)'
en_US.UTF-8