Sorting is not consistent using the Unix command ‘sort’

sorting, unix

I'm running the command:

zcat [File] | sed "1d" | sort -t $'\xE7' -k [field to be sorted] > [file].sorted

When I run this on File A, sorting on field 1, I get the following result:

11622400 , abe, def
11622401 , abe, def
11622402 , bbabe, def
11622403 , ddabe, def
11622404 , acdc, dere
11622405 , ddabe, bere
11622406 , abe, fgh
11622407 , adbed, ddee
11622408 , adbe, def
11622409 , abdde, def
1162240 , abe, deed
11622410, def,dede

But when I run the same command on a second file, sorting on field 2, I get this:

1162303, 116224
1162420, 1162240
11623062, 11622400
11623063, 11622401
11623064, 11622402
11623065, 11622403
11623066, 11622404
11623067, 11622405
11623068, 11622406
11623069, 11622407
11623070, 11622408
11623071, 11622409
1162421, 1162241
11623072, 1162410

Why is it not sorting in the same way? The first example looks wrong: the second line from the bottom should be at the top.

I'm trying to join these files with the Unix join command, but because they are not ordered in the same way, the join is missing lots of records.

What is the reason for this problem?

Best Answer

The reason you're getting these results is that your sort is not numeric; it compares the keys as text, character by character.

sort has a command-line switch, -n, that sorts numerically, which is what you want (type 'man sort' into Google for the details).
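
As a rough sketch of the difference, the snippet below sorts three sample rows both ways. The keys come from the question's output, but the second column and the comma delimiter are assumptions made for illustration (the question uses $'\xE7' as the separator). Note also that -k1,1 limits the key to field 1, whereas a bare -k 1 extends the key to the end of the line, so the delimiter and later fields take part in the comparison.

```shell
# Sample rows: keys taken from the question, second column made up for illustration.
data='11622400,abe
1162241,def
1162410,fgh'

# Text comparison on field 1 only (LC_ALL=C forces plain byte order).
# Order: 11622400, 1162241, 1162410
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k1,1)

# Numeric comparison on field 1 only (the trailing n applies -n to this key).
# Order: 1162241, 1162410, 11622400
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k1,1n)

echo "Lexical:"
echo "$lexical"
echo "Numeric:"
echo "$numeric"
```

One caveat, stated as an assumption about the workflow in the question: join expects both inputs in the same text order on the join field, so if the files are being sorted only to feed join, sorting both with the same locale and a key limited to the join field (for example LC_ALL=C sort -t... -kN,N) may be the safer route than a numeric sort.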
