AWK Numerical Sorting – How to Sort Each Column Individually

awknumeric datatext processing

I'm trying to numerically sort every column individually in a very large file. I need the command to be fast, so I'm trying to do it in an awk command.

Example Input:

1,4,2,7,4
9,2,1,1,1
3,9,9,2,2
5,7,7,8,8

Example Output:

1,2,1,1,1
3,4,2,2,2
5,7,7,7,4
9,9,9,8,8

I made something that will do the job (but its not the powerful awk command I need):

for i in $(seq $NumberOfColumns); do 
  SortedMatrix=$(paste <(echo "$SortedMatrix") <(awk -F ',' -v x=$i '{print $x}' File | sort -nr) -d ,)
done

but it is very slow!
I've tried to do it in awk and I think I'm close:

SortedMatrix=$(awk -F ',' 'NR==FNR {for (i=1;i<=NF;i++) print|"sort -nr"}' File)

But it doesn't output columns (just one very long column), I understand why its doing this but I don't know how to resolve it, I was thinking of using paste inside awk but I have no idea how to implement it.

Does anyone know how to do this in awk? Any help or guidance will be much appreciated

Best Answer

You can do it in a single GNU awk:

gawk -F ',' '
    {
        for(i=1;i<=NF;i++){matrix[i][NR]=$i}
    }
    END{
        for(i=1;i<=NF;i++){asort(matrix[i])}
        for(j=1;j<=NR;j++){
            for(i=1;i<NF;i++){
                printf "%s,",matrix[i][j]
            }
            print matrix[i][j]
        }
    }
' file
  • for(i=1;i<=NF;i++){matrix[i][NR]=$i}

Multidimensional array (GNU extension) matrix gets populated, so that matrix[i][j] contains the number of column i, row j.

  • for(i=1;i<=NF;i++){asort(matrix[i])}

Sorts each column (GNU extension).

  • Finally

    for(j=1;j<=NR;j++){
        for(i=1;i<NF;i++){
            printf "%s,",matrix[i][j]
        }
        print matrix[i][j]
    }
    

Prints a sequence of a[1],, a[2],, ..., a[NF-1],, a[NF]\n for each line.

Related Question