I'm trying to numerically sort every column individually in a very large file. I need the command to be fast, so I'm trying to do it in an awk command.
Example Input:
1,4,2,7,4
9,2,1,1,1
3,9,9,2,2
5,7,7,8,8
Example Output:
1,2,1,1,1
3,4,2,2,2
5,7,7,7,4
9,9,9,8,8
I made something that will do the job (but its not the powerful awk command I need):
for i in $(seq $NumberOfColumns); do
SortedMatrix=$(paste <(echo "$SortedMatrix") <(awk -F ',' -v x=$i '{print $x}' File | sort -nr) -d ,)
done
but it is very slow!
I've tried to do it in awk and I think I'm close:
SortedMatrix=$(awk -F ',' 'NR==FNR {for (i=1;i<=NF;i++) print|"sort -nr"}' File)
But it doesn't output columns (just one very long column), I understand why its doing this but I don't know how to resolve it, I was thinking of using paste inside awk but I have no idea how to implement it.
Does anyone know how to do this in awk? Any help or guidance will be much appreciated
Best Answer
You can do it in a single GNU awk:
for(i=1;i<=NF;i++){matrix[i][NR]=$i}
Multidimensional array (GNU extension)
matrix
gets populated, so thatmatrix[i][j]
contains the number of columni
, rowj
.for(i=1;i<=NF;i++){asort(matrix[i])}
Sorts each column (GNU extension).
Finally
Prints a sequence of
a[1],
,a[2],
, ...,a[NF-1],
,a[NF]\n
for each line.