How to sort columns based on the first line

Tags: awk, text processing

I need to sort the columns of a very large dataset (1000 lines and 700000 columns).
The columns are in random order, e.g. col1 col4 col3 col2, and I need to put them in order.

I have tried several commands, without success.

Example:

ID M2 M5 M8 M1 M3 M9 .....M7000000
Animal1 1 0 2 1 0 2 .....1
Animal2 0 1 2 0 1 1 .....0
Animal3 2 1 0 1 2 1 .....0
.
.
.
.
Animaln

In this example, the dots stand for the many columns and rows not shown. I need the columns sorted like this:

ID M1 M2 M3 M4 M5 M6 .....M7000000
Animal1 1 0 2 1 0 2 .....1
Animal2 0 1 2 0 1 1 .....0
Animal3 2 1 0 1 2 1 .....0
.
.
.
.
Animaln

Thank you

Best Answer

With GNU datamash and GNU sort:

datamash transpose -t ' ' -H <file_in.csv | sort -V | datamash transpose -t ' ' -H >file_out.csv

This works fine for "reasonably small" data. Transposing requires holding the whole table in memory, so it may or may not work with a file of this size.
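The -V (version sort) in the pipeline matters because the column names carry numbers of different lengths. Plain sort compares names character by character, so M10 lands between M1 and M2; version sort compares the digit runs as numbers. A quick illustration, assuming GNU coreutils sort:

```shell
# Byte-wise sort: M10 sorts before M2 because '1' < '2'.
printf '%s\n' M10 M2 M1 | LC_ALL=C sort   # M1, M10, M2 (one per line)

# Version sort: the numeric parts are compared as numbers.
printf '%s\n' M10 M2 M1 | sort -V         # M1, M2, M10 (one per line)
```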

Edit: The solutions below, which avoid transposition, should be less resource-intensive.
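A minimal sketch of a no-transpose approach, under stated assumptions: fields are separated by single spaces, column 1 (ID) stays first, GNU sort is available for -V, and the awk in use can handle hundreds of thousands of fields per line (GNU awk can; some older mawk builds had a small field limit). It reads only the header to compute the new column order, then streams the data in one awk pass, so memory is roughly one index array plus one row instead of the whole table. The name sort_columns is a hypothetical helper, not an existing tool.

```shell
# sort_columns FILE: print FILE with columns 2..NF reordered so that the
# header names come out in version order (M1 M2 ... M10 ...).
# Hypothetical helper; assumes single-space separators, ID in column 1,
# GNU sort (-V) and an awk without a low per-line field limit.
sort_columns() {
  perm=$(mktemp) || return 1

  # 1. Header only: emit "original_index name" pairs, sort them by name
  #    (version order), and keep just the indices, one per line.
  head -n 1 "$1" |
    awk '{ for (i = 2; i <= NF; i++) print i, $i }' |
    sort -t ' ' -k2,2V |
    cut -d ' ' -f1 > "$perm"

  # 2. One streaming pass: while reading the first file (NR == FNR) load
  #    the permutation, then print every table row in that column order.
  awk 'NR == FNR { idx[++n] = $1; next }
       {
         printf "%s", $1
         for (k = 1; k <= n; k++) printf " %s", $(idx[k])
         printf "\n"
       }' "$perm" "$1"
  status=$?

  rm -f "$perm"
  return "$status"
}
```

Usage: sort_columns file_in.csv >file_out.csv. The permutation goes through a temporary file rather than awk -v because a list of 700000 indices can exceed the operating system's limit on the length of a single command-line argument.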
