Here is a streamable solution.
I assume you want to sort based on the first row of the columns; otherwise, adapt it to take the sorting key from somewhere else.
Generate sorting key (reusing Rush's array):
echo -e "2 1 3\n5 4 6\n8 7 9" > data
key=$(head -n1 data | tr -s ' ' | tr ' ' '\n' | cat -n \
| sort -k2 | sed 's/^ *\(.*\)\t.*/\1/')
$key now holds:
2
1
3
Now use the key to sort columns:
cat data | awk -v key="$key" '
BEGIN { n = split(key, order, "\n") }
{
    for (i = 1; i <= n; i++) {
        printf("%s ", $(order[i]))
    }
    printf("\n")
}'
Output:
1 2 3
4 5 6
7 8 9
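For reference, the two steps above can be combined into one minimal, self-contained sketch (it recreates the same sample data file and reorders each line's fields by the key):

```shell
# Recreate the sample data from above.
printf '2 1 3\n5 4 6\n8 7 9\n' > data

# Build the sorting key: column indices ordered by the first row's values.
key=$(head -n1 data | tr -s ' ' '\n' | cat -n | sort -k2 | awk '{print $1}')

# Reorder the fields of every line according to the key.
awk -v key="$key" '
BEGIN { n = split(key, order, "\n") }
{
    for (i = 1; i <= n; i++)
        printf("%s%s", $(order[i]), i < n ? " " : "\n")
}' data
```

This should reproduce the sorted output shown above.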
Awk
This awk script works on an arbitrary number of columns (> 2). Order of appearance is preserved (across, then down), and no assumptions are made about what the columns contain (i.e. it doesn't matter whether they are numeric or not, sorted or not, etc.):
{
    for (i = 2; i <= NF; i++) {
        a[j + i] = $1 " " $i
    }
    j += (i - 1)
}
END {
    OutNR = NR * NF
    for (i = 2; i <= NF; i++) {
        for (j = 0; j < OutNR; j += NF) {
            print a[j + i]
        }
    }
}
Given:
0 0 0 0.2340
0.05 9.6877884e-06 0.0024898597 0.2341
0.1 4.2838688e-05 0.0049595502 0.2342
0.15 0.00016929444 0.0074092494 0.2343
0.2 0.00036426881 0.009839138 0.2344
0.25 0.00055234582 0.012249394 0.2345
0.3 0.00077448576 0.014640196 0.2346
0.35 0.00082546537 0.017011717 0.2347
0.4 0.0012371619 0.019364133 0.2348
0.45 0.0013286382 0.02169761 0.2349
Order by column (2..n) then by line:
0 0
0.05 9.6877884e-06
0.1 4.2838688e-05
0.15 0.00016929444
0.2 0.00036426881
0.25 0.00055234582
0.3 0.00077448576
0.35 0.00082546537
0.4 0.0012371619
0.45 0.0013286382
0 0
0.05 0.0024898597
0.1 0.0049595502
0.15 0.0074092494
0.2 0.009839138
0.25 0.012249394
0.3 0.014640196
0.35 0.017011717
0.4 0.019364133
0.45 0.02169761
0 0.2340
0.05 0.2341
0.1 0.2342
0.15 0.2343
0.2 0.2344
0.25 0.2345
0.3 0.2346
0.35 0.2347
0.4 0.2348
0.45 0.2349
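To run it end to end, the script can be saved to a file and fed the input with awk -f. A self-contained sketch (the file names cols.awk and data.txt are just illustrative, and the tiny input here stands in for the data above):

```shell
# Save the awk program from above (file name is illustrative).
cat > cols.awk <<'EOF'
{
    for (i = 2; i <= NF; i++) {
        a[j + i] = $1 " " $i
    }
    j += (i - 1)
}
END {
    OutNR = NR * NF
    for (i = 2; i <= NF; i++) {
        for (j = 0; j < OutNR; j += NF) {
            print a[j + i]
        }
    }
}
EOF

# A tiny input in the same shape as the data above.
printf 'a 1 2\nb 3 4\n' > data.txt

# Columns 2..n are emitted in turn, each paired with column 1:
# a 1, b 3, a 2, b 4 (one pair per line).
awk -f cols.awk data.txt
```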
R
Although most people don't think of R for text processing, in this case it's actually a bit more straightforward, although all of the option setting makes it appear more complex than it really is. The essence of this solution is simply to rbind() multiple cbind()s:
d.in <- read.table(file = commandArgs(trailingOnly = T)[1],
                   colClasses = "character")
d.out <- data.frame()
for (i in 2:length(d.in)) {
    d.out <- rbind(d.out, cbind(d.in[, 1], d.in[, i]))
}
write.table(d.out, row.names = F, col.names = F, quote = F)
Then, just:
$ Rscript script.R data.txt
0 0
0.05 9.6877884e-06
0.1 4.2838688e-05
0.15 0.00016929444
0.2 0.00036426881
0.25 0.00055234582
0.3 0.00077448576
0.35 0.00082546537
0.4 0.0012371619
0.45 0.0013286382
0 0
0.05 0.0024898597
0.1 0.0049595502
0.15 0.0074092494
0.2 0.009839138
0.25 0.012249394
0.3 0.014640196
0.35 0.017011717
0.4 0.019364133
0.45 0.02169761
0 0.2340
0.05 0.2341
0.1 0.2342
0.15 0.2343
0.2 0.2344
0.25 0.2345
0.3 0.2346
0.35 0.2347
0.4 0.2348
0.45 0.2349
Best Answer
You can count the unique columns with the following pipe:
The awk command transposes your input, the resulting lines are sorted, only unique lines are kept (-u), and at the end all (unique) lines (i.e. the transposed columns) are counted (wc -l). Note that NF is a builtin awk variable that is automatically set to the number of fields in the current record, $i references the i-th field, and END guards the following block such that it is executed after all records are processed. By default, awk uses blank/non-blank field delimiting.
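The pipe itself is not shown above, but the description pins down its shape (transpose with awk, then sort -u, then wc -l). A sketch under the assumption of whitespace-separated input with the same number of fields on every line:

```shell
# Sample input whose third column duplicates the first.
printf '2 1 2\n5 4 5\n8 7 8\n' > data

# Transpose: accumulate one output line per input column, print them in END.
# sort -u keeps each distinct column once; wc -l counts the survivors.
awk '{ for (i = 1; i <= NF; i++) t[i] = t[i] (NR > 1 ? " " : "") $i }
END { for (i = 1; i <= NF; i++) print t[i] }' data | sort -u | wc -l
```

For this input the pipe reports 2 unique columns, since the third column is a copy of the first.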