Using awk :
The input file
$ cat FILE
Id ht
510 69
510 67
510 65
510 62
510 59
601 29
601 26
601 21
601 20
Awk in a shell :
$ awk '
NR>1{
arr[$1] += $2
count[$1] += 1
}
END{
for (a in arr) {
print "id avg " a " = " arr[a] / count[a]
}
}
' FILE
Or with Perl in a shell :
$ perl -lane '
END {
foreach my $key (keys(%hash)) {
print "id avg $key = " . $hash{$key} / $count{$key};
}
}
if ($. > 1) {
$hash{$F[0]} += $F[1];
$count{$F[0]} += 1;
}
' FILE
Output is :
id avg 601 = 24
id avg 510 = 64.4
And last for the joke, a Perl dark-obfuscated one-liner =)
perl -lane'END{for(keys(%h)){print"$_:".$h{$_}/$c{$_}}}($.>1)&&do{$h{$F[0]}+=$F[1];$c{$F[0]}++}' FILE
If column-order is important, i.e. numbers from the same file should be kept in the same column, you need to add padding while reading the different files. Here is one way that works with GNU awk:
merge.awk
# Set k to be a shorthand for the key
{ k = $1 SUBSEP $2 }
# First element with this key, add zeros to align it with other rows
!(k in h) {
for(i=1; i<=ARGIND-1; i++)
h[k] = h[k] OFS 0
}
# Remember the data element
{ h[k] = h[k] OFS $3 }
# Before moving to the next file, ensure that all rows are aligned
ENDFILE {
for(k in h) {
if(split(h[k], a) < ARGIND)
h[k] = h[k] OFS 0
}
}
# Print out the collected data
END {
for(k in h) {
split(k, a, SUBSEP)
print a[1], a[2], h[k]
}
}
Here are some test files: f1
, f2
, f3
and f4
:
$ tail -n+1 f[1-4]
==> f1 <==
xyz desc1 21
uvw desc2 22
pqr desc3 23
==> f2 <==
xyz desc1 56
uvw desc2 57
==> f3 <==
xyz desc1 87
uvw desc2 88
==> f4 <==
xyz desc1 11
uvw desc2 12
pqr desc3 13
stw desc1 14
arg desc2 15
Test 1
awk -f merge.awk f[1-4] | column -t
Output:
pqr desc3 23 0 0 13
uvw desc2 22 57 88 12
stw desc1 0 0 0 14
arg desc2 0 0 0 15
xyz desc1 21 56 87 11
Test 2
awk -f merge.awk f2 f3 f4 f1 | column -t
Output:
pqr desc3 0 0 13 23
uvw desc2 57 88 12 22
stw desc1 0 0 14 0
arg desc2 0 0 15 0
xyz desc1 56 87 11 21
Edit:
If the output should be tab-separated, set the output field separator accordingly:
awk -f merge.awk OFS='\t' f[1-4]
Best Answer
This
perl
version can also deal with arbitrary field lengths, not only those of a single character:On a more complex file:
Explanation
The
-l
trims newlines from each input line, the-a
splits input fields on whitespace into the array@F
and the-p
prints each input line after applying the script given by-e
.The script itself iterates over each input field (the
@F
array), saving each as$i
. The substitution looks for 2 or more consecutive$i
followed by 0 or more spaces and replaces them with$i+
.