Text Processing AWK Perl – How to Average Rows with Same First Column

awk, perl, text processing

Given a file with two columns:

I need a way to coalesce all rows with the same ID into one row that has the average height. In this case, (69 + 67 + 65 + 62 + 59) / 5 = 64.4 (64 when truncated to an integer) and (29 + 26 + 21 + 20) / 4 = 24, so the output should be:

Id  Avg.ht
 510 64
 601 24

How can I do that using sed/awk/perl?

Best Answer

Using awk:

The input file

awk in a shell:

$ awk '
    NR>1{
        arr[$1]   += $2
        count[$1] += 1
    }
    END{
        for (a in arr) {
            print "id avg " a " = " arr[a] / count[a]
        }
    }
' FILE

Or with Perl in a shell:

$ perl -lane '
    END {
        foreach my $key (keys(%hash)) {
            print "id avg $key = " . $hash{$key} / $count{$key};
        }
    }
    if ($. > 1) {
        $hash{$F[0]}  += $F[1];
        $count{$F[0]} += 1;
    }
' FILE

Output:

id avg 601 = 24
id avg 510 = 64.4
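
Note that `for (a in arr)` in awk and `keys(%hash)` in Perl return keys in no particular order, so the lines can come out in either order. The following is a minimal sketch, not part of the original answer, that prints the result in the `Id Avg.ht` layout asked for in the question and sorts it by ID. The function name `avg_by_id` is hypothetical, and the sample rows are an assumption reconstructed from the heights quoted in the question.

```shell
# Hypothetical helper: average column 2 per ID in column 1, skipping the header line.
avg_by_id() {
    echo "Id Avg.ht"
    awk '
        NR > 1 && NF >= 2 {          # skip the header and any blank lines
            sum[$1]   += $2
            count[$1] += 1
        }
        END {
            for (id in sum)
                printf "%s %g\n", id, sum[id] / count[id]
        }
    ' "$1" | sort -n                 # array order is unspecified, so sort by ID
}

# Sample data: an assumption rebuilt from the numbers quoted in the question.
sample=$(mktemp)
printf '%s\n' 'Id ht' \
    '510 69' '510 67' '510 65' '510 62' '510 59' \
    '601 29' '601 26' '601 21' '601 20' > "$sample"

avg_by_id "$sample"
# prints:
# Id Avg.ht
# 510 64.4
# 601 24
```

If integer averages are wanted, as in the question's expected output, replacing `%g` with `%d` truncates the value (64.4 becomes 64).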

And finally, just for fun, an obfuscated Perl one-liner =)

perl -lane'END{for(keys(%h)){print"$_:".$h{$_}/$c{$_}}}($.>1)&&do{$h{$F[0]}+=$F[1];$c{$F[0]}++}' FILE

Related Solutions

Text Processing – How to Numerical Sort by Last Column

The following command line uses awk to prepend each line's last field to that line of file.txt, sorts the result numerically in reverse order, then uses cut to strip the prepended field:

awk '{print $NF,$0}' file.txt | sort -nr | cut -f2- -d' '
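
To see the three stages of that pipeline at work, here is a small sketch on made-up input. The function name `sort_by_last_field` and the sample lines are hypothetical and only serve as illustration; the sketch assumes fields are separated by single spaces.

```shell
# Hypothetical wrapper around the pipeline above:
# sort lines by their last whitespace-separated field, largest first.
sort_by_last_field() {
    awk '{ print $NF, $0 }' "$1" |   # "beta x 10" becomes "10 beta x 10"
        sort -nr |                   # reverse numeric sort on the prepended key
        cut -d' ' -f2-               # drop the prepended key again
}

# Made-up sample lines with differing numbers of fields.
sample=$(mktemp)
printf '%s\n' 'alpha 3' 'beta x 10' 'gamma 2' > "$sample"

sort_by_last_field "$sample"
# prints:
# beta x 10
# alpha 3
# gamma 2
```

Because the sort key is copied to the front of each line, the approach works even when lines have different numbers of fields.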

AWK – Merge 2 Rows Based on Same Column Values

A Perl solution:

$ perl -ane '$h{$F[2]} .= " ".$F[0]." ".$F[1];
    END {
        for $k (sort keys %h) {
            print $_," " for grep {!$seen{$_}++} split(" ",$h{$k});
            print "$k\n";
        }
    }' file

47196436 47723284 name1
42672249 52856963 430695 name2
55094959 380983 name3
17926380 55584836 3213456 34211 54321 name4
