AWK Scripting – Number Formatting and Rounding Issue with AWK

awkcommand linemawknumeric datatext formatting

I want to get the exact number when I try to find the average of a column of values.

For example, this is the column of input values:

1426044
1425486
1439480
1423677
1383676
1360088
1390745
1435123
1422970
1394461
1325896
1251248
1206005
1217057
1168298
1153022
1199310
1250162
1247917
1206836

When I use the following command:

... | awk '{ sum+=$1} END { print sum/NR}'

I get the following output: 1.31638e+06. However, I want the exact number, which is 1316375.05 or even better, in this format 1,316,375.05

How can I do this with command line tools only?

EDIT 1

I found the following one-liner awk command which will get me the max, min and mean:

awk 'NR == 1 { max=$1; min=$1; sum=0 } { if ($1>max) max=$1; if ($1<min) min=$1; sum+=$1;} END {printf "Min: %d\tMax: %d\tAverage: %.2f\n", min, max, sum/NR}'

Why is it that NR must be initialized as 1? When I delete NR == 1, I get the wrong result.

EDIT 2

I found the following awk script from Is there a way to get the min, max, median, and average of a list of numbers in a single command?. It will get the sum, count, mean, median, max, and min values of a single column of numeric data, all in one go. It reads from stdin, and prints tab-separated columns of the output on a single line. I tweaked it a bit. I noticed that it does not need NR == 1 unlike the awk command above (in my first edit). Can someone please explain why? I think it has to do with the fact that the numeric data has been sorted and placed into an array.

#!/bin/sh

sort -n | awk '

  $1 ~ /^(\-)?[0-9]*(\.[0-9]*)?$/ {
    a[c++] = $1;
    sum += $1;
  }
  END {
    ave = sum / c;
    if( (c % 2) == 1 ) {
      median = a[ int(c/2) ];
    } else {
      median = ( a[c/2] + a[c/2-1] ) / 2;
    }

    {printf "Sum: %d\tCount: %d\tAverage: %.2f\tMedian: %d\tMin: %d\tMax: %d\n", sum, c, ave, median, a[0], a[c-1]}
  }
'

Best Answer

... | awk '{ sum+=$1} END { print sum/NR}'

By default, (GNU) awk prints numbers with up to 6 significant digits (plus the exponent part). This comes from the default value of the OFMT variable. It doesn't say that in the docs, but this only applies to non-integer valued numbers.

You could change OFMT to affect all print statements, or rather, just use printf here, so it also works if the average happens to be an integer. Something like %.3f would print the numbers with three digits after the decimal point.

...| awk '{ sum+=$1} END { printf "%.3f\n", sum/NR }'

See the docs for the meaning of the f and g, and the precision modifier (.prec in the second link):

awk 'NR == 1 { max=$1; min=$1; sum=0 } ...'

This doesn't initialize NR. Instead, it checks if NR is equal to one, i.e. we're on the first line. (== is comparison, = is assignment.) If so, initializes max, min and sum. Without that, max and min would start as zeroes. You could never have a negative maximum value, or a positive minimum value.

Related Question