Given the following data file…
foo 10
bar 20
oof 50
rab 20
… how would I print column two as a percent of the total of column two? In other words, I want…
foo 10 10%
bar 20 20%
oof 50 50%
rab 20 20%
… with less obvious numbers of course. I can create a running total easily enough, but I'm not sure how I can calculate the total before printing the lines. I am doing this in an awk file totals.awk
…
#!/usr/bin/awk -f
BEGIN {
    runningtotal = 0
}
{
    runningtotal = runningtotal + $2
    print $1 "\t" $2 "\t" runningtotal "\t" $2/runningtotal
}
So, running ./totals.awk data yields…
foo 10 10 1
bar 20 30 0.666667
oof 50 80 0.625
rab 20 100 0.2
Is there a way to loop twice, once to calculate the total, and once to print the lines? Is this possible in AWK, or must I use other utilities?
Best Answer
The table can be created with a single call to awk.
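A sketch of that call, reassembled from the two statements explained under "How it works" and from the remark that data is passed twice. The first line only recreates the sample file from the question so that the example is self-contained:

```shell
# Recreate the sample file from the question.
printf 'foo 10\nbar 20\noof 50\nrab 20\n' > data

# Pass 1 (FNR==NR) sums column two into s; pass 2 prints each line with its share of s.
awk 'FNR==NR{s+=$2;next;} {printf "%s\t%s\t%s%%\n",$1,$2,100*$2/s}' data data
```

With the sample data this prints the table asked for in the question: each line followed by 10%, 20%, 50%, and 20% respectively.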
How it works
The file data is provided as an argument to awk twice. Consequently, it will be read twice: the first time to get the total, which is stored in the variable s, and the second time to print the output. Looking at the commands in more detail:

FNR==NR{s+=$2;next;}
NR is the total number of records (lines) that awk has read so far, and FNR is the number of records read so far from the current file. Consequently, when FNR==NR, we are reading the first file. When this happens, the variable s is incremented by the value in the second column. Then, next tells awk to skip the rest of the commands and start over with the next record.

Note that it is not necessary to initialize s to zero. In awk, an uninitialized variable has the value zero when it is used as a number.

printf "%s\t%s\t%s%%\n",$1,$2,100*$2/s
If we reach this command, then we are processing the second file. This means that s now holds the total of column 2. So, we print column 1, column 2, and the percentage, 100*$2/s.
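To see why FNR==NR is true only during the first pass, a small trace can help. This sketch is an illustration rather than part of the command itself: it prints NR, FNR, and the current value of s for every record of both passes over the sample file.

```shell
# Recreate the sample file from the question.
printf 'foo 10\nbar 20\noof 50\nrab 20\n' > data

# On the first pass NR and FNR agree, so s grows; on the second pass FNR restarts
# at 1 while NR keeps counting, so s stays at the final total.
awk 'FNR==NR{s+=$2} {print NR, FNR, s}' data data
```

The first four lines of the trace run from "1 1 10" to "4 4 100". The last four run from "5 1 100" to "8 4 100", which shows that s is already complete when the printing pass begins. Note that s is never initialized in this sketch either.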
Output format options
With printf, detailed control of the output format is possible. The command above uses the %s format specifier, which works for strings, integers, and floats. Three other options that might be useful here are:

%d formats numbers as integers. If the number is actually floating point, it will be truncated to an integer.
%f formats numbers as floating point. It is also possible to specify widths and decimal places as, for example, %5.2f.
%e provides exponential notation. This would be useful if some numbers were exceptionally large or small.

Make a shell function
If you are going to use this more than once, it is an inconvenience to type a long command. Instead, create either a function or a script to hold the command.
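As a sketch of the function route, the definition below wraps the two awk statements explained above in a shell function. The name totals is the one used in this answer; the body, including passing the first argument to awk twice as "$1" "$1", is an assumption rather than a quotation.

```shell
# Hypothetical reconstruction: totals FILE prints each line of FILE
# followed by column two as a percent of the column total.
totals() {
    # "$1" is given twice so awk reads the file once for the sum and once for the output.
    awk 'FNR==NR{s+=$2;next;} {printf "%s\t%s\t%s%%\n",$1,$2,100*$2/s}' "$1" "$1"
}

# Example call on the sample file from the question.
printf 'foo 10\nbar 20\noof 50\nrab 20\n' > data
totals data
```

Quoting "$1" keeps the function working for file names that contain spaces.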
To create a function called totals, run its definition at the shell prompt. With this function defined, the percentages for a data file called data can be found by running totals data. To make the definition of totals permanent, place it in your ~/.bashrc file.

Make a shell script
If you prefer a script, create a file called totals.sh that contains the command. To get the percentages for a data file called data, run the script with data as its argument.
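A sketch of what such a script and its invocation might look like. The file name totals.sh comes from the text above; the #!/bin/sh line, the script body, and running it through sh are assumptions. The here-document is only a convenient way to create the file from the command line.

```shell
# Write a hypothetical totals.sh; the quoted 'EOF' keeps $1, $2 and "$1" literal.
cat > totals.sh <<'EOF'
#!/bin/sh
# Read the file named by the first argument twice:
# once to total column two, once to print the percentages.
awk 'FNR==NR{s+=$2;next;} {printf "%s\t%s\t%s%%\n",$1,$2,100*$2/s}' "$1" "$1"
EOF

# Run it on the sample file from the question.
printf 'foo 10\nbar 20\noof 50\nrab 20\n' > data
sh totals.sh data
```

After chmod +x totals.sh, the script could also be started as ./totals.sh data.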