Linux – How to correctly format the output with Awk printf command

Tags: awk, linux, text processing

I have the following file:

cat filename
    dfT08r352|30.5|2010/06/01|2016/08/29|2281|6.24503764544832|74.9404517453799|
    zm00dr121|37|2008/03/05|2011/09/12|1285.95833333333|3.52076203513575|42.249144421629|
    ccvd00121|41.6|2008/03/05|2012/03/05|1461|4|48|
    sddf00121|39.6|2008/03/05|2012/09/10|1649.95833333333|4.51733972165184|54.208076659822|
    fttt00121|41|2008/03/05|2013/09/16|2020.95833333333|5.53308236367785|66.3969883641342|
    ghhyy0121|42.2|2008/03/05|2014/03/18|2203.95833333333|6.03410905772302|72.4093086926762|

I am trying to reformat this file with awk printf so that the output:

  1. keeps the same order of fields (left to right)
  2. uses a comma "," as the field separator
  3. shows the last three fields ($5, $6, $7) with 4 digits before
    the decimal point, padded with leading zeros if there are fewer,
    and only 2 digits after the point, like 0123.12 or 1234.10

I wrote the following awk command:

awk -F"|" '{print $1","$2","$3","$4}{format = "%04.2f,%04.2f,%04.2f,"}{printf format, $5,$6,$7}' filename

However, the output below has the following issues:

  1. the fields are not in order (left to right)

  2. the numbers do not have leading zeros

    dfT08r352,30.5,2010/06/01,2016/08/29
    2281.00,6.25,74.94,zm00dr121,37,2008/03/05,2011/09/12
    1285.96,3.52,42.25,ccvd00121,41.6,2008/03/05,2012/03/05
    1461.00,4.00,48.00,sddf00121,39.6,2008/03/05,2012/09/10
    1649.96,4.52,54.21,fttt00121,41,2008/03/05,2013/09/16
    2020.96,5.53,66.40,ghhyy0121,42.2,2008/03/05,2014/03/18
    

Can someone please let me know what is my mistake and how to fix it?

Best Answer

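You have the fields in the right order, but your first print statement appends a newline (the output record separator), while your printf format does not end with one. All of your data is there, but each line breaks after the fourth field and the next record then continues on the same line as the three numbers.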
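
To see the difference in isolation, here is a minimal sketch with made-up input (the `a|b` record is only an illustration, not part of the question's file). It shows that print ends its output with a newline, while printf writes only what the format string asks for:

```shell
# print appends the output record separator (a newline by default),
# so the two fields end up on separate lines.
printf 'a|b\n' | awk -F'|' '{ print $1; printf "%s\n", $2 }'

# printf adds nothing on its own, so both fields stay on one line: a,b
printf 'a|b\n' | awk -F'|' '{ printf "%s,", $1; printf "%s\n", $2 }'
```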

The second issue is that you're telling printf to use a total width of 4. That width includes the decimal point and the two digits after it, leaving only one character for the leading digit and none for padding. With a width of 5 your data is padded to four digits in total, two before the point and two after (for example 06.25). If you want 4 digits before the decimal point, as in your 0123.12 example, change the width to 7 instead.
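
As a quick check of how the width behaves, this sketch formats one value from the sample data (6.24503764544832) with the three widths discussed above. It assumes an awk whose printf supports the 0 flag, which the common implementations do:

```shell
# The width counts every character: digits, the point, and the decimals.
awk 'BEGIN {
  x = 6.24503764544832
  printf "%04.2f\n", x   # 6.25    (already 4 characters, so no padding)
  printf "%05.2f\n", x   # 06.25   (5 characters: one leading zero)
  printf "%07.2f\n", x   # 0006.25 (7 characters: 4 digits before the point)
}'
```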

This is the smallest change to your program that outputs what I think you want:

awk -F"|" '{
  format = "%05.2f,%05.2f,%05.2f"; 
  print $1","$2","$3","$4"," sprintf(format, $5,$6,$7)}' filename

I combined multiple { } blocks into one, and also combined the print statements into one.

If I were to write your awk statement from scratch, I might do something like this:

awk -v FS=\| -v OFS=, '{
  $5=sprintf("%05.2f", $5); 
  $6=sprintf("%05.2f", $6); 
  $7=sprintf("%05.2f", $7); 
  print $1,$2,$3,$4,$5,$6,$7}' filename

It explicitly sets the input field separator (FS) and the output field separator (OFS), converts each of the three numeric fields on its own with sprintf, then prints the desired fields. Because the fields are passed to print separated by commas, awk joins them with OFS.
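
If the goal is 4 digits before the decimal point, as the 0123.12 example in the question suggests, the same idea can be sketched with a width of 7 and a loop over the three numeric fields. This is a variation under that assumption, not the only way to write it. To keep it self-contained, one record from the question is piped in instead of reading filename:

```shell
# Assumption: 4 digits before the point are wanted, hence width 7.
printf '%s\n' 'zm00dr121|37|2008/03/05|2011/09/12|1285.95833333333|3.52076203513575|42.249144421629|' |
awk -F'|' -v OFS=, '{
  for (i = 5; i <= 7; i++)          # reformat only fields 5 to 7
    $i = sprintf("%07.2f", $i)
  print $1, $2, $3, $4, $5, $6, $7  # joined with OFS, no trailing comma
}'
# prints: zm00dr121,37,2008/03/05,2011/09/12,1285.96,0003.52,0042.25
```

Listing the fields in print, rather than printing $0, leaves out the empty eighth field created by the trailing "|" on each input line.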
