Invoke date within an awk command to format output

awk command-line date

I have a csv file in the following format:

20171129,1
20171201,0.5
20171201,0.5
20171202,1.25
20171202,1.75

I use the following command to sum the second field over all lines that share the same date:

awk -F ',' '{a[$1] += $2} END{for (i in a) print "On " i, "you spend: "a[i] " hour(s)"}' < "file.csv"

The output I get looks like this:

On 20171129 you spend: 1 hour(s)
On 20171201 you spend: 1 hour(s)
On 20171202 you spend: 3 hour(s)

What I want to achieve now is to format the date the way this command does:

awk -F ',' '{a[$1]} END{for (i in a) print i}' < "file.csv" \
 | date +"%a, %d.%m.%Y" -f -

# prints:
Wed, 29.11.2017
Fri, 01.12.2017
Sat, 02.12.2017

So my final result would look like:

On Wed, 29.11.2017 you spend: 1 hour(s)
On Fri, 01.12.2017 you spend: 1 hour(s)
On Sat, 02.12.2017 you spend: 3 hour(s)

Is it possible to invoke date within the awk command to format the output?

Best Answer

You could use gawk, which has the strftime and mktime functions (https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html).

gawk -F ',' '{a[$1] += $2} END{for (i in a) print "On " strftime("%a, %d.%m.%Y", mktime( substr(i,1,4) " " substr(i,5,2) " " substr(i,7,2) " 0 0 0" )) " you spend: "a[i] " hour(s)" }' file.csv

In more detail:

gawk -F ',' '
  {
    a[$1] += $2
  }
  END{
    for (i in a) {

      # mktime needs a date formatted like this "2017 12 31 23 59 59"
      # strftime needs a Unix timestamp (produced by mktime)

      print "On " strftime("%a, %d.%m.%Y", mktime( substr(i,1,4) " " substr(i,5,2) " " substr(i,7,2) " 0 0 0" )) " you spend: "a[i] " hour(s)"

    }
  }' file.csv
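
One thing to keep in mind with the gawk version: for (i in a) visits the array in an unspecified order, so the dates may not come out chronologically. Here is a minimal sketch of one way around that, assuming gawk 4.0 or later (for PROCINFO["sorted_in"]). The function name summarize_sorted is only for illustration, and LC_ALL=C is set to keep the day names in English:

```shell
# Hypothetical helper: sums hours per day and prints the days in order.
# Assumes gawk >= 4.0; usage: summarize_sorted file.csv
summarize_sorted() {
  LC_ALL=C gawk -F ',' '
    { a[$1] += $2 }
    END {
      # Traverse the array by index in ascending string order,
      # which is chronological for YYYYMMDD keys
      PROCINFO["sorted_in"] = "@ind_str_asc"
      for (i in a) {
        ts = mktime(substr(i,1,4) " " substr(i,5,2) " " substr(i,7,2) " 0 0 0")
        print "On " strftime("%a, %d.%m.%Y", ts) " you spend: " a[i] " hour(s)"
      }
    }' "$1"
}
```

Call it as summarize_sorted file.csv. Because the keys are YYYYMMDD strings, ascending string order is also chronological order.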

With a basic awk, you need to call the date command and read its result with getline (the -d option used here assumes GNU date):

awk -F ',' '{a[$1] += $2} END{ for (i in a) { COMMAND = "date +\"%a, %d.%m.%Y\" -d " i ; if ( ( COMMAND | getline DATE ) > 0 ) { print "On " DATE " you spend: "a[i] " hour(s)" } ; close(COMMAND) } }' file.csv

In more detail:

awk -F ',' '
  {
    a[$1] += $2
  }
  END{
    for (i in a) {

      # Define command to call

      COMMAND = "date +\"%a, %d.%m.%Y\" -d " i

      # Call command and check that it prints something
      # We put the 1st line of text displayed by the command in DATE

      if ( ( COMMAND | getline DATE ) > 0 ) {
        # Print the result
        print "On " DATE " you spend: "a[i] " hour(s)"
      }

      # Close the command (important!)
      # Your child process is still here if you do not close it

      close(COMMAND)
    }
  }' file.csv
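
Since the first field is pasted straight into a shell command line, it may be worth checking that it really looks like a date before running anything. Here is a sketch of that idea, assuming GNU date (for -d). The names summarize_with_date and is_date are made up for this example, and LC_ALL=C is there only to get English day names:

```shell
# Hypothetical helper; usage: summarize_with_date file.csv
# Assumes GNU date, since the -d option is not portable.
summarize_with_date() {
  LC_ALL=C awk -F ',' '
    # Accept only keys made of exactly 8 digits before handing them to the shell
    function is_date(s) { return length(s) == 8 && s ~ /^[0-9]+$/ }

    { a[$1] += $2 }

    END {
      for (i in a) {
        if (!is_date(i)) continue          # skip anything that is not YYYYMMDD
        cmd = "date +\"%a, %d.%m.%Y\" -d " i
        if ((cmd | getline d) > 0)
          print "On " d " you spend: " a[i] " hour(s)"
        close(cmd)                         # one child process per date, closed right away
      }
    }' "$1"
}
```

Lines whose first field is not eight digits are silently dropped here; whether to skip them or report them is a design choice. The output order is still whatever order for (i in a) happens to use.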

Related Solutions

Converting date timestamp from 12 hour into 24 using awk

I'd use perl here:

perl -pe 's{\b(\d{1,2})(:\d\d:\d\d) ([AP])M\b}{
  $1 + 12 * (($3 eq "P") - ($1 == 12)) . $2}ge'

That is, add 12 to the hour part if PM (except for 12 PM) and change 12 AM to 0.

With awk, not doing the word-boundary part (so could give false positives on 123:21:99 AMERICA for instance) and assuming there's only one occurrence per line:

awk '
  match($0, /[0-9]{1,2}:[0-9]{2}:[0-9]{2} [AP]M/) {
    split(substr($0, RSTART, RLENGTH), parts, /[: ]/)
    if (parts[4] == "PM" && parts[1] != 12) parts[1] += 12
    if (parts[4] == "AM" && parts[1] == 12) parts[1] = 0

    $0 = substr($0, 1, RSTART - 1) \
         parts[1] ":" parts[2] ":" parts[3] \
         substr($0, RSTART + RLENGTH)
  }
  {print}'
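
Some older awk implementations do not understand the {1,2} interval syntax in regular expressions. As a rough sketch, the same idea can be written with the repetitions spelled out and wrapped in a shell function. The name to_24h is hypothetical, and unlike the version above this one also zero-pads the hour:

```shell
# Hypothetical helper: reads lines on stdin and rewrites the first
# H:MM:SS AM/PM time on each line in 24-hour form.
to_24h() {
  awk '
    match($0, /[0-9][0-9]?:[0-9][0-9]:[0-9][0-9] [AP]M/) {
      split(substr($0, RSTART, RLENGTH), parts, /[: ]/)
      h = parts[1] + 0                           # force a numeric hour
      if (parts[4] == "PM" && h != 12) h += 12   # 1 PM..11 PM become 13..23
      if (parts[4] == "AM" && h == 12) h = 0     # 12 AM becomes 00

      $0 = substr($0, 1, RSTART - 1) \
           sprintf("%02d:%s:%s", h, parts[2], parts[3]) \
           substr($0, RSTART + RLENGTH)
    }
    { print }'
}
```

For example, printf 'login at 1:30:00 PM\n' | to_24h prints "login at 13:30:00". The same caveats apply as above: there is no word-boundary check and only the first match per line is converted.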

Suppress/remove carriage return from awk output

You should remove the -t option from ssh in order to prevent generating the carriage return in the first place.

The -t option directs ssh to allocate a pseudo terminal on the remote machine, and if that terminal has the onlcr flag set (which is the default), every LF (\n) will be translated to CR/LF (\r\n) on output. The -t option is not needed unless a full-screen and/or interactive program is run, like vi or screen.

But if you really have to process lines terminated by CR/LF in awk, you can set the record separator to CR/LF in any awk implementation which supports multi-character/regex record separators -- like gawk or mawk. Example:

awk ... 'BEGIN{RS="\r\n"}{...}'

Also, if you want to remove a stray CR from a field in awk, you can use sub or gsub (which works everywhere):

gsub("\r","",$6)
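
If you do not know which field carries the CR, one option is to strip it from the end of the whole record, which makes awk split the fields again afterwards. A small sketch; the function name strip_cr is just for illustration:

```shell
# Hypothetical helper: removes one trailing CR from every input line.
strip_cr() {
  awk '{ sub(/\r$/, ""); print }'
}
```

For example, printf 'a,b\r\n' | strip_cr | od -c should show the line ending in \n only. Field processing can go in the same awk program after the sub call.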
