Adding leading zeros into date and time

awktext formattingtext processing

I have a csv containing the following data structure:

1111,2222,3333,4444,5555,6666,7777,2017-1-5 1:07:09,2017-1-5 1:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54

I want to display the dates month and day as always being 2 digits long. I also want the times Hour field to always be 2 digits.

Essentially adding leading zeros if the month/day/hour fields are only a single digit as in the example line above.

Using awk, how would I go about achieving the following result:

1111,2222,3333,4444,5555,6666,7777,2017-01-05 01:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54

Best Answer

A great tool for text processing is awk. The following example is using plain standard awk on FreeBSD 11.1. @RomanPerekhrest has an elegant solution in another answer if you prefer GNU awk.

Your input is comma-separated. Because of this we invoke awk with the -F, parameter.

We can then print out columns using the print statement. $1 is the first column. $2 is the second column.

$ awk -F, '{ print $8 }' inputfile.csv
2017-1-5 1:07:09
2017-11-25 19:57:17

This gives us the 8th column for each row.

This is then the date field you want to manipulate. Rather than setting the delimiter using the command-line parameter we can do it as part of the script. FS for the input delimiter and OFS for the output delimiter.

$ awk 'BEGIN { FS = "," } ; { print $8 }' inputfile.csv
2017-1-5 1:07:09
2017-11-25 19:57:17

When working with dates I often prefer to use the date util to make sure I handle them correctly. And I do not need to worry if I am using regular or GNU awk. Furthermore I get a big fat failure if the date does not parse correctly.

The interesting parameter are:

-j     Specify we do not want to set the date at all
-f     The format string we use for input
+      The format string we use for output

So if we run this for one date:

$ date -j -f "%Y-%m-%d %H:%M:%S" +"%Y-%m-%d %H:%M:%S" "2017-1-5 1:07:09"
2017-01-05 01:07:09

We can then combine this with awk. Notice how the quotes are escaped. This is probably the biggest stumbling block for a beginner.

$ awk -F, '{ system("date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"")}' inputfile.csv
2017-01-05 01:07:09
2017-11-25 19:57:17

The system call seems correct - but unfortunately it only allows us to capture the returncode and it prints directly to the output. To avoid this we use the cmd | getline pattern. The following simple example will read the current date into mydate:

$ awk 'BEGIN { cmd = "date"; cmd | getline mydate; close(cmd); print mydate }'
Thu Mar  1 16:26:15 CET 2018

We use the BEGIN keyword as we have no input to this simple example.

So let us expand this:

awk 'BEGIN { FS=","; OFS=FS };
     { 
         cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"";
         cmd | getline firstdate;
         close(cmd);
         cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$9"\"";
         cmd | getline seconddate;
         close(cmd);
         print $1,$2,$3,$4,$5,$6,$7,firstdate,seconddate
     }' inputfile.csv

And we can collapse it to a one-liner:

awk 'BEGIN {FS=",";OFS=FS};{cmd="date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"";cmd | getline firstdate;close(cmd);cmd="date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$9"\"";cmd | getline seconddate;close(cmd);print $1,$2,$3,$4,$5,$6,$7,firstdate,seconddate}' inputfile.csv

Which gives me the output:

1111,2222,3333,4444,5555,6666,7777,2017-01-05 01:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54

Addendum

As the purpose here is to learn good habit I better update this answer. It is a bad habit to repeat code. When you start doing that you should split things into a function. As you will notice the code below immediately becomes more readable.

awk 'function convertdate(the_date) {
         cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""the_date"\"";
         cmd | getline formatted_date;
         close(cmd);
         return formatted_date
     }
     BEGIN { FS=","; OFS=FS };
     { 
         print $1,$2,$3,$4,$5,$6,$7,convertdate($8),convertdate($9)
     }' inputfile.csv

Make a habit of this and you will notice how much easier it will become to introduce error handling later on.

Related Question