A great tool for text processing is awk. The following example is using plain standard awk on FreeBSD 11.1. @RomanPerekhrest has an elegant solution in another answer if you prefer GNU awk.
Your input is comma-separated. Because of this we invoke awk
with the -F,
parameter.
We can then print out columns using the print
statement. $1
is the first column. $2
is the second column.
$ awk -F, '{ print $8 }' inputfile.csv
2017-1-5 1:07:09
2017-11-25 19:57:17
This gives us the 8th column for each row.
This is then the date field you want to manipulate. Rather than setting the delimiter using the command-line parameter we can do it as part of the script. FS for the input delimiter and OFS for the output delimiter.
$ awk 'BEGIN { FS = "," } ; { print $8 }' inputfile.csv
2017-1-5 1:07:09
2017-11-25 19:57:17
When working with dates I often prefer to use the date
util to make sure I handle them correctly. And I do not need to worry if I am using regular or GNU awk. Furthermore I get a big fat failure if the date does not parse correctly.
The interesting parameter are:
-j Specify we do not want to set the date at all
-f The format string we use for input
+ The format string we use for output
So if we run this for one date:
$ date -j -f "%Y-%m-%d %H:%M:%S" +"%Y-%m-%d %H:%M:%S" "2017-1-5 1:07:09"
2017-01-05 01:07:09
We can then combine this with awk. Notice how the quotes are escaped. This is probably the biggest stumbling block for a beginner.
$ awk -F, '{ system("date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"")}' inputfile.csv
2017-01-05 01:07:09
2017-11-25 19:57:17
The system call seems correct - but unfortunately it only allows us to capture the returncode and it prints directly to the output. To avoid this we use the cmd | getline
pattern. The following simple example will read the current date into mydate:
$ awk 'BEGIN { cmd = "date"; cmd | getline mydate; close(cmd); print mydate }'
Thu Mar 1 16:26:15 CET 2018
We use the BEGIN
keyword as we have no input to this simple example.
So let us expand this:
awk 'BEGIN { FS=","; OFS=FS };
{
cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"";
cmd | getline firstdate;
close(cmd);
cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$9"\"";
cmd | getline seconddate;
close(cmd);
print $1,$2,$3,$4,$5,$6,$7,firstdate,seconddate
}' inputfile.csv
And we can collapse it to a one-liner:
awk 'BEGIN {FS=",";OFS=FS};{cmd="date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"";cmd | getline firstdate;close(cmd);cmd="date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$9"\"";cmd | getline seconddate;close(cmd);print $1,$2,$3,$4,$5,$6,$7,firstdate,seconddate}' inputfile.csv
Which gives me the output:
1111,2222,3333,4444,5555,6666,7777,2017-01-05 01:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54
Addendum
As the purpose here is to learn good habit I better update this answer. It is a bad habit to repeat code. When you start doing that you should split things into a function. As you will notice the code below immediately becomes more readable.
awk 'function convertdate(the_date) {
cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""the_date"\"";
cmd | getline formatted_date;
close(cmd);
return formatted_date
}
BEGIN { FS=","; OFS=FS };
{
print $1,$2,$3,$4,$5,$6,$7,convertdate($8),convertdate($9)
}' inputfile.csv
Make a habit of this and you will notice how much easier it will become to introduce error handling later on.
Best Answer
You have the fields in the right order, but your first print statement adds a newline (Output Record Separator), so your data's there, but just wrapped unexpectedly.
The second issue is that you're telling printf to use a width of 4; that includes the decimal point and the two digits after it, leaving only one for the leading digit and none for any padding. Try using 5 as the width, so that your data is padded up to four total numbers. If you want 4 digits before the decimal point, then change the width to 7 instead.
This is the shortest change I made from your program to something that outputs what I think you want:
I combined multiple
{ }
blocks into one, and also combined the print statements into one.If I was to write your awk statement from scratch, I might do something like this:
It explicitly sets the input Field Separator, the Output Field Separator, explicitly converts each of the fields on its own, then prints the desired fields, with the OFS separating them.