I have a csv containing the following data structure:
1111,2222,3333,4444,5555,6666,7777,2017-1-5 1:07:09,2017-1-5 1:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54
I want to display the dates month and day as always being 2 digits long. I also want the times Hour field to always be 2 digits.
Essentially adding leading zeros if the month/day/hour fields are only a single digit as in the example line above.
Using awk, how would I go about achieving the following result:
1111,2222,3333,4444,5555,6666,7777,2017-01-05 01:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54
Best Answer
A great tool for text processing is awk. The following example is using plain standard awk on FreeBSD 11.1. @RomanPerekhrest has an elegant solution in another answer if you prefer GNU awk.
Your input is comma-separated. Because of this we invoke
awk
with the-F,
parameter.We can then print out columns using the
print
statement.$1
is the first column.$2
is the second column.This gives us the 8th column for each row.
This is then the date field you want to manipulate. Rather than setting the delimiter using the command-line parameter we can do it as part of the script. FS for the input delimiter and OFS for the output delimiter.
When working with dates I often prefer to use the
date
util to make sure I handle them correctly. And I do not need to worry if I am using regular or GNU awk. Furthermore I get a big fat failure if the date does not parse correctly.The interesting parameter are:
So if we run this for one date:
We can then combine this with awk. Notice how the quotes are escaped. This is probably the biggest stumbling block for a beginner.
The system call seems correct - but unfortunately it only allows us to capture the returncode and it prints directly to the output. To avoid this we use the
cmd | getline
pattern. The following simple example will read the current date into mydate:We use the
BEGIN
keyword as we have no input to this simple example.So let us expand this:
And we can collapse it to a one-liner:
Which gives me the output:
Addendum
As the purpose here is to learn good habit I better update this answer. It is a bad habit to repeat code. When you start doing that you should split things into a function. As you will notice the code below immediately becomes more readable.
Make a habit of this and you will notice how much easier it will become to introduce error handling later on.