Implementing an extended regexp to add a variable number of leading zeros based on position in a string

Tags: regular-expression, sed

I am having trouble getting my sed syntax right to add a varying number of leading zeros to a numeric organizational scheme. The strings I am operating on look like this:

1.1.1.1,Some Text Here

Using the sed command

sed -r ":r;s/\b[0-9]{1,$((1))}\b/0&/g;tr"

I get the output

01.01.01.01,Some Text Here

However, what I am looking for is a way to zero-fill fields 2 and 3 to 2 digits and field 4 to 3 digits, so that every key has the same standard length and matches [0-9]\.[0-9]{2}\.[0-9]{2}\.[0-9]{3}:

1.01.01.001,Some Text Here

For the life of me, I cannot even figure out how to modify the boundary so that it matches only numerals that follow a period. I think the problem has something to do with \b, which I understand matches zero characters at a word boundary, but I do not understand why my attempts to add a period to the match fail, as follows:

sed -r ":r;s/\.\b[0-9]{1,$((1))}\b/0&/g;tr"
sed -r ":r;s/\b\.[0-9]{1,$((1))}\b/0&/g;tr"
Both of these cause the command to hang.
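A likely reason for the hang: the replacement 0& re-emits the whole match after the new zero, so when the match starts with the period, the zero lands in front of the period (.1 becomes 0.1) and the same .1 is still there on the next pass. Because t r branches whenever a substitution succeeded, the loop never ends. The original pattern stops because after one pass each 1 has become 01, and \b[0-9]{1}\b no longer matches a digit that follows another digit. The Python emulation below illustrates this; it is not sed itself, and the helper run_sed_loop and its pass cap are assumptions added for illustration.

```python
import re


def run_sed_loop(pattern: str, line: str, max_passes: int = 20):
    """Emulate sed -r ':r;s/PATTERN/0&/g;tr' with a cap on the number of passes.

    Returns (line, finished); finished is False if the cap was reached,
    which is where real sed would loop forever.
    """
    for _ in range(max_passes):
        # r"0\g<0>" plays the role of sed's 0& (a zero, then the whole match)
        line, n = re.subn(pattern, r"0\g<0>", line)
        if n == 0:  # t r only branches after a successful substitution
            return line, True
    return line, False


text = "1.1.1.1,Some Text Here"
print(run_sed_loop(r"\b[0-9]{1,1}\b", text))    # terminates: 01.01.01.01,...
print(run_sed_loop(r"\.\b[0-9]{1,1}\b", text))  # hits the cap: never stops
```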

sed -r ":r;s/\b[0-9]\.{1,$((1))}\b/0&/g;tr"
sed -r ":r;s/\b[0-9]{1,$((1))}\.\b/0&/g;tr"
sed -r ":r;s/\b[0-9]{1,$((1))}\b\./0&/g;tr"
Each of these causes the command to output:

1.01.01.1,Some Text Here

I also expect further problems if the text portion contains a number, as in:

1.1.1.1,Some Number 1 Here

Clearly I need to really learn sed and all of its complexities. I am working on that, but I expect this particular statement to keep causing me trouble for a while. Any help would be greatly appreciated.

EDIT: I've figured out one way. This command seems to do what I am looking for, but there has got to be a more elegant way:

sed -r ':r;s/\b[0-9]{1,1}\.\b/0&/;tr;:i;s/\b[0-9]{1,2},\b/0&/;ti;s/.//'
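The trick in this command: the first loop pads every one-digit field that is followed by a period, including field 1; the second loop keeps prepending zeros to the field before the comma until it has three digits; finally s/.// deletes the first character, which removes the surplus zero on field 1. A Python emulation of those three steps, assuming ASCII input (sed_t_loop and edit_command are hypothetical helper names, and re.subn with count=1 stands in for s/// without the g flag):

```python
import re


def sed_t_loop(pattern: str, line: str) -> str:
    # Emulates sed ':label; s/PATTERN/0&/; t label' (no g flag): keep
    # prepending "0" to the first match until the pattern stops matching.
    while True:
        line, n = re.subn(pattern, r"0\g<0>", line, count=1)
        if n == 0:
            return line


def edit_command(line: str) -> str:
    # :r;s/\b[0-9]{1,1}\.\b/0&/;tr pads each one-digit field before a period
    line = sed_t_loop(r"\b[0-9]{1,1}\.\b", line)
    # :i;s/\b[0-9]{1,2},\b/0&/;ti pads the field before the comma to 3 digits
    line = sed_t_loop(r"\b[0-9]{1,2},\b", line)
    # s/.// deletes the first character, undoing the extra zero on field 1
    return line[1:]


print(edit_command("1.1.1.1,Some Text Here"))  # 1.01.01.001,Some Text Here
```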

Also, the sed command above causes problems if a similarly formatted number appears in the text, for example:

1.1.1.1,Some Text Referring to Document XXX Heading 1.2.3

In that case, the result is:

1.01.01.001,Some Text Referring to Document XXX Heading 01.02.03
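Both of the problems raised here (a number in the text, and a dotted heading in the text) come from letting the pattern scan the whole line. A different technique, sketched in Python rather than sed, is to split off the key at the first comma and zero-fill each part to a fixed width with str.zfill, leaving the text untouched. This assumes the key always ends at the first comma and has exactly four numeric parts; pad_key_fields and WIDTHS are hypothetical names.

```python
# Hypothetical target widths for the four key fields: 1, 2, 2 and 3 digits.
WIDTHS = (1, 2, 2, 3)


def pad_key_fields(line: str) -> str:
    # Split at the first comma only, so numbers in the text are never touched.
    key, sep, rest = line.partition(",")
    parts = key.split(".")
    if len(parts) != len(WIDTHS) or not all(p.isdigit() for p in parts):
        return line  # not a four-part numeric key; leave the line unchanged
    # zfill left-pads with zeros and never truncates a longer field.
    padded = [p.zfill(w) for p, w in zip(parts, WIDTHS)]
    return ".".join(padded) + sep + rest


for line in ["1.1.1.1,Some Number 1 Here",
             "1.1.1.1,Some Text Referring to Document XXX Heading 1.2.3"]:
    print(pad_key_fields(line))
```

With these two sample lines, only the leading key is padded; the 1 and the 1.2.3 in the text stay as they are.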

Solved
Thank you all for your help. I initially solved the problem with the answer I accepted below. I've since moved the solution into Python as part of a larger program, using the following sort key:

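def getPaddedKey(line):
    # line is a CSV row; line[0] is the dotted key, e.g. "1.1.1.1".
    # Zero-pad every part to 5 characters so that comparing keys as strings
    # orders them numerically (assumes no part is longer than 5 digits).
    keyparts = line[0].split(".")
    return '.'.join(part.rjust(5, '0') for part in keyparts)

# reader is assumed to be a csv.reader over the input file
s = sorted(reader, key=getPaddedKey)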
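An alternative sketch, not the author's code: a tuple of integers compares part by part numerically, so it does not depend on choosing a padding width such as 5. It assumes every part of the key is a plain integer; the sample rows and the name int_key are made up for illustration.

```python
import csv
import io


def int_key(row):
    # Alternative key: compare the dotted key part by part as integers,
    # so no padding width has to be chosen.
    return tuple(int(part) for part in row[0].split("."))


# Made-up sample rows; a plain string sort would put 1.10.1.1 before 1.2.1.1.
data = io.StringIO("1.10.1.1,Ten\n1.2.1.1,Two\n1.1.1.1,One\n")
rows = sorted(csv.reader(data), key=int_key)
print([row[0] for row in rows])  # ['1.1.1.1', '1.2.1.1', '1.10.1.1']
```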

Best Answer

Usage: leading_zero.sh input.txt

#!/bin/bash

sed -r '
    s/\.([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{1,3},)/.0\1.0\2.00\3/
    s/\.0*([0-9]{2})\.0*([0-9]{2})\.0*([0-9]{3})/.\1.\2.\3/
' "$1"

Explanation:

  1. The first substitution prepends a fixed number of zeros to each number: one zero to the second and third numbers and two zeros to the fourth, regardless of how many digits each already has. Because the match must end in a comma, and without the g flag only the first match is replaced, a dotted number later in the text such as 1.2.3 is left alone.
  2. The second substitution removes the surplus leading zeros, leaving exactly two digits in the second and third numbers and three digits in the fourth.
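The same pad-then-trim idea can be mirrored with Python's re module, which is handy for checking the two patterns against sample lines. This is a sketch, not part of the original answer: leading_zero is a hypothetical name, and count=1 stands in for sed's s/// without the g flag.

```python
import re

# Step 1: prepend a fixed number of zeros (one to fields 2 and 3, two to
# field 4). Including the comma in the third group ties the match to the key.
STEP1 = re.compile(r"\.([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{1,3},)")
# Step 2: drop surplus leading zeros, keeping 2, 2 and 3 digits.
STEP2 = re.compile(r"\.0*([0-9]{2})\.0*([0-9]{2})\.0*([0-9]{3})")


def leading_zero(line: str) -> str:
    # count=1 mirrors sed's s/// without the g flag: first match only.
    line = STEP1.sub(r".0\1.0\2.00\3", line, count=1)
    return STEP2.sub(r".\1.\2.\3", line, count=1)


print(leading_zero("1.1.11.111,Some Text Referring to Document XXX Heading 1.2.3"))
# 1.01.11.111,Some Text Referring to Document XXX Heading 1.2.3
```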

input.txt

1.1.1.1,Some Text Here
1.1.1.1,Some Text Here
1.11.1.11,Some Text Referring to Document XXX Heading 1.2.3
1.1.1.1,Some Text Here
1.1.11.111,Some Text Referring to Document XXX Heading 1.2.3
1.11.1.1,Some Text Here

output.txt

1.01.01.001,Some Text Here
1.01.01.001,Some Text Here
1.11.01.011,Some Text Referring to Document XXX Heading 1.2.3
1.01.01.001,Some Text Here
1.01.11.111,Some Text Referring to Document XXX Heading 1.2.3
1.11.01.001,Some Text Here