Replace a character except last x occurrences

regular expression, sed, text processing

I have a file that has a bunch of hostnames correlated with IPs that looks like this:

x-cluster-front-1 192.168.1.2
x-cluster-front-2 192.158.1.10
y-cluster-back-1 10.1.11.99
y-cluster-back-2 10.1.157.38
int.test.example.com 59.2.86.3
super.awesome.machine 123.234.15.6

I want it to look like this:

x-cluster-front-1 192.168.1.2
x-cluster-front-2 192.158.1.10
y-cluster-back-1 10.1.11.99
y-cluster-back-2 10.1.157.38
int-test-example-com 59.2.86.3
super-awesome-machine 123.234.15.6

How can I replace the . (dots) in the first column with - (hyphens) in order to facilitate a sort by the second column? I was thinking of using sed to replace dots until the first space, or replacing every dot but the last three, but I'm having trouble understanding regex and sed. I can perform simple replaces but this is way over my head!

This is part of a larger script that I have been writing in bash. I am stuck at this part.

Best Answer

You can use AWK

awk '{gsub(/\./,"-",$1);print}' infile

Explanation

awk splits a line on whitespace by default. Thus, the first column of the line ($1 in awk-ese) will be the one you want to perform the substitutions on. For this purpose, you can use:

 gsub(regex,replacement,string)

to perform the required substitution.
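As a quick check, here is a minimal sketch that runs this substitution on a few lines from the question. The function name dots_to_hyphens is made up for illustration. One side effect worth knowing: assigning to $1 makes awk rebuild the line with single spaces between fields, which is harmless for this input.

```shell
# Hypothetical helper: replace every dot in the first column with a hyphen.
# With no file arguments, awk reads standard input.
dots_to_hyphens() {
    awk '{ gsub(/\./, "-", $1); print }' "$@"
}

result=$(printf '%s\n' \
    'y-cluster-back-1 10.1.11.99' \
    'int.test.example.com 59.2.86.3' \
    'super.awesome.machine 123.234.15.6' | dots_to_hyphens)

printf '%s\n' "$result"
```

The dots in the IP addresses are untouched because gsub is restricted to $1.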

Note that gsub is missing from the original ("old") awk, but it is part of POSIX awk and is supported by gawk, nawk and mawk; on many modern distros awk is a symlink to gawk.
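Since the question mentions replacing dots "until the first space" with sed, here is one possible sketch of that idea, not part of the answer above. It assumes the columns are separated by a space (not a tab); the function name dots_to_hyphens_sed is hypothetical. Each pass replaces the last dot that occurs before the first space, and the loop repeats until none is left.

```shell
# Hypothetical helper: replace dots with hyphens only before the first space.
# [^ ]* cannot cross a space, so dots in the second column are never matched.
dots_to_hyphens_sed() {
    sed -e ':a' -e 's/^\([^ ]*\)\./\1-/' -e 'ta'
}

result=$(printf '%s\n' \
    'x-cluster-front-1 192.168.1.2' \
    'int.test.example.com 59.2.86.3' | dots_to_hyphens_sed)

printf '%s\n' "$result"
```

Unlike the awk version, this leaves the original spacing between the columns as it was.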

Related Solutions

Sed Command – Replace N First Occurrences of a Character

Reading your question, I remembered that at least GNU sed (probably not the one you have on Solaris) has the opposite of the feature you want:

g: Apply the replacement to all matches to the regexp, not just the first.

number: Only replace the numberth match of the regexp.
Note: the POSIX standard does not specify what should happen when you mix the g and number modifiers, and currently there is no widely agreed upon meaning across sed implementations. For GNU sed, the interaction is defined to be: ignore matches before the numberth, and then match and replace all matches from the numberth on.

So instead of:

hmontoliu@ulises:/tmp/wb$ echo one two three four five six seven | sed 's/ /;/g5' 
one two three four five;six;seven

you can achieve what you want (only the first occurrences replaced) by substituting every match and then undoing the substitution from the 6th match onward:

hmontoliu@ulises:/tmp/wb$ echo one two three four five six seven | sed -e 's/ /;/g' -e 's/;/ /6g'
one;two;three;four;five;six seven
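The same two-step trick can be wrapped in a small function. This is only a sketch under two assumptions: GNU sed (for the combined number and g flags) and input that contains no ; beforehand, because ; doubles as the temporary marker. The name replace_first_n_spaces is made up for illustration.

```shell
# Hypothetical helper: replace the first N spaces with semicolons.
# Step 1 turns every space into ';'.
# Step 2 turns the (N+1)th ';' and all later ones back into spaces (GNU sed only).
replace_first_n_spaces() {
    n=$1
    sed -e 's/ /;/g' -e "s/;/ /$((n + 1))g"
}

result=$(echo 'one two three four five six seven' | replace_first_n_spaces 3)
echo "$result"
```

If the data may already contain semicolons, pick a marker character that cannot occur in the input.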

Tell us if the Solaris implementation has that feature.

HTH

Replace every occurrence of a character except the last one in every line

sed ':a;/[|].*[|]/s/[|]/ /;ta' file

/[|].*[|]/: If the line has at least two pipes,
s/[|]/ /: Substitute the first with a space.
ta: If a substitution was made, go back to :a.

Output:

$ sed ':a;/[|].*[|]/s/[|]/ /;ta' file
FLD1      SFK TK  FLD2    FLD4  FLD5  -           20200515  NNNN |406   RCO 301
FLD1      SFK TK  FLD2    FLD4  FLD5  -           20200515  NNNN |0
FLD1      SFK TK  FLD2    FLD4  FLD5  -           20200515  NNNN |0

As @steeldriver has remarked, you can simply use | instead of [|] in a basic regular expression (BRE), as is the case above. If you add the -E flag to sed, extended regular expressions (ERE) are enabled, | becomes the alternation operator, and you then need to write [|] or \| to match a literal pipe.
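A small sketch of the two spellings side by side, assuming a sed that accepts -E (GNU and BSD sed both do). The input string is made up for illustration.

```shell
# In a BRE (the default), a bare | is an ordinary character.
bre=$(echo 'a|b|c' | sed 's/|/ /')

# With -E (ERE), | means alternation, so the literal pipe is written as [|].
ere=$(echo 'a|b|c' | sed -E 's/[|]/ /')

echo "$bre"
echo "$ere"
```

Both commands replace only the first pipe, so writing [|] everywhere keeps the expression valid in either mode.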

Just for completeness, the POSIX sed specification says that "Editing commands other than {...}, a, b, c, i, r, t, w, :, and # can be followed by a semicolon". Since : is in that list of exceptions, the :a; in the one-liner above is not strictly portable. A compliant alternative is:

sed -e ':a' -e '/[|].*[|]/s/[|]/ /;t a' file
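To see the loop at work on something smaller than the original file, here is a sketch with made-up input lines. The function name keep_last_pipe is hypothetical, and the bare | relies on the default BRE mode.

```shell
# Hypothetical helper: turn every pipe except the last one on each line into a space.
# While a line still holds at least two pipes, replace the first one and loop.
keep_last_pipe() {
    sed -e ':a' -e '/|.*|/s/|/ /' -e 'ta'
}

result=$(printf '%s\n' 'a|b|c|d' 'x|y' 'no pipes' | keep_last_pipe)
printf '%s\n' "$result"
```

Lines with zero or one pipe pass through unchanged, because the address /|.*|/ never matches them and so no substitution is attempted.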
