Shell – How to remove dot character from string without calling sed or awk again

awk, regular expression, sed, shell-script, string

I have a file called hostlist.txt that contains text like this:

host1.mydomain.com
host2.mydomain.com
anotherhost
www.mydomain.com
login.mydomain.com
somehost
host3.mydomain.com

I have the following small script:

#!/usr/local/bin/bash

while read host; do
        dig +search @ns1.mydomain.com $host ALL \
        | sed -n '/;; ANSWER SECTION:/{n;p;}';
done <hostlist.txt \
        | gawk '{print $1","$NF}' >fqdn-ip.csv

Which outputs to fqdn-ip.csv:

host1.mydomain.com.,10.0.0.1
host2.mydomain.com.,10.0.0.2
anotherhost.internal.mydomain.com.,10.0.0.11
www.mydomain.com.,10.0.0.10
login.mydomain.com.,10.0.0.12
somehost.internal.mydomain.com.,10.0.0.13
host3.mydomain.com.,10.0.0.3

My question is how do I remove the . just before the comma without invoking sed or gawk again? Is there a step I can perform in the existing sed or gawk calls that will strip the dot?

hostlist.txt will contain 1000s of hosts so I want my script to be fast and efficient.

Best Answer

The sed command, the awk command, and the removal of the trailing period can all be combined into a single awk command:

while read -r host; do dig +search "$host" ALL; done <hostlist.txt | awk 'f{sub(/.$/,"",$1); print $1", "$NF; f=0} /ANSWER SECTION/{f=1}'

Or, spread over multiple lines:

while read -r host
do
    dig +search "$host" ALL
done <hostlist.txt | awk 'f{sub(/.$/,"",$1); print $1", "$NF; f=0} /ANSWER SECTION/{f=1}'

Because the awk command follows the done statement, it reads the combined output of every dig call, so only one awk process is invoked for the whole host list. Although efficiency may not matter much here, this is cheaper than starting a new sed or awk process on every iteration of the loop.
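If the goal is only to extend the gawk stage that the question already has, the same sub() call can go there instead. The following is a minimal sketch, assuming each input line looks like a dig answer record (name with a trailing dot first, address last). The function name strip_trailing_dot and the sample records are made up for illustration, and plain awk stands in for gawk:

```shell
#!/bin/sh
# Hypothetical helper: does the job of the question's gawk stage,
# and additionally removes the trailing dot from the host name.
strip_trailing_dot() {
    # \. matches a literal dot; $ anchors it to the end of field 1.
    awk '{ sub(/\.$/, "", $1); print $1 "," $NF }'
}

# Sample records in dig's answer layout (illustrative values, not real output).
printf '%s\n' \
    'host1.mydomain.com. 3600 IN A 10.0.0.1' \
    'anotherhost.internal.mydomain.com. 3600 IN A 10.0.0.11' \
    | strip_trailing_dot
```

In the question's script this would mean changing only the program text passed to gawk after done; the sed call inside the loop would stay as it is.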

Example

With this test file:

$ cat hostlist.txt 
www.google.com
fd-fp3.wg1.b.yahoo.com

The command produces:

$ while read -r host; do dig +search "$host" ALL; done <hostlist.txt | awk 'f{sub(/.$/,"",$1); print $1", "$NF; f=0} /ANSWER SECTION/{f=1}'
www.google.com, 216.58.193.196
fd-fp3.wg1.b.yahoo.com, 206.190.36.45

How it works

awk implicitly reads its input one record (line) at a time. This awk script uses a single variable, f, which signals whether the previous line was an answer section header or not.

  • f{sub(/.$/,"",$1); print $1", "$NF; f=0}

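    If the previous line was an answer section header, then f is true and the commands in curly braces are executed. The first removes the trailing period from the first field. (Unescaped, . matches any character, so /.$/ deletes whatever character ends the field; that is always the period here, but /\.$/ is the stricter pattern.) The second prints the first field, followed by a comma and a space, followed by the last field; to match the question's CSV format exactly, use "," without the space. The third statement resets f to zero (false), so only the first line after each header is printed.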

    In other words, f here functions as a logical condition. The commands in curly braces are executed if f is nonzero (which, in awk, means 'true').

  • /ANSWER SECTION/{f=1}

    If the current line contains the string ANSWER SECTION, then the variable f is set to 1 (true).

    Here, /ANSWER SECTION/ serves as a logical condition. It evaluates to true if the current line matches the regular expression ANSWER SECTION. If it does, then the command in curly braces is executed.
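The flag logic can be tried without a name server by feeding awk some canned text. This is only a sketch: fake_dig_output is a hypothetical stand-in that prints text shaped like dig's output (the names and addresses are invented), and extract_first_answer wraps the awk program from above, with the one change that the dot is escaped so it matches a literal period.

```shell
#!/bin/sh
# Hypothetical stand-in for dig: prints text laid out like dig's output.
fake_dig_output() {
    cat <<'EOF'
;; QUESTION SECTION:
;www.example.com.		IN	A

;; ANSWER SECTION:
www.example.com.	300	IN	A	192.0.2.10
www.example.com.	300	IN	A	192.0.2.11

;; Query time: 1 msec
EOF
}

# f is set by the header line and consumed by the line right after it.
extract_first_answer() {
    awk 'f{sub(/\.$/,"",$1); print $1", "$NF; f=0} /ANSWER SECTION/{f=1}'
}

fake_dig_output | extract_first_answer
```

With this input only the first record after the header is printed, because f is reset to zero as soon as that record has been handled. The second record is skipped, as is everything outside the answer section.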
