This was a fun one.
First, we need to eliminate trailing comments, as in:
86.242.200.81 banana.domain.net # comment
We can do that with the following (assuming just spaces, no tabs):
sed 's/ *#.*//'
If you have tabs in your hosts file, maybe run this first:
tr '\t' ' '
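Then we need to eliminate "comment out this line" comments, which I'm going to define as a single hash character immediately followed by an IP address. This step has to run before the trailing-comment substitution, which would otherwise blank out the whole line. We can remove those like this: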
sed '/^#[0-9]/ s/^#//'
Putting the above together gets us:
### Comments
# Comments
86.242.200.81 banana.domain.net
86.242.200.3 orange.domain.net
31.28.225.81 monkey.anotherdomain.net
51.18.33.4 puffin.domainz.com
31.28.220.80 monkey.anotherdomain.net
86.242.201.3 orange.domain.net
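As a quick check of the two sed steps, here is a minimal sketch run on two made-up lines (the sample data is an assumption, not taken from the original hosts file). It uncomments first and strips trailing comments second, since the reverse order would blank out the commented-out entry:

```shell
# Hypothetical sample lines, for illustration only.
sample='#86.242.200.3 orange.domain.net
86.242.200.81 banana.domain.net # comment'

# 1) drop a leading "#" that is directly followed by a digit
# 2) strip trailing comments (and the spaces before them)
cleaned=$(printf '%s\n' "$sample" | sed -e '/^#[0-9]/ s/^#//' -e 's/ *#.*//')
printf '%s\n' "$cleaned"
```

Both lines should come out as plain "address name" pairs.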
If we sort this on the second column (sort -k2), we get a list sorted by name:
86.242.200.81 banana.domain.net
# Comments
### Comments
31.28.220.80 monkey.anotherdomain.net
31.28.225.81 monkey.anotherdomain.net
86.242.200.3 orange.domain.net
86.242.201.3 orange.domain.net
51.18.33.4 puffin.domainz.com
And now we can apply uniq to find duplicates, if we tell uniq to ignore the first field:
uniq -c -f 1
Which gives us:
2
1 86.242.200.81 banana.domain.net
1 # Comments
1 ### Comments
2 31.28.220.80 monkey.anotherdomain.net
2 86.242.200.3 orange.domain.net
1 51.18.33.4 puffin.domainz.com
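To see the sort and uniq steps in isolation, here is a small sketch on three made-up entries (the data and the trailing awk that trims the output to "count name" are my additions for readability, not part of the original pipeline):

```shell
# Hypothetical, already-cleaned entries: one name appears twice.
counts=$(printf '%s\n' \
  '86.242.200.3 orange.domain.net' \
  '31.28.225.81 monkey.anotherdomain.net' \
  '86.242.201.3 orange.domain.net' |
  sort -k2 |                # group identical names next to each other
  uniq -c -f 1 |            # count runs, ignoring field 1 (the address)
  awk '{print $1, $3}')     # keep just the count and the name
printf '%s\n' "$counts"
```

The name that occurs twice is reported with a count of 2; uniq keeps only the first address it saw for that name.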
So if we look for lines with a count of 2 or higher, we have found our duplicates. Putting this all together we get:
#!/bin/sh
tr '\t' ' ' |
sed '
/^#[0-9]/ s/^#//
s/ *#.*//
/^ *$/ d
' |
sort -k2 |
uniq -f 1 -c |
awk '$1 > 1 {print}'
The final awk statement in the above script looks for lines where the count from uniq (field 1) is > 1.
Running the above script looks like this.
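As a minimal sketch of such a run, assuming a small made-up hosts excerpt (the sample lines and the find_dupes wrapper name are illustrative, not from the original), the script can be exercised like this:

```shell
# Same pipeline as the script above, wrapped in a function (hypothetical name)
# so it can be fed test input on stdin.
find_dupes() {
  tr '\t' ' ' |
  sed '
    /^#[0-9]/ s/^#//
    s/ *#.*//
    /^ *$/ d
  ' |
  sort -k2 |
  uniq -f 1 -c |
  awk '$1 > 1 {print}'
}

# Made-up input: a comment, a trailing comment, a commented-out entry,
# a tab-separated entry, a blank line, and a second address for orange.
result=$(printf '%b\n' \
  '# Comments' \
  '86.242.200.81 banana.domain.net # comment' \
  '#86.242.200.3 orange.domain.net' \
  '31.28.225.81\tmonkey.anotherdomain.net' \
  '' \
  '86.242.201.3 orange.domain.net' | find_dupes)
printf '%s\n' "$result"
```

Only orange.domain.net is reported, with a count of 2. The exact padding in front of the count depends on the uniq implementation.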
To replace commas with semicolons on the last n lines with ed:
n=3
ed -s input <<< '$-'$((n-1))$',$s/,/;/g\nwq'
Splitting that apart:
ed -s = run ed silently (don't report the bytes written at the end)
'$-' = from the end of the file ($) minus ...
$((n-1)) = n-1 lines ...
( $' ... ' = quote the rest of the command to protect it from the shell )
,$s/,/;/g = ... until the end of the file (,$), search and replace all commas with semicolons.
\nwq = end the previous command, then save and quit
To replace commas with semicolons on the last n lines with sed:
n=3
sed -i "$(( $(wc -l < input) - n + 1)),\$s/,/;/g" input
Breaking that apart:
-i = edit the file "in-place"
$(( ... )) = do some math:
  $(wc -l < input) = get the number of lines in the file
  - n + 1 = go backwards n-1 lines
,\$ = from that line (n-1 lines before the end) until the end of the file; the backslash stops the shell from expanding $s inside the double quotes
s/,/;/g = replace the commas with semicolons.
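To check the line arithmetic before touching a real file, here is a small sketch that runs the same sed expression without -i on a throwaway file (the temp file, its contents, and the start and preview variable names are assumptions for illustration):

```shell
# Hypothetical five-line sample file, created only for this demonstration.
input=$(mktemp)
printf '%s\n' 'a,b' 'c,d' 'e,f' 'g,h' 'i,j' > "$input"

n=3
# First line of the last n lines: total lines - n + 1.
start=$(( $(wc -l < "$input") - n + 1 ))

# Without -i, sed only prints the result, so this is a safe preview.
preview=$(sed "${start},\$s/,/;/g" "$input")
printf '%s\n' "$preview"
rm -f "$input"
```

With five lines and n=3, the substitution starts at line 3, so the first two lines keep their commas.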