I have a file like this:
ILMN_1343291 TGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGCTTTGCTGTTCG NM_001402.5
ILMN_1343295 CTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTG NM_002046.3
ILMN_1651209 TCACGGCGTACGCCCTCATGGGGAAAATCTCCCCGGTGACTTTCAGGTCC NM_182838.1
I want to remove the numeric version suffix from the end of the 3rd column so that my output file looks like this:
ILMN_1343291 TGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGCTTTGCTGTTCG NM_001402
ILMN_1343295 CTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTG NM_002046
ILMN_1651209 TCACGGCGTACGCCCTCATGGGGAAAATCTCCCCGGTGACTTTCAGGTCC NM_182838
How can I do this on the command line, preferably using awk? I can do this in Perl, but I am pretty sure there is a single command line to do it.
Best Answer
With awk:
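A minimal sketch of the awk approach, assuming the only dot on each line is the one before the version suffix. The file name `sample.txt` is a placeholder, not something from the question:

```shell
# Two sample rows from the question, saved under a hypothetical file name.
printf '%s\n' \
  'ILMN_1343291 TGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGCTTTGCTGTTCG NM_001402.5' \
  'ILMN_1343295 CTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTG NM_002046.3' \
  > sample.txt

# Split each line on "." and print the first field (everything before the dot).
awk -F. '{print $1}' sample.txt
```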
The `-F` option changes the default field separator (whitespace) to a dot (`.`). `$1` is the first field under that separator, i.e. everything before the first dot.
With rev and awk:
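One way this could look, assuming `rev` is installed (it ships with util-linux on most Linux systems) and using the placeholder file name `sample.txt`:

```shell
# Two sample rows from the question, saved under a hypothetical file name.
printf '%s\n' \
  'ILMN_1343291 TGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGCTTTGCTGTTCG NM_001402.5' \
  'ILMN_1343295 CTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTG NM_002046.3' \
  > sample.txt

# Reverse each line, keep the second dot-separated field
# (everything after the reversed suffix), then reverse back.
rev sample.txt | awk -F. '{print $2}' | rev
```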
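The `rev` utility copies the specified files to standard output, reversing the order of characters in every line. If no files are specified, standard input is read. After reversing, the version suffix is the first dot-separated field, so awk prints the second field and a final `rev` restores the original order.
With sed: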
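A sketch of two sed variants, assuming a file named `sample.txt` (a placeholder). The first anchors on a trailing dot plus digits. The second simply cuts at the first dot:

```shell
# Two sample rows from the question, saved under a hypothetical file name.
printf '%s\n' \
  'ILMN_1343291 TGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGCTTTGCTGTTCG NM_001402.5' \
  'ILMN_1343295 CTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTG NM_002046.3' \
  > sample.txt

# Variant 1: delete a dot followed by zero or more digits at the end of the line.
sed 's/\.[0-9]*$//' sample.txt

# Variant 2: delete the first dot and everything after it.
sed 's/\..*//' sample.txt
```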
`$` anchors the match at the end of the line. The first sed command searches for a dot (`.`) followed by zero or more digits at the end of the line and deletes the match (replaces it with an empty string). The second sed command removes the dot and everything that follows it.
With rev and sed:
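A possible form of this pipeline, assuming `rev` is available and using the placeholder file name `sample.txt`:

```shell
# Two sample rows from the question, saved under a hypothetical file name.
printf '%s\n' \
  'ILMN_1343291 TGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGCTTTGCTGTTCG NM_001402.5' \
  'ILMN_1343295 CTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTG NM_002046.3' \
  > sample.txt

# On the reversed line the suffix comes first, so strip everything
# up to and including the dot, then reverse back.
rev sample.txt | sed 's/.*\.//' | rev
```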
This deletes everything before the dot (`.`) on the reversed line, along with the dot itself.
With grep:
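A sketch assuming GNU grep built with PCRE support (the `-P` option is not available in every grep) and a placeholder file name `sample.txt`:

```shell
# Two sample rows from the question, saved under a hypothetical file name.
printf '%s\n' \
  'ILMN_1343291 TGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGCTTTGCTGTTCG NM_001402.5' \
  'ILMN_1343295 CTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTG NM_002046.3' \
  > sample.txt

# -o prints only the matched text; -P enables Perl-compatible regexes.
# The lookahead requires ".<digit>" to follow but leaves it out of the match.
grep -oP '.*(?=\.[0-9])' sample.txt
```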
`(?=pattern)` is a positive lookahead: a pair of parentheses, with the opening parenthesis followed by a question mark and an equals sign. `.*(?=\.[0-9])` matches everything (`.*`) that is followed by a dot (`.`) and a digit, without making the pattern (`\.[0-9]`) part of the match.
With rev and grep:
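A sketch of both forms, again assuming GNU grep with `-P` support, an installed `rev`, and the placeholder file name `sample.txt`:

```shell
# Two sample rows from the question, saved under a hypothetical file name.
printf '%s\n' \
  'ILMN_1343291 TGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGCTTTGCTGTTCG NM_001402.5' \
  'ILMN_1343295 CTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTG NM_002046.3' \
  > sample.txt

# Lookbehind form: on the reversed line, match everything preceded by "<digit>.".
rev sample.txt | grep -oP '(?<=[0-9]\.).*' | rev

# \K form: match "<digit>." first, then discard it from the reported match.
rev sample.txt | grep -oP '[0-9]\.\K.*' | rev
```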
`(?<=pattern)` is a positive lookbehind: a pair of parentheses, with the opening parenthesis followed by a question mark, a "less than" symbol, and an equals sign. `(?<=[0-9]\.).*` matches everything that is preceded by a digit and a dot (`.`). In the second grep command, you can use the nifty `\K` in place of the lookbehind assertion.
With cut:
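A minimal sketch with cut, assuming no other dots appear earlier on the line and using the placeholder file name `sample.txt`:

```shell
# Two sample rows from the question, saved under a hypothetical file name.
printf '%s\n' \
  'ILMN_1343291 TGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGCTTTGCTGTTCG NM_001402.5' \
  'ILMN_1343295 CTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTG NM_002046.3' \
  > sample.txt

# Use "." as the delimiter and keep only the first field.
cut -d. -f1 sample.txt
```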
With while loop:
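A sketch of such a loop, written with the POSIX `${line%??}` expansion that this answer names, so it should run in any POSIX shell. The file name `sample.txt` is a placeholder:

```shell
# Two sample rows from the question, saved under a hypothetical file name.
printf '%s\n' \
  'ILMN_1343291 TGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGCTTTGCTGTTCG NM_001402.5' \
  'ILMN_1343295 CTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTG NM_002046.3' \
  > sample.txt

# Read each line verbatim and strip its last two characters
# (each "?" in the pattern matches exactly one character).
while IFS= read -r line; do
  printf '%s\n' "${line%??}"
done < sample.txt
```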
This works only if the suffix on every line is a dot followed by a single digit, so that it is always exactly two characters long. The command removes the last two characters of every line in the input file. An alternative form is `${line%??}`.