Modify Specific Column with Sed or Awk – Text Processing Guide

Tags: awk, sed, text processing

I have a CSV file that looks like this:

qw12er,foo,0 bn5mgh
rt8yp,foo,10 gh78jk
bn852mv,foo,852 78ghjkh
tgbr,foo,10 ujmyhn
wsx2d,foo,0000 ui52ohn
tgbr,foo,7418529 ujmyhn
ikl896o,foo,22 wsxdc52

I want to modify the 3rd column by removing the leading digits, and the space that follows them, from the beginning of that column.

The output should be as follows:

qw12er,foo,bn5mgh
rt8yp,foo,gh78jk
bn852mv,foo,78ghjkh
tgbr,foo,ujmyhn
wsx2d,foo,ui52ohn
tgbr,foo,ujmyhn
ikl896o,foo,wsxdc52

Best Answer

Another solution with awk using sub:

awk -F, 'sub("^[0-9]+\\s","",$3)' OFS=, file

Output:

qw12er,foo,bn5mgh
rt8yp,foo,gh78jk
bn852mv,foo,78ghjkh
tgbr,foo,ujmyhn
wsx2d,foo,ui52ohn
tgbr,foo,ujmyhn
ikl896o,foo,wsxdc52

Explanation:

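-F,: set the comma as the input field separator
OFS=,: set the comma as the output field separator (a space by default)
sub("^[0-9]+\\s","",$3): remove the digits followed by a whitespace character at the beginning of $3. The call is used as the awk pattern, so the line is printed by the default action ("print") only when sub made a substitution; a line whose third field has no such prefix is skipped. Note that \s is a GNU awk extension and may not work in other awk implementations.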

This way you edit only the desired column and print all the others unchanged, however many there are.
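The command above prints a line only when sub changes it, and \s is specific to GNU awk. Here is a minimal sketch of a variant that keeps every line and uses only POSIX awk syntax, together with a sed equivalent. The function names strip_col3_awk and strip_col3_sed are hypothetical, the second input line is made up to show a row without a numeric prefix, and sed -E assumes GNU or BSD sed.

```shell
# Hypothetical helpers; both read CSV on stdin and write to stdout.

# awk: edit field 3 in place, then print every record (the trailing 1).
strip_col3_awk() {
  awk -F, -v OFS=, '{ sub(/^[0-9]+ +/, "", $3) } 1'
}

# sed: keep the first two comma-separated fields (\1) and drop the
# digits and spaces that start the third one.
strip_col3_sed() {
  sed -E 's/^(([^,]*,){2})[0-9]+ +/\1/'
}

printf '%s\n' 'qw12er,foo,0 bn5mgh' 'abc,foo,nodigits' | strip_col3_awk
# qw12er,foo,bn5mgh
# abc,foo,nodigits
```

Both helpers leave a row untouched when its third field does not start with digits and a space, instead of dropping it.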

With awk:

awk -F'.' '{print $1}' file

The -F option changes the field separator from the default (whitespace) to a dot (.).
$1 is the first field, i.e. the text before the first dot.

{ILMN_1343291    TGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGCTTTGCTGTTCG  NM_001402}.{5}
                  ^^ field index is $1                                          ^^$2
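Printing $1 keeps only the text before the first dot, which is enough when each line contains a single dot. As a sketch for input that may contain several dots, awk's sub can remove just the last dot and whatever follows it. The function name drop_last_suffix and the v1.2.tar input are hypothetical.

```shell
# Hypothetical helper: remove the last ".something" from each line on stdin.
drop_last_suffix() {
  awk '{ sub(/\.[^.]*$/, ""); print }'
}

printf '%s\n' 'ILMN_1343291 NM_001402.5' 'v1.2.tar' | drop_last_suffix
# ILMN_1343291 NM_001402
# v1.2
```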

With rev and awk:

# Reverse the characters of each line, print field number 2 using (.)
# as the separator, and reverse the result again.
rev file | awk -F'.' '{print $2}' | rev

The rev utility copies the specified files to standard output, reversing the order of characters in every line. If no files are specified, standard input is read.

With sed:

sed 's/\.[0-9]*$//' file

sed 's/\.[^.]*$//' file

$ anchors the match at the end of the line, and the dot is escaped (\.) so that it matches a literal dot rather than any character. The first sed command searches for a dot followed by zero or more digits at the end of the line and replaces them with nothing.

The second sed command removes the last dot and everything that follows it.
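The difference between the two sed expressions only shows when the suffix is not purely numeric. A small sketch, with hypothetical function names and made-up input lines:

```shell
# Hypothetical helpers wrapping the two sed expressions (dot escaped).
strip_numeric_suffix() { sed 's/\.[0-9]*$//'; }   # removes ".5", keeps ".v5"
strip_any_suffix()     { sed 's/\.[^.]*$//'; }    # removes ".5" and ".v5"

printf '%s\n' 'NM_001402.5' 'NM_001402.v5' 'NM_001402' | strip_numeric_suffix
# NM_001402
# NM_001402.v5
# NM_001402

printf '%s\n' 'NM_001402.5' 'NM_001402.v5' 'NM_001402' | strip_any_suffix
# NM_001402
# NM_001402
# NM_001402
```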

With rev and sed:

rev file| sed 's/.*[.]//' |rev

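In the reversed line, delete everything up to and including the dot (.); this is the part that follows the dot in the original line. Because .* is greedy, a line with several dots keeps only the text before its first dot.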

With grep:

grep -oP '.*(?=\.[0-9])' file

    -o, --only-matching
          Print only the matched (non-empty) parts of a matching line,
          with each such part on a separate output line.
    -P, --perl-regexp
          Interpret PATTERN as a Perl compatible regular expression (PCRE)

(?=pattern): Positive Lookahead: The positive lookahead construct is a pair of parentheses, with the opening parenthesis followed by a question mark and an equals sign.

.*(?=\.[0-9]): matches everything (.*) that is followed by a dot (.) and a digit, without making the lookahead pattern (\.[0-9]) part of the match.
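Since .* is greedy, the match extends to the last place where a dot and a digit follow. The sketch below assumes GNU grep built with PCRE support (-P); the function name before_dot_digit and the a.1.2 input are hypothetical.

```shell
# Hypothetical helper: print the part of each line that precedes
# the last ".<digit>" (requires GNU grep with -P support).
before_dot_digit() {
  grep -oP '.*(?=\.[0-9])'
}

printf '%s\n' 'NM_001402.5' 'a.1.2' | before_dot_digit
# NM_001402
# a.1
```

Lines that contain no dot followed by a digit produce no output at all, because -o prints only matches.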

With rev and grep:

rev file |grep -oP '(?<=[0-9]\.).*' |rev

rev file |grep -oP '[0-9]\.\K.*' |rev

(?<=pattern): Positive Lookbehind. A pair of parentheses, with the opening parenthesis followed by a question mark, "less than" symbol, and an equals sign.

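(?<=[0-9]\.).* (positive lookbehind) matches everything that is preceded by a digit and a dot (.); in the reversed line this is the text after the leading digit and dot.

In the second grep command, the nifty \K takes the place of the lookbehind assertion: it drops everything matched before it from the reported match.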

With cut:

cut -f1 -d. file

cut -c 1-77 file # Print first 77 characters of each line.

cut - remove sections from each line of files

-d, --delimiter=DELIM
      use DELIM instead of TAB for field delimiter

-f, --fields=LIST
      select  only  these  fields;

-c, --characters=LIST
      select only these characters
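cut -f1 -d. keeps only the text before the first dot. As a sketch for lines that may contain several dots, rev and cut can be combined to drop just the last dot-separated field; the function name drop_last_field and the v1.2.tar input are hypothetical.

```shell
# Hypothetical helper: reverse each line, keep fields 2 onward
# (everything but the original last field), then reverse back.
drop_last_field() {
  rev | cut -d. -f2- | rev
}

printf '%s\n' 'NM_001402.5' 'v1.2.tar' | drop_last_field
# NM_001402
# v1.2
```

A line without any dot passes through unchanged, because cut prints lines that contain no delimiter as they are unless -s is given.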

With while loop:

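while IFS= read -r line; do echo "${line::-2}"; done < file

This works only if every line ends with a dot followed by a single digit, so that the part to remove always has a fixed length of two characters. The command removes the last two characters of every line in the input file; IFS= and -r keep leading whitespace and backslashes intact. ${line::-2} requires bash 4.2 or later; a portable alternative is ${line%??}.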
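When the number after the dot can be longer than one digit, a fixed two-character cut no longer fits. A minimal sketch using the POSIX parameter expansion ${line%.*}, which removes the last dot and what follows it; the function name strip_suffix_loop and the sample lines are hypothetical.

```shell
# Hypothetical helper: strip the last ".suffix" from each line on stdin.
strip_suffix_loop() {
  # "|| [ -n "$line" ]" also handles a final line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line%.*}"
  done
}

printf '%s\n' 'NM_001402.5' 'NM_001402.15' 'nodot' | strip_suffix_loop
# NM_001402
# NM_001402
# nodot
```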

How to replace a specific row using sed or awk with the output of a command

awk -F, '{ if ( $4 == "" ){printf "%s,",$1 ; system(CMD) ;}else {print $0}}' test.csv

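If the fourth field is empty, print the first field followed by the output of the command; otherwise print the whole line. Here CMD is a placeholder for the command to run.

Be aware of how the command's output is printed: system() does not capture it. The command writes directly to standard output, usually with its own trailing newline.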
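If the command's output needs to be placed inside the row rather than written straight to standard output, one option is to read it with getline instead of system(). This is a sketch of a different technique from the command above, not a drop-in replacement: the function name replace_empty_row, the sample rows, and the echo command are hypothetical, and only the first line of the command's output is used.

```shell
# Hypothetical helper: $1 is the command to run for rows whose 4th field is empty.
replace_empty_row() {
  awk -F, -v OFS=, -v CMD="$1" '
    $4 == "" {
      out = ""
      CMD | getline out   # read the first line of the command output
      close(CMD)          # so the command runs again for the next row
      print $1, out
      next
    }
    { print }
  '
}

printf '%s\n' 'id1,x,y,' 'id2,x,y,z' | replace_empty_row 'echo filled'
# id1,filled
# id2,x,y,z
```

Capturing the output in a variable lets awk control the separators and the newline, at the cost of handling only one line of output per row.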