You could split the field and use substr:
split($9, a, ";")
print substr(a[1], 4)
Awk string indexes start at 1, so substr(a[1], 4) returns a[1] from its fourth character onward.
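A quick sketch of the split/substr idea, assuming a tab-separated GFF-style line whose field 9 looks like ID=gene1;Name=abc (the sample line and the helper name first_attr_value are hypothetical). The second command shows what happens if you start one position too early:

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first ";"-separated attribute in field 9.
first_attr_value() {
    awk '{
        split($9, a, ";")      # a[1] = "ID=gene1", a[2] = "Name=abc"
        print substr(a[1], 4)  # 1-based: skip characters 1-3 ("ID="), keep the rest
    }'
}

# Hypothetical tab-separated input line.
printf 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=abc\n' | first_attr_value
# prints: gene1

# Starting at 3 keeps the "=" because awk counts from 1, not 0.
awk 'BEGIN { print substr("ID=gene1", 3) }'
# prints: =gene1
```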
Another option is to modify the input field separator (FS).
By default FS is a single space, " ", which has a special meaning: fields are separated by runs of blanks (spaces and tabs), and leading and trailing blanks are ignored.
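To see the special behavior of the default FS, compare it with a regex that matches exactly one space. The input string is just a made-up example with leading, inner and trailing runs of spaces:

```shell
#!/bin/sh
line='  a   b  '   # hypothetical input

# Default FS (" "): runs of blanks separate fields, leading/trailing blanks are ignored.
printf '%s\n' "$line" | awk '{ print NF }'          # prints: 2

# FS="[ ]": every single space is a separator, so empty fields appear.
printf '%s\n' "$line" | awk -F'[ ]' '{ print NF }'  # prints: 8
```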
Also, instead of using print $1, "\t", ... or the printf variant, you could set OFS to a tab.
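Here is a small side-by-side of the three ways to get tab-separated output. The input line is hypothetical; all three commands print the same thing:

```shell
#!/bin/sh
input='chr1 src gene 100 200'   # hypothetical space-separated input

# 1. Concatenate literal "\t" strings between the fields.
printf '%s\n' "$input" | awk '{ print $1 "\t" $4 "\t" $5 }'

# 2. printf with an explicit format string.
printf '%s\n' "$input" | awk '{ printf("%s\t%s\t%s\n", $1, $4, $5) }'

# 3. Set OFS once; each comma in print then emits a tab.
printf '%s\n' "$input" | awk 'BEGIN { OFS = "\t" } { print $1, $4, $5 }'
```

With OFS you only state the separator once, which keeps the print statements short when there are many columns.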
Examples:
Modifying FS:
awk -F" +|;|=" '
    $3 == "gene" {
        printf("%s\t%s\t%s\t%s\t%s\t%s\t\n",
               $1, $4, $5, $10, $6, $7);
    }
' data.file
Using split:
awk '
    $3 == "gene" {
        split($9, a, ";")
        printf("%s\t%s\t%s\t%s\t%s\t%s\t\n",
               $1, $4, $5, substr(a[1], 4), $6, $7);
    }
' data.file
OFS and FS:
Output Field Separator (OFS) set to a tab, and FS set inside awk as an alternative to -F.
FS is also updated to include tabs:
awk '
    BEGIN {
        FS="[ \t]+|;|="
        OFS="\t"
    }
    $3 == "gene" {
        print $1, $4, $5, $10, $6, $7
    }
' data.file
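A quick run of the BEGIN-block version on made-up input, assuming tab-separated GFF-style lines where column 9 holds key=value attributes separated by ";" (the sample lines and the helper name sample are hypothetical). Only the "gene" line is printed:

```shell
#!/bin/sh
# Hypothetical tab-separated input: one gene line, one exon line.
sample() {
    printf 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=abc\n'
    printf 'chr1\tsrc\texon\t100\t150\t.\t+\t.\tID=exon1;Parent=gene1\n'
}

sample | awk '
    BEGIN {
        FS = "[ \t]+|;|="   # split on runs of spaces/tabs, on ";" and on "="
        OFS = "\t"
    }
    $3 == "gene" {
        # here $9 = "ID", $10 = "gene1", $11 = "Name", $12 = "abc"
        print $1, $4, $5, $10, $6, $7
    }
'
# prints (tab-separated): chr1 100 200 gene1 . +
```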
Also see the Open Group awk specification (sections Variables and Special Variables, and Examples) and the Gawk manual, which usually notes when a feature is a gawk extension to awk.
Sed can handle this quite easily. It's a single "substitute" command, prefixed with an address range. I've added extra spacing for better readability:
sed -e '/^\[ABC\]$/ , /^\[.*\]$/ s/^\(value1=\).*$/\1notbla/'
Without the extra spacing, it's:
sed -e '/^\[ABC\]$/,/^\[.*\]$/s/^\(value1=\).*$/\1notbla/'
You don't really need anchored regexes, but they may be safer in some cases of unusual inputs. A slightly shorter version with unanchored regexes is:
sed -e '/\[ABC\]/,/^\[/s/^\(value1=\).*$/\1notbla/'
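To try it out, here is the shortest command run against a small INI-style input. The sections [XYZ] and [DEF] and the key value2 are made up for the demonstration; only value1 inside [ABC] should change:

```shell
#!/bin/sh
# Hypothetical INI-style input, one line per argument.
sample_ini() {
    printf '%s\n' '[XYZ]' 'value1=bla' '[ABC]' 'value1=bla' 'value2=foo' '[DEF]' 'value1=bla'
}

sample_ini | sed -e '/\[ABC\]/,/^\[/s/^\(value1=\).*$/\1notbla/'
# prints:
# [XYZ]
# value1=bla
# [ABC]
# value1=notbla
# value2=foo
# [DEF]
# value1=bla
```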
Explanation:
You asked for each flag or option to be explained, and I've got the time, so here you go. I'm explaining the final (shortest) version out of the three Sed commands listed above.
The first part of the line is an address range: /startregex/,/stopregex/
The substitute command (s), which follows the address range, is only applied to lines from startregex to stopregex (inclusive).
In this case the start regex is /\[ABC\]/. Square brackets are usually special characters within a regex, so we put a backslash before each to signify literal square bracket characters.
The stop regex is /^\[/, which uses the special regex character ^ to signify the start of a line. This pattern will match any line that starts with a literal left square bracket ([).
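To see which lines the address range selects on its own, you can print just the range with sed -n and the p command. The input is the same kind of made-up INI sample (hypothetical sample_ini helper). Note that [ABC] itself starts with "[" but does not end the range: sed only checks the stop regex starting from the line after the start line, and the closing [DEF] line is included:

```shell
#!/bin/sh
# Hypothetical INI-style input.
sample_ini() {
    printf '%s\n' '[XYZ]' 'value1=bla' '[ABC]' 'value1=bla' 'value2=foo' '[DEF]' 'value1=bla'
}

# -n suppresses automatic printing; p prints only the lines inside the range.
sample_ini | sed -n '/\[ABC\]/,/^\[/p'
# prints:
# [ABC]
# value1=bla
# value2=foo
# [DEF]
```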
The substitute command itself is basically quite simple; the general format is s/findregex/replacetext/. It can also have special flags placed after the final / to modify its behavior, but I'm not using any such flags here.
The "find regex" is ^\(value1=\).*$.
The caret (^) matches the start of the line, as mentioned earlier, and the dollar sign ($) matches the end of the line. So this whole pattern must match an entire line, not merely part of one.
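A small illustration of why the anchor matters, using a made-up key that merely contains "value1=":

```shell
#!/bin/sh
# Anchored: "myvalue1=" does not start with "value1=", so nothing changes.
printf '%s\n' 'myvalue1=x' | sed 's/^\(value1=\).*$/\1notbla/'   # prints: myvalue1=x

# Unanchored: "value1=" is found in the middle of the line and replaced there.
printf '%s\n' 'myvalue1=x' | sed 's/\(value1=\).*$/\1notbla/'    # prints: myvalue1=notbla
```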
The parentheses ( and ), unlike square brackets, are not special by default in the basic regular expressions sed uses, so we put backslashes before them to give them their special grouping meaning. They allow part of the matched text (the text matched by the "find regex") to be reused in the replacement text. Specifically, the \1 in the replacement text means "the text matched within the first set of parentheses in the regex." In this case, that is always just "value1=".
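The capture group on its own, applied to a single line. For comparison, the second command uses extended regexes via -E (accepted by GNU and BSD sed), where the parentheses need no backslashes:

```shell
#!/bin/sh
# Basic regex: \( \) capture "value1=", \1 puts it back in front of "notbla".
printf '%s\n' 'value1=bla' | sed 's/^\(value1=\).*$/\1notbla/'    # prints: value1=notbla

# Extended regex (-E): plain ( ) are the grouping operators.
printf '%s\n' 'value1=bla' | sed -E 's/^(value1=).*$/\1notbla/'   # prints: value1=notbla
```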
The final element in the "find regex" is .*. The dot (.) means "any single character," and the asterisk (*) means "any number of times (zero or more)." So the dot star (.*) matches the entire rest of the line, after the equals sign.
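A few one-liners on made-up input showing that .* swallows everything up to the end of the line, including spaces and further "=" signs, and that it is greedy (it grabs as much as it can):

```shell
#!/bin/sh
# .* eats the whole rest of the line, "=" and spaces included.
printf '%s\n' 'value1=x=y z' | sed 's/^\(value1=\).*$/\1notbla/'   # prints: value1=notbla

# Greedy: ^.*= runs up to the LAST "=" on the line.
printf '%s\n' 'a=b=c' | sed 's/^.*=//'       # prints: c

# A negated bracket expression stops at the FIRST "=" instead.
printf '%s\n' 'a=b=c' | sed 's/^[^=]*=//'    # prints: b=c
```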
"notbla" in the replacement text is just static text, nothing special about it.
To really learn Sed properly, I highly recommend the Grymoire Sed tutorial, which is free online.
With grep:
grep with the -P (--perl-regexp) option supports \K, which discards the characters matched so far, so they are not part of the reported match.
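A minimal sketch of \K, assuming GNU grep built with PCRE support (-P is not part of POSIX grep) and a made-up input line. With -o, grep prints only the part of the match after \K:

```shell
#!/bin/sh
# "sum=" must match, but \K drops it from the reported match; -o prints only that match.
printf '%s\n' 'id=7 name=foo sum=42' | grep -oP 'sum=\K.*'   # prints: 42
```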
With awk:
In awk, the variable NF holds the number of fields in the current record (line). Because fields are numbered from 1, NF is also the number of the last field, so $NF is the value of the last field.
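NF and $NF on a made-up line, first with the default whitespace splitting and then with "=" as the field separator, which makes the last field the text after the last "=":

```shell
#!/bin/sh
line='id=7 name=foo sum=42'   # hypothetical input

# Default FS: three fields; $NF is the last one.
printf '%s\n' "$line" | awk '{ print NF, $NF }'     # prints: 3 sum=42

# FS="=": fields are "id", "7 name", "foo sum", "42".
printf '%s\n' "$line" | awk -F'=' '{ print $NF }'   # prints: 42
```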
With sed:
^.*sum= matches all characters (.*) from the start of the line (^) up to and including the last sum=, and the command replaces them with a whitespace character.
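A sketch of that substitution on a made-up line. Assuming you want only the value, the replacement here is empty; putting a single space in the replacement instead would leave that space in front of the value:

```shell
#!/bin/sh
# Greedy ^.*sum= removes everything up to and including the last "sum=".
printf '%s\n' 'id=7 name=foo sum=42' | sed 's/^.*sum=//'   # prints: 42
```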
With cut:

If you want to save the values both into one combined file and each into its own separate file, with awk you can do: