Bash – How to extract lines by words in specific position, not column

Tags: bash, columns, text processing, text

I have an input file like this:

                     v
ATOM     57  O   LYS A   7       2.254  25.484  18.942  1.00 14.46
ATOM     77  NH1AARG A   8       5.557  19.204  13.388  0.55 24.50
TER    1648      ILE C 206
HETATM 1668  O   HOH A1023      25.873  38.343   2.138  1.00 21.99
                     ^

I only need the lines that contain A at the marked position (character 22). In most lines, A is a single character in the fifth column, as in the first line. However, sometimes it falls in the fourth column, as in the second line, or it starts a longer string, as in the last line. Note that a single-character A can also appear at positions other than 22, but I only care about it when it is at position 22.

I need my output to contain only the lines with A at that position, regardless of whether it stands alone or is part of a string:

ATOM     57  O   LYS A   7       2.254  25.484  18.942  1.00 14.46
ATOM     77  NH1AARG A   8       5.557  19.204  13.388  0.55 24.50
HETATM 1668  O   HOH A1023      25.873  38.343   2.138  1.00 21.99

But sometimes I also want to extract only the lines with a standalone A at that position, regardless of which column it is in:

ATOM     57  O   LYS A   7       2.254  25.484  18.942  1.00 14.46
ATOM     77  NH1AARG A   8       5.557  19.204  13.388  0.55 24.50

Best Answer

You can use

grep -E '^.{21}A' file

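if you want to include cases like A1023 (the ^.{21} part consumes exactly the first 21 characters of the line, so the A must be the 22nd character), and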

grep -E '^.{21}A\>' file

if you want only lines where A appears as an isolated character.
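As a quick check, the two patterns can be run against the sample data from the question. This is only a sketch: the helper names count_any_a and count_single_a and the temporary file are made up for illustration, and \> is assumed to be available (it is a GNU grep extension).

```shell
#!/usr/bin/env bash
# Sample data copied from the question; the character of interest is number 22.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ATOM     57  O   LYS A   7       2.254  25.484  18.942  1.00 14.46
ATOM     77  NH1AARG A   8       5.557  19.204  13.388  0.55 24.50
TER    1648      ILE C 206
HETATM 1668  O   HOH A1023      25.873  38.343   2.138  1.00 21.99
EOF

# Hypothetical helpers: count the lines selected by each pattern.
count_any_a()    { grep -cE '^.{21}A' "$1"; }     # A at position 22, alone or not
count_single_a() { grep -cE '^.{21}A\>' "$1"; }   # A at position 22 ending a word

echo "A at position 22:          $(count_any_a "$sample")"
echo "isolated A at position 22: $(count_single_a "$sample")"
```

With this input the first count is 3 and the second is 2: the HETATM line is dropped by the second pattern because the A in A1023 is followed by a digit.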

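NOTE: In the second example, the notation \> matches the empty string at the end of a word, so the A at position 22 must not be followed by a letter, digit, or underscore. The character before the A is not checked.

Excerpt from the grep man page: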

The Backslash Character and Special Expressions

The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].
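Since \> is not available in every grep, the definition of \W quoted above suggests a way to spell out the same test by hand. The following is a sketch rather than part of the original answer, and the function name single_a_portable is hypothetical: it requires the A at position 22 to be followed either by a non-word character or by the end of the line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: select lines whose 22nd character is an A that is not
# followed by another word character, without relying on \>.
single_a_portable() {
    grep -E '^.{21}A([^_[:alnum:]]|$)' "$@"
}

# Reads standard input when no file name is given.
printf '%s\n' \
    'ATOM     57  O   LYS A   7' \
    'HETATM 1668  O   HOH A1023' |
    single_a_portable
# prints only the ATOM line
```

For selecting lines this gives the same result as A\>, at the cost of a longer pattern.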
