Grep variable-length optional text

grep perl

I would like to search for line-anchored text that may be preceded by a block of optional text, any trailing portion of which can be included in the match. For instance, suppose I am trying to find ^xyz but would also accept ^wxyz, ^vwxyz, ^uvwxyz, ^tuvwxyz, ^stuvwxyz, and ^rstuvwxyz, and no other possibilities. (For my actual search, I cannot simply use a character class like [r-w], because the actual preceding characters are not in alphabetical order as they are in this simplified example.) I could use the command egrep '^r?s?t?u?v?w?xyz'. Is there another way to write this search so that I can apply the optional flag (?) to the entire sequence rather than to each element individually?

Edit:

Here is an example of more realistic data:
The full text to be matched is AZHDEOIMOSJDJKEJLCN. However, letters are variably lost from the left end, so all the following should be matched:

^AZHDEOIMOSJDJKEJLCN
^ZHDEOIMOSJDJKEJLCN
^HDEOIMOSJDJKEJLCN
^DEOIMOSJDJKEJLCN
^EOIMOSJDJKEJLCN
^OIMOSJDJKEJLCN
^IMOSJDJKEJLCN
^MOSJDJKEJLCN
^OSJDJKEJLCN
^SJDJKEJLCN
^JDJKEJLCN
^DJKEJLCN
^JKEJLCN
^KEJLCN

Thus, the residual KEJLCN is essential and everything preceding it is optional. However, I cannot simply grep for KEJLCN, because I only want instances that are anchored to the beginning of the line (^) and are optionally preceded by the other characters listed above. Also, note that the search string will be in a variable, and the minimal residue (e.g., KEJLCN) will be extracted by a substring operation in a script. For example, in a Perl environment running egrep as a system command to search for the text $query, the essential text would be contained in substr($query,-6) and the optional preceding text in substr($query,0,length($query)-6). Therefore, the solution should be valid for a regex in variable form rather than for string literals only.

Best Answer

grep '^[[:lower:]]*xyz'

Would return every line that begins with any run of lowercase letters followed by xyz. But, of course, this does not restrict the match to one explicit sequence of characters.
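To make the difference concrete, here is a small sketch on made-up sample lines. The sample helper and its four strings are my own illustration, not data from the question. It compares the anchored character class with the one-?-per-letter pattern from the question:

```shell
# Hypothetical sample data: wxyz and uvwxyz are wanted, the other two are not.
sample() { printf '%s\n' wxyz uvwxyz abcxyz rtxyz; }

# Character class: any run of lowercase letters may precede xyz.
sample | grep -E '^[[:lower:]]*xyz'     # prints all four lines

# One ? per letter: rejects abcxyz, but still accepts rtxyz,
# because letters in the middle of the block can be skipped.
sample | grep -E '^r?s?t?u?v?w?xyz'     # prints wxyz, uvwxyz, rtxyz
```

Neither form is exact: both accept at least one string that is not a left-truncated copy of rstuvwxyz.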

Still, that appears to be a problem you have already solved:

grep -f - <<\STRINGS /dev/fd/3 3<<\DATA
^ZHDEOIMOSJDJKEJLCN
^HDEOIMOSJDJKEJLCN
^DEOIMOSJDJKEJLCN
^EOIMOSJDJKEJLCN
^OIMOSJDJKEJLCN
^IMOSJDJKEJLCN
^MOSJDJKEJLCN
^OSJDJKEJLCN
^SJDJKEJLCN
^JDJKEJLCN
^DJKEJLCN
^JKEJLCN
^KEJLCN
STRINGS

SJDJKEJLCN
JDJKEJLCN
o;aidsfjoasjif
KKEJnotLCN
DATA

OUTPUT

SJDJKEJLCN
JDJKEJLCN

If you would like to programmatically generate the same lookup table...

grep -f - <<STRINGS /dev/fd/3 3<<\DATA
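$(
    # Start from the longest accepted string and strip one leading
    # character per pass until fewer than MINLEN characters remain.
    MATCH=ZHDEOIMOSJDJKEJLCN
    until [ "${#MATCH}" -lt "${MINLEN=6}" ]
    do  printf '^%s\n' "$MATCH"
        MATCH=${MATCH#?}
    done
)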
STRINGS

SJDJKEJLCN
JDJKEJLCN
o;aidsfjoasjif
KKEJnotLCN
DATA

OUTPUT

SJDJKEJLCN
JDJKEJLCN
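
If a single pattern is preferable to a lookup table, one possible approach is to nest the optional groups, so that each earlier character is allowed only when every later optional character is also present. The following is a sketch under assumptions rather than a definitive implementation: build_pattern is a hypothetical helper name, the query is assumed to contain no regex metacharacters, and grep -E is assumed to be available.

```shell
# build_pattern QUERY MINLEN (hypothetical helper)
# Prints an ERE matching QUERY, or QUERY with characters removed from
# its left end, as long as the last MINLEN characters remain.
# Assumes QUERY contains no regex metacharacters.
build_pattern() {
    rest=$1
    optlen=$(( ${#1} - $2 ))   # number of optional leading characters
    opt=''
    i=0
    while [ "$i" -lt "$optlen" ]; do
        first=${rest%"${rest#?}"}   # first character of rest
        opt="(${opt}${first})?"     # wrap what came before, then make it optional
        rest=${rest#?}
        i=$((i + 1))
    done
    printf '^%s%s\n' "$opt" "$rest"
}

build_pattern vwxyz 3     # prints ^((v)?w)?xyz

# Usage with the query from the question and the sample lines above.
pattern=$(build_pattern AZHDEOIMOSJDJKEJLCN 6)
printf '%s\n' SJDJKEJLCN JDJKEJLCN 'o;aidsfjoasjif' KKEJnotLCN |
    grep -E "$pattern"    # prints SJDJKEJLCN and JDJKEJLCN
```

Because the pattern is built from a variable, the same idea should carry over to a Perl script that splits $query with substr, though that port is not shown here.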