Excel – How to find specific text syntax format in a cell with an Excel function

microsoft excelworksheet-function

I am looking for a cell formula to find/identify/validate if an adjacent cell contains characters in the correct format.

If there is a character "|", then it must be preceded or followed by any amount of text and then another "|" before the text ends or there is a space, otherwise the cell is wrong.

Examples:

|Name| |Surname| |City| = ok
|Name| |Surname| = ok
|Name| |Surname| New York = ok
New York |Name| |Surname| = ok
|Name| |City |Surname| = wrong
N|ame |City| |Surname| = wrong
|Surname| |ZipCode| = ok

Any ideas?

Best Answer

Another way of stating the problem is:

  1. || (consecutive |) are not allowed, nor those enclosing only two of more spaces*.
  2. If there exists a | | (one space in between) in the text to be validated, it must immediately be preceded by any amount of non-| text, with a | or another | | immediately prior to that, and it must immediately be followed by any amount of non-| text followed by a | or another | |.
  3. If there are no | | then there must either be no | or exactly two |.

Condition 1. is, technically, explicitly ruled out in the question, ("any amount of text" can mean none or space-only is allowed) but it can be inferred from the examples that this is the intent of the OP.

With the conditions re-worded as above a formula-only solution becomes readily apparent as seen applied in the following worksheet:

Screenshot of worksheet

This is the formula entered into B2:B11:

=IF(CHOOSE(MIN(3,1+LEN(A1)-LEN(SUBSTITUTE(A1,"|",""))),TRUE,FALSE,AND(LEN(A1)-LEN(SUBSTITUTE(A1,"|",""))-(LEN(A1)-LEN(SUBSTITUTE(A1,"| |","")))/3*2=2,LEN(TRIM(MID(A1,FIND("|",A1)+1,FIND("|",A1,FIND("|",A1)+1)-FIND("|",A1)-1)))>0)),"ok","wrong")

Explanation:

The prettified version of the formula is as follows:

=
IF(
  CHOOSE(
    MIN(3,1+LEN(A1)-LEN(SUBSTITUTE(A1,"|",""))),
    TRUE,
    FALSE,
    AND(
      LEN(A1)-LEN(SUBSTITUTE(A1,"|",""))-(LEN(A1)-LEN(SUBSTITUTE(A1,"| |","")))/3*2=2,
      LEN(TRIM(MID(A1,FIND("|",A1)+1,FIND("|",A1,FIND("|",A1)+1)-FIND("|",A1)-1)))>0
    )
  ),
  "ok",
  "wrong"
)

The three conditions above can be refactored to the following:

[a] There must be precisely 2 more | than those accounted for by the | |s (the first and the last ones).

and

[b] If there exist any |, there must be at least two of them, and the first two of them must be separated by at least one non-space character.

The formula for [a] is:

LEN(A1)-LEN(SUBSTITUTE(A1,"|",""))-(LEN(A1)-LEN(SUBSTITUTE(A1,"| |","")))/3*2=2

The formula for the intra-| text validation part of [b] is:

LEN(TRIM(MID(A1,FIND("|",A1)+1,FIND("|",A1,FIND("|",A1)+1)-FIND("|",A1)-1)))>0

The other part of [b] (i.e., that there can't only be one |) is taken care of by the CHOOSE() function, which also takes care of the case when there are no | (required since this edge case causes errors in formula [b] and an incorrect result for formula [a]).

The first argument of the CHOOSE() function,

MIN(3,1+LEN(A1)-LEN(SUBSTITUTE(A1,"|","")))

maps the possible counts of |s to the indexes 1, 2, and 3 like so: [0,1,2,3,4,…][1,2,3,3,3,…], and thus the function returns TRUE for a count of 0, FALSE for a count of 1, and the result of the AND() function for all other counts.


* The condition not allowing two or more intra-| spaces can be relaxed by the use of the TRIM() function.

Related Question