Extract numbers from string using shell

bash, command line

Using shell script, how do I extract two sets of numbers from a string like "R14C11"? I'd like to get the result as an AppleScript list.

R14C11 -> {14, 11}
R5C9   -> {5, 9}

"R" and "C" will be constants but the number of digits in each set of numbers can vary.

Best Answer

There are many ways to do this. One is to use sed:

echo R5C9 | sed -E 's|R(.*)C(.*)|{\1, \2}|'

Or, to make sure that only input in the correct format is transformed (sed passes anything that does not match through unchanged):

echo R5C9 | sed -E 's|R([[:digit:]]+)C([[:digit:]]+)|{\1, \2}|'

Some explanations:

  • -E enables extended regular expressions, so grouping with () and the + quantifier can be written without backslashes
  • s|SOURCE|TARGET| is the substitution command to transform SOURCE into TARGET
  • R([[:digit:]]+)C([[:digit:]]+) is the source pattern we are looking for: An R followed by at least one digit [[:digit:]]+ followed by C followed again by at least one digit
  • The target replaces the matched source, with \1 standing for the text matched within the first () in the source, \2 for the second
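One thing the sed commands above leave open is what happens to input that does not fit the format: sed simply prints it unchanged. As a minimal sketch (the function name rc_to_list is made up for illustration and is not part of the answer), the pattern can be anchored with ^ and $, and -n combined with the p flag makes sed print a line only when the substitution actually happened:

```shell
# rc_to_list is a hypothetical helper name used only for this sketch.
rc_to_list() {
    # ^ and $ force the pattern to cover the whole line.
    # -n turns off sed's default printing; the trailing p prints
    # the line only if the substitution succeeded.
    printf '%s\n' "$1" | sed -nE 's|^R([[:digit:]]+)C([[:digit:]]+)$|{\1, \2}|p'
}

rc_to_list R14C11   # prints {14, 11}
rc_to_list R5C9     # prints {5, 9}
rc_to_list R5Cx     # prints nothing, the input does not match
```

An empty result can then be treated as "invalid input" by whatever calls the script.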

You can also do this in bash itself, without calling an external tool:

[[ "R5C9" =~ R([0-9]+)C([0-9]+) ]] && echo "{${BASH_REMATCH[1]}, ${BASH_REMATCH[2]}}"

  • [[ "R5C9" =~ R([0-9]+)C([0-9]+) ]] matches the text against the regular expression R([0-9]+)C([0-9]+), which is essentially the same as the sed source pattern
  • The parts matched within () are assigned to the shell array BASH_REMATCH: index 0 holds the whole match, index 1 the first group, index 2 the second
  • The echo is only executed if the match ([[ ... ]]) was successful, and then prints the reformatted expression. It is a bit confusing to read because the braces mean different things: the outer {} are literal characters in the output, while each ${...} is a parameter expansion
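For use in a script, the bash match can be wrapped in a function. The following is only a sketch under a few assumptions: the name parse_rc is hypothetical, the pattern is anchored with ^ and $ so that the whole string has to match, and the regular expression is kept in a variable, which avoids quoting problems on the right-hand side of =~. It needs bash, not plain sh.

```shell
#!/usr/bin/env bash

# parse_rc is a hypothetical helper name used only for this sketch.
parse_rc() {
    # Anchored version of the pattern: the entire argument must be R<digits>C<digits>.
    local re='^R([0-9]+)C([0-9]+)$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1] and [2] hold the two captured numbers.
        printf '{%s, %s}\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        # Signal invalid input through the exit status.
        return 1
    fi
}

parse_rc R14C11                           # prints {14, 11}
parse_rc R5C9                             # prints {5, 9}
parse_rc R5Cx || echo "no match" >&2      # prints "no match" on stderr
```

Returning a non-zero status on a failed match lets the caller distinguish bad input from a valid result without inspecting the output.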