Shell – Is there something like JavaScript’s “split()” in the shell

It's very easy to use split() in JavaScript to break a string into an array.

What about shell script?

Say I want to do this:

$ script.sh var1_var2_var3

When the user gives a string such as var1_var2_var3 to script.sh, the script should convert it into an array like

array=( var1 var2 var3 )
for name in ${array[@]}; do
    # some code
done

Best Answer

Bourne/POSIX-like shells have a split+glob operator and it's invoked every time you leave a parameter expansion ($var, $-...), command substitution ($(...)), or arithmetic expansion ($((...))) unquoted in list context.

In fact, you invoked it by mistake when you wrote for name in ${array[@]} instead of for name in "${array[@]}". Beware that invoking the operator by mistake like that is a source of many bugs and security vulnerabilities.
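To see what the unquoted form does, here is a minimal sketch. count_args is a hypothetical helper, and positional parameters stand in for the array so that the sketch runs in any POSIX sh; in bash, ${array[@]} and "${array[@]}" differ in the same way. It assumes the default $IFS.

```shell
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

set -- 'var 1' 'var 2'        # two elements, each containing a space

unquoted=$(count_args $@)     # split+glob applied to each element
quoted=$(count_args "$@")     # elements passed through untouched

echo "unquoted: $unquoted, quoted: $quoted"
```

With the default $IFS, the unquoted expansion breaks each element at the space, so the helper sees four arguments instead of two.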

That operator is configured with the $IFS special parameter, which tells it which characters to split on (though beware that space, tab and newline receive special treatment there), and with the -f option, which disables (set -f) or enables (set +f) the glob part.

Also note that while the S in $IFS was originally (in the Bourne shell where $IFS comes from) for Separator, in POSIX shells, the characters in $IFS should rather be seen as delimiters or terminators (see below for an example).

So to split on _:

string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
array=($string) # invoke the split+glob operator

for i in "${array[@]}"; do # loop over the array elements
  printf '%s\n' "$i"       # each element is one field of $string
done
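Since IFS=_ and set -f stay in effect for the rest of the script, it can help to confine them. A possible sketch, where split_on is a hypothetical helper (not part of any shell) whose body runs in a subshell and prints one field per line:

```shell
# split_on SEP STRING: hypothetical helper, prints one field per line.
# The ( ... ) function body is a subshell, so the IFS and noglob
# changes do not leak into the caller.
split_on() (
  IFS=$1
  set -f
  printf '%s\n' $2    # unquoted on purpose: invokes split+glob
)

fields=$(split_on _ 'var1_var2_*')
printf '%s\n' "$fields"
```

The trade-off is that fields containing newlines become ambiguous in the output, so this only suits data known not to contain any.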

To see the distinction between separator and delimiter, try on:

string='var1_var2_'

That will split it into var1 and var2 only (no extra empty element).

So, to make it similar to JavaScript's split(), you'd need an extra step:

string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
temp=${string}_ # add an extra delimiter
array=($temp) # invoke the split+glob operator

Note that this splits an empty $string into 1 (not 0) element, like JavaScript's split().
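The effect of the extra delimiter can be checked by counting fields. In this sketch, count_fields and count_fields_js are hypothetical helpers, and positional parameters are used instead of an array so that it runs in any POSIX sh:

```shell
# Both helpers are hypothetical. Their ( ... ) bodies run in a subshell,
# so IFS and set -f do not affect the caller.
count_fields() (
  IFS=_; set -f
  set -- $1             # plain split+glob
  echo "$#"
)
count_fields_js() (
  IFS=_; set -f
  temp=${1}_            # add an extra delimiter first
  set -- $temp
  echo "$#"
)

plain_trailing=$(count_fields 'var1_var2_')      # var1, var2
js_trailing=$(count_fields_js 'var1_var2_')      # var1, var2, ''
plain_empty=$(count_fields '')                   # no field at all
js_empty=$(count_fields_js '')                   # one empty field

echo "$plain_trailing $js_trailing $plain_empty $js_empty"
```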

To see the special treatment that tab, space and newline receive, compare:

IFS=' '; string=' var1  var2  '

(where you get var1 and var2) with

IFS='_'; string='_var1__var2__'

where you get: '', var1, '', var2, ''.
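The two cases can be put side by side as follows. This is only a sketch: count_fields_ifs is a hypothetical helper that takes the $IFS value and the string, and prints the number of resulting fields.

```shell
# count_fields_ifs IFS_VALUE STRING: hypothetical helper.
# Runs in a subshell so the caller's IFS and glob setting are untouched.
count_fields_ifs() (
  IFS=$1
  set -f
  set -- $2
  echo "$#"
)

with_space=$(count_fields_ifs ' ' ' var1  var2  ')        # var1, var2
with_underscore=$(count_fields_ifs '_' '_var1__var2__')   # '', var1, '', var2, ''

echo "space: $with_space, underscore: $with_underscore"
```

Runs of whitespace count as a single delimiter and are ignored at both ends, while every underscore terminates a field of its own.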

Note that the zsh shell doesn't invoke that split+glob operator implicitly like that unless in sh or ksh emulation. There, you have to invoke it explicitly: $=string for the split part, $~string for the glob part ($=~string for both). It also has a split operator where you can specify the separator:

array=(${(s:_:)string})

or to preserve the empty elements:

array=("${(@s:_:)string}")

Note that there, s is for splitting, not delimiting (zsh also treats the $IFS characters as separators, a known POSIX non-conformance). It differs from JavaScript's split() in that an empty string is split into 0 (not 1) elements.

A notable difference with $IFS-splitting is that ${(s:abc:)string} splits on the abc string, while with IFS=abc, that would split on a, b or c.
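The $IFS half of that comparison can be tried in any POSIX sh. A small sketch, with an arbitrary sample string, that prints each field between angle brackets:

```shell
string='1abc2'

# The command substitution is a subshell, so IFS and set -f stay local.
fields=$(
  IFS=abc; set -f
  set -- $string              # a, b and c each act as a delimiter
  printf '<%s>' "$@"
)

echo "$fields"
```

Each of a, b and c terminates a field here, so the result is four fields (1, two empty ones, and 2), where splitting on the string abc would give just 1 and 2.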

With zsh and ksh93, the special treatment that space, tab or newline receive can be removed by doubling them in $IFS.

As a historical note, the Bourne shell (the ancestor of modern POSIX shells) always stripped the empty elements. It also had a number of bugs related to splitting and expansion of $@ with non-default values of $IFS. For instance, IFS=_; set -f; set -- $@ would not be equivalent to IFS=_; set -f; set -- $1 $2 $3....

Splitting on regexps

Now for something closer to JavaScript's split() that can split on regular expressions, you'd need to rely on external utilities.

In the POSIX tool-chest, awk has a split operator that can split on extended regular expressions (those are more or less a subset of the Perl-like regular expressions supported by JavaScript).

split() {
  awk -v q="'" '
    function quote(s) {
      gsub(q, q "\\" q q, s)
      return q s q
    }
    BEGIN {
      n = split(ARGV[1], a, ARGV[2])
      for (i = 1; i <= n; i++) printf " %s", quote(a[i])
      exit
    }' "$@"
}
string=a__b_+c
eval "array=($(split "$string" '[_+]+'))"
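The quote() step is what makes the eval safe: every field comes back wrapped in single quotes, so the shell does not expand it again. A sketch of that, with the function repeated so that it runs standalone, a made-up input string, and set -- used in place of the array assignment so that it works in any POSIX sh:

```shell
split() {
  awk -v q="'" '
    function quote(s) {
      gsub(q, q "\\" q q, s)
      return q s q
    }
    BEGIN {
      n = split(ARGV[1], a, ARGV[2])
      for (i = 1; i <= n; i++) printf " %s", quote(a[i])
      exit
    }' "$@"
}

# Sample input with a quote, a space and something that looks like code.
string="it's;a b;\$(echo oops)"
eval "set -- $(split "$string" ';')"

n=$#; first=$1; second=$2; third=$3
printf '%s\n' "$n" "$first" "$second" "$third"
```

The third field should come out as the literal text $(echo oops), not as the output of a command.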

The zsh shell has builtin support for Perl-compatible regular expressions (in its zsh/pcre module), but using it to split a string, though possible, is relatively cumbersome.