Word Splitting in Shell – What is Word Splitting and Its Importance in Shell Programming


I'm confused about the role word splitting plays in zsh. I never came across this concept when programming in C, Python or MATLAB, and that has made me curious why word splitting seems to be specific to shell programming.

I have read about word splitting on this and other sites before, but haven't found a clear explanation of the concept. Wikipedia has a definition of word splitting but does not seem to cover how it applies to Unix shells.

Here's an example of my confusion in zsh:

In the Z Shell FAQ, I read the following:

3.1: Why does $var where var="foo bar" not do what I expect?

In most Bourne-shell derivatives, multiple-word variables such as
var="foo bar" are split into words when passed to a command or used in a for foo in $var loop. By default, zsh does not have that
behaviour: the variable remains intact. (This is not a bug! See
below.) The option SH_WORD_SPLIT exists to provide compatibility.

However, in the Z Shell Manual, I read the following:

SH_WORD_SPLIT (-y) <K> <S>

Causes field splitting to be performed on
unquoted parameter expansions. Note that this option has nothing to do
with word splitting. (See Parameter Expansion.)

Why does it say that SH_WORD_SPLIT has nothing to do with word splitting? Isn't word splitting precisely what this is all about?

Best Answer

Early shells had only a single data type: strings. But it is common to manipulate lists of strings, typically when passing multiple file names as arguments to a program. Another common use case for splitting is when a command outputs a list of results: the command's output is a string, but the desired data is a list of strings. To store a list of file names in a variable, you would put spaces between them. Then a shell script like this

files="foo bar qux"
myprogram $files

called myprogram with three arguments, as the shell split the string $files into words. At the time, spaces in file names were either forbidden or widely considered Not Done.
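To make the splitting visible, here is a minimal sketch in POSIX sh. `count_args` is a hypothetical helper that only prints how many arguments it received. The sketch shows splitting of an unquoted variable, splitting of command output, and how a file name containing a space breaks the "space-separated list in a string" idiom. It should behave the same in dash or bash; in zsh it only behaves this way with SH_WORD_SPLIT set (for example under `emulate sh`).

```sh
#!/bin/sh
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

files="foo bar qux"
count_args $files      # 3: the unquoted expansion is split on IFS (space, tab, newline by default)
count_args "$files"    # 1: double quotes suppress splitting

# The same mechanism turns command output (one string) into a list of words.
count_args $(printf 'foo\nbar qux\n')   # 3

# It is also why names containing spaces break this idiom.
files="my notes.txt todo.txt"   # meant as two file names
count_args $files      # 3, not 2
```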

The Korn shell introduced arrays: you could store a list of strings in a variable. The Korn shell remained compatible with the then-established Bourne shell, so bare variable expansions kept undergoing word splitting, and using arrays required some syntactic overhead. You would write the snippet above as

files=(foo bar qux)
myprogram "${files[@]}"
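A short sketch of why the quoted `"${files[@]}"` form matters, assuming bash (ksh93 and mksh behave the same way for these expansions). `count_args` is again a hypothetical helper. One element deliberately contains a space so the different expansions can be told apart.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

files=(foo "bar baz" qux)
count_args "${files[@]}"   # 3: one word per element, inner spaces preserved
count_args ${files[@]}     # 4: unquoted, "bar baz" is split again
count_args $files          # 1: a bare $files means ${files[0]} in ksh and bash
count_args "${files[*]}"   # 1: all elements joined into a single string
```

The third line is the compatibility cost the answer describes: a plain `$files` still behaves like a Bourne-shell scalar.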

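Zsh had arrays from the start, and its author opted for a saner language design at the expense of backward compatibility. In zsh (under the default expansion rules) $var does not perform word splitting; if you want to store a list of words in a variable, you are meant to use an array; and if you really want word splitting, you can write $=var. (The zsh manual calls this operation field splitting, which is the POSIX term; in the sentence quoted in the question, "word splitting" means the parser's division of the command line into words, which SH_WORD_SPLIT does not affect.)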

files=(foo bar qux)
myprogram $files
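The zsh behaviour can be checked with a sketch like the following. It assumes zsh is installed and runs the zsh code through `zsh -f -c` so that it can be pasted into any shell; `-f` skips startup files, so an `setopt sh_word_split` in a user's `.zshrc` cannot change the result. `zsh_demo` and `count_args` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical wrapper: run the zsh-specific part under zsh without startup files.
zsh_demo() {
  zsh -f -c '
    count_args() { print $#; }
    var="foo bar qux"
    count_args $var       # 1: unquoted $var is not split by default
    count_args $=var      # 3: $=var requests splitting explicitly
    files=(foo "bar baz" qux)
    count_args $files     # 3: one word per array element, no quotes needed
    setopt sh_word_split
    count_args $var       # 3: sh-style splitting of unquoted $var
  '
}

zsh_demo
```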

These days, spaces in file names are something you need to cope with, both because many users expect them to work and because many scripts are executed in security-sensitive contexts where an attacker may be in control of file names. So automatic word splitting is often a nuisance; hence my general advice to always use double quotes, i.e. write "$foo", unless you understand why you need word splitting in a particular use case. (Note that in sh-compatible shells, bare variable expansions undergo globbing as well; zsh does not do this by default, only when the GLOB_SUBST option is set.)
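A small sketch of both hazards of a bare expansion in sh or bash: splitting a name that contains a space, and globbing a value that contains a wildcard. It works in a scratch directory created with `mktemp -d` (widely available, though not required by POSIX) so the glob has known matches, and it removes that directory on exit. `count_args` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

# Scratch directory with known contents, removed when the script exits.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch "my file.txt" other.txt

name="my file.txt"
count_args $name       # 2: split into "my" and "file.txt"
count_args "$name"     # 1

pattern='*.txt'
count_args $pattern    # 2: the unquoted expansion is also globbed
count_args "$pattern"  # 1: the literal string *.txt
```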