Shell – Command line processing – tokens and metacharacters

shell

I am just learning about command line processing and am hoping someone can confirm my interpretation of the following statement. In the book I am reading, the first step in command line processing is:

  1. Splits the command into tokens that are separated by the fixed set of metacharacters: SPACE, TAB, NEWLINE, ;, (, ), <, >, |, and &. Types of tokens include words, keywords, I/O redirectors, and semicolons

Am I right in thinking that for the command:

ls | more

ls and more are the tokens, and the pipe character is the metacharacter separating the two tokens?

I got a bit confused because it goes on to say that < and > are metacharacters, but then says that tokens can be I/O redirectors.

Best Answer

This is not a very good explanation. A token is a sequence of characters that forms a word or a punctuation sign. Characters like < and | are part of tokens too, so ls | more consists of three tokens: ls, | and more. You may call these characters metacharacters, but this is not useful terminology. The basic rules are:

  • Whitespace is not part of a token and separates tokens.
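  • A token is made up of ordinary characters, or of operator characters ()<>&|;, but not both. For example, foo<@a&>b consists of the tokens foo (ordinary), < (operator), @a (ordinary), &> (operator) and b (ordinary). Here &> is a single operator in bash; a shell that lacks this extension reads it as the two operators & and >.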
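To see these two rules in action, here is a small sketch you can paste into any POSIX-like shell. The variable names (spaced, tight, two) are only for illustration. It shows that operator characters end a word on their own, so whitespace around them is optional, and that ; is itself a token rather than part of a word.

```shell
# Operator characters delimit words even with no whitespace around them,
# so these two pipelines are tokenized identically: echo, hello, |, tr, ...
spaced=$(echo hello | tr a-z A-Z)
tight=$(echo hello|tr a-z A-Z)

# ';' is an operator token too, so this runs two commands
# instead of passing the single word "a;echo" to echo.
two=$(echo a;echo b)

printf '%s\n' "$spaced" "$tight" "$two"
```

Both pipelines should give the same result, and the last line of the script prints the output of the two separate echo commands.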

Then there are additional rules about quoting: special characters lose their meaning if they're quoted, with different rules depending on the type of quote. For example, foo'&&'bar\|qux is a single token with the character sequence foo&&bar|qux.
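A quick way to check the quoting example yourself is to count the arguments a command receives. This is only a sketch; count_args is a hypothetical helper defined here, not a standard command.

```shell
# Hypothetical helper: print how many arguments it was called with.
count_args() { echo "$#"; }

# The quoted && and the backslash-escaped | are ordinary characters here,
# so the shell sees one word, not a word followed by operators.
n=$(count_args foo'&&'bar\|qux)

# The quotes and the backslash are removed before the command sees the word.
word=$(printf '%s' foo'&&'bar\|qux)

echo "$n argument(s): $word"
```

If the operators had kept their special meaning, the shell would have tried to run bar and qux as separate commands instead of passing one argument.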