shell-script shell interpreter – Parsing Scripts at Runtime: Ubiquitous to Shells or Other Interpreters?

I had always thought that shells parse whole scripts, constructing an AST, and then execute that AST from memory. However, I just read a comment by Stéphane Chazelas, and tested executing this script, edit-while-executing.sh:

#!/bin/bash

echo start
sleep 10

and then, while it was sleeping, running:

$ echo "echo end" >> edit-while-executing.sh

and it worked: the script printed "end" once the sleep finished.

However, when trying to modify this:

#!/bin/bash

while true; do
  echo yes
done

by doing:

$ printf "%s" "no " | dd of=edit-while-executing.sh conv=notrunc seek=35 bs=1

It didn't work, and kept printing "yes".

I also wondered if other non-shell interpreters worked like this, and tried the equivalent of the first script with Python, but it didn't work. Though maybe Python is not an interpreter anymore and is more of a JIT compiler.

So to reiterate my question: is this behaviour ubiquitous to shells and limited to them, or is it also present in other interpreters (those not regarded as shells)? Also, how does this work such that I could do the first modification but not the second?

Best Answer

So, this runs indefinitely in Bash/dash/ksh/zsh (or at least until your disk fills up):

#!/bin/sh
s=$0
foo() { echo "hello"; echo "foo" >> $s; sleep .1; }
foo

The thing to note is that only stuff added to the script file after the last line the shell has read matters. The shells don't go back to re-read the earlier parts, which they couldn't do anyway if the input was a pipe.
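
Here is a bounded sketch of both cases from the question, so it doesn't fill your disk. The file names come from mktemp and the byte offset 26 is specific to this little test file; neither is from the original post. It assumes sh is one of the shells listed above.

```shell
#!/bin/sh
# Sketch only: temp files from mktemp, not part of the original answer.

# 1) A script that appends one command to itself while it is running.
append_demo=$(mktemp)
cat > "$append_demo" <<'EOF'
echo start
echo 'echo end' >> "$0"
EOF
append_out=$(sh "$append_demo")

# 2) A script that overwrites "yes" with "no " inside a loop that the shell
#    has already parsed. Offset 26 is where "yes" starts in this file.
overwrite_demo=$(mktemp)
cat > "$overwrite_demo" <<'EOF'
for i in 1 2 3; do
  echo yes
  printf 'no ' | dd of="$0" bs=1 seek=26 conv=notrunc 2>/dev/null
done
EOF
overwrite_out=$(sh "$overwrite_demo")
line_on_disk=$(sed -n 2p "$overwrite_demo")

printf 'append demo printed:\n%s\n' "$append_out"
printf 'overwrite demo printed:\n%s\n' "$overwrite_out"
printf 'line 2 on disk is now: [%s]\n' "$line_on_disk"

rm -f "$append_demo" "$overwrite_demo"
```

The appended command should run because it lands after everything the shell has read so far. The loop should keep printing yes even though the file on disk now says echo no, because the whole while/for construct was parsed before its first iteration ran.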

A similar construct doesn't work in Perl, which reads the whole file in before running:

#!/usr/bin/perl -l    
open $fd, ">>", $0;
sub foo { print "hello"; print $fd 'foo;' }
foo;

We can see that it does so also when given input through a pipe. This gives a syntax error (and only that) after 1 second:

$ (echo 'printf "hello\n";' ; sleep 1 ; echo 'if' ) | perl

The same script piped to e.g. Bash prints hello, and then throws the syntax error one second later.
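
A shell-only version of that pipe test can be sketched like this. It assumes sh is a Bash- or dash-like shell; the variable names are just for illustration, and the lone fi is a deliberate syntax error.

```shell
#!/bin/sh
# Sketch: feed a shell two lines through a pipe, one second apart.
# The first line is a valid command, the second ("fi" alone) is a syntax error.
piped_out=$( (echo 'echo hello'; sleep 1; echo 'fi') | sh 2>/dev/null )
piped_status=$?

# The first command has already run by the time the bad line is parsed.
printf 'stdout: %s\n' "$piped_out"
printf 'exit status: %s\n' "$piped_status"
```

If the shell parsed all of its input before running anything, stdout would be empty here, which is what the Perl example above shows.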

Python appears similar to Perl with piped input, even though the interpreter runs a read-eval-print loop when interactive.

In addition to reading the input script line-by-line, at least Bash and dash process arguments to eval one line at a time:

$ cat evaltest.sh
var='echo hello
fi'
eval "$var"
$ bash evaltest.sh
hello
evaltest.sh: eval: line 4: syntax error near unexpected token `fi'
evaltest.sh: eval: line 4: `fi'

Zsh and ksh give the error immediately.

Similarly for sourced scripts, this time Zsh also runs line-by-line, as do Bash and dash:

$ cat sourceme.sh
echo hello
fi
$ zsh -c '. ./sourceme.sh'
hello
./sourceme.sh:2: parse error near `fi'

Linux – How Does It Deal with Shell Scripts?

Example

Say I have this shell script.

$ cat hello_ul.bash 
#!/bin/bash

echo "Hello Unix & Linux!"

Running it using strace:

$ strace -s 2000 -o strace.log ./hello_ul.bash
Hello Unix & Linux!
$

Taking a look inside the strace.log file reveals the following.

...
open("./hello_ul.bash", O_RDONLY)       = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff0b6e3330) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "#!/bin/bash\n\necho \"Hello Unix & Linux!\"\n", 80) = 40
lseek(3, 0, SEEK_SET)                   = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=4*1024}) = 0
fcntl(255, F_GETFD)                     = -1 EBADF (Bad file descriptor)
dup2(3, 255)                            = 255
close(3)     
...

Once the file's been read in, it's then executed:

...
read(255, "#!/bin/bash\n\necho \"Hello Unix & Linux!\"\n", 40) = 40
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc0b38ba000
write(1, "Hello Unix & Linux!\n", 20)   = 20
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "", 40)                       = 0
exit_group(0)                           = ?

In the above, the entire script appears to be read in as a single unit and executed thereafter. So, at least in Bash's case, it would "appear" that it reads the whole file in and then executes it. So you'd think you could edit the script while it's running?

NOTE: Don't, though! Read on to understand why you shouldn't mess with a running script file.

What about other interpreters?

But your question is slightly off. It's not Linux that's necessarily loading the contents of the file, it's the interpreter that's loading the contents, so it's really up to how the interpreter's implemented whether it loads the file entirely or in blocks or lines at a time.

So why can't we edit the file?

If you use a much larger script, however, you'll notice that the above test is a bit misleading. In fact, most interpreters load their files in blocks. This is pretty standard with many Unix tools: they load a block of a file, process it, and then load another block. You can see this behavior in a U&L Q&A that I wrote up a while ago regarding grep, titled: How much text does grep/egrep consume each time?

Example

Say we make the following shell script.

$ ( 
    echo '#!/bin/bash'; 
    for i in {1..100000}; do printf "%s\n" "echo \"$i\""; done 
  ) > ascript.bash;
$ chmod +x ascript.bash

Resulting in this file:

$ ll ascript.bash 
-rwxrwxr-x. 1 saml saml 1288907 Mar 23 18:59 ascript.bash

Which contains the following type of content:

$ head -3 ascript.bash ; echo "..."; tail -3 ascript.bash 
#!/bin/bash
echo "1"
echo "2"
...
echo "99998"
echo "99999"
echo "100000"

Now when you run this using the same technique above with strace:

$ strace -s 2000 -o strace_ascript.log ./ascript.bash
...    
read(255, "#!/bin/bash\necho \"1\"\necho \"2\"\necho \"3\"\necho \"4\"\necho \"5\"\necho \"6\"\necho \"7\"\necho \"8\"\necho \"9\"\necho \"10\"\necho 
...
...
\"181\"\necho \"182\"\necho \"183\"\necho \"184\"\necho \"185\"\necho \"186\"\necho \"187\"\necho \"188\"\necho \"189\"\necho \"190\"\necho \""..., 8192) = 8192

You'll notice that the file is being read in 8 KB increments, so Bash and other shells will likely not load a file in its entirety; rather, they read it in blocks.
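
To put a rough number on that, here is a sketch that rebuilds the same file and works out how many 8192-byte reads would cover it. It uses awk as a portable stand-in for the {1..100000} loop, and it assumes every read returns a full block, which a real shell doesn't guarantee.

```shell
#!/bin/sh
# Sketch: regenerate the 100000-line script and count the 8192-byte blocks.
# The temp file name is from mktemp, not from the original answer.
big_script=$(mktemp)
{
  echo '#!/bin/bash'
  awk 'BEGIN { for (i = 1; i <= 100000; i++) printf "echo \"%d\"\n", i }'
} > "$big_script"

size=$(( $(wc -c < "$big_script") ))      # file size in bytes
block=8192                                # read size seen in the strace output
reads=$(( (size + block - 1) / block ))   # ceiling division

echo "size=$size bytes, reads of $block bytes needed=$reads"
rm -f "$big_script"
```

The size matches the 1288907 bytes in the listing above, which works out to 157 full blocks plus one partial block.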

Bash – Is “Arithmetic Expansion” the expected action on vars inside `[[` tests

From man ksh:

An arithmetic expression uses the same syntax, precedence, and associativity of expression as the C language. All the C language operators that apply to floating point quantities can be used... Variables can be referenced by name within an arithmetic expression without using the parameter expansion syntax. When a variable is referenced, its value is evaluated as an arithmetic expression...

A conditional expression is used with the [[ compound command to test attributes of files and to compare strings. Field splitting and file name generation are not performed on the words between [[ and ]]. Each expression can be constructed from one or more of the following unary or binary expressions...

The following obsolete arithmetic comparisons are also permitted:

exp1 -eq exp2

True, if exp1 is equal to exp2.

exp1 -ne exp2

True, if exp1 is not equal to exp2.

exp1 -lt exp2

True, if exp1 is less than exp2.

exp1 -gt exp2

True, if exp1 is greater than exp2.

exp1 -le exp2

True, if exp1 is less than or equal to exp2.

exp1 -ge exp2

True, if exp1 is greater than or equal to exp2.

The documentation there is consistent where references to arithmetic expressions are concerned, and (apparently carefully) avoids any self-contradictions surrounding the definition of the [[ compound command ]] pertaining to string comparison by explicitly also permitting some obsolete arithmetic comparisons in the same context.

From man bash:

[[ expression ]]

Return a status of 0 or 1 depending on the evaluation of the conditional expression expression. Expressions are composed of the primaries described below... Word splitting and pathname expansion are not performed on the words between the [[ and ]]; ~tilde expansion, ${parameter} and $variable expansion, $((arithmetic expansion)), $(command substitution), <(process substitution), and "\'quote removal are performed. Conditional operators such as -f must be unquoted to be recognized as primaries...

A variable may be assigned to by a statement of the form:

name=[value]

If value is not given, the variable is assigned the null string. All values undergo ~tilde expansion, ${parameter} and $variable expansion, $(command substitution), $((arithmetic expansion)), and "\'quote removal... If the variable has its integer attribute set, then value is evaluated as an $((arithmetic expression)) even if the $((...)) expansion is not used...

The shell allows arithmetic expressions to be evaluated, under certain circumstances... Evaluation is done in fixed-width integers with no check for overflow, though division by 0 is trapped and flagged as an error. The operators and their precedence, associativity, and values are the same as in the C language.

Shell variables are allowed as operands; parameter expansion is performed before the expression is evaluated. Within an expression, shell variables may also be referenced by name without using the parameter expansion syntax. The value of a variable is evaluated as an arithmetic expression when it is referenced, or when a variable which has been given the integer attribute using declare -i is assigned a value... A shell variable need not have its integer attribute turned on to be used in an expression.

Conditional expressions are used by the [[ compound command and the test and [ builtin commands to test file attributes and perform string and arithmetic comparisons...

arg1 OP arg2

OP is one of -eq, -ne, -lt, -le, -gt, or -ge. These arithmetic binary operators return true if arg1 is equal to, not equal to, less than, less than or equal to, greater than, or greater than or equal to arg2, respectively. arg1 and arg2 may be positive or negative integers.
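
The integer-attribute and reference-by-name rules quoted above can be tried with a short sketch. It assumes bash is installed and runs the test inside bash -c explicitly; the variable names n and s are only for illustration.

```shell
#!/bin/sh
# Sketch: assumes bash is available; declare -i is not a plain sh feature.
result=$(bash -c '
declare -i n      # integer attribute: assigned values are evaluated arithmetically
n=5+5
s=5+5             # plain variable: the text is stored as-is
echo "n=$n s=$s arith=$((s))"
')
printf '%s\n' "$result"
```

Here n should hold 10 straight away, while s keeps the text 5+5 and only becomes 10 when it is referenced by name inside an arithmetic expression.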

I think, given all that context, the behavior you observe stands to reason, even if it is not explicitly spelled out as a possibility in the documentation there. The docs do point to special treatment of parameters with integer attributes, and clearly denote a difference between a compound command and a builtin command.

The [[ comparison is syntax in the same sense that the assignment name=value is syntax or case word in ... is syntax. test and [, however, are not; they are separate procedures which take arguments. I think the best way to really get a feel for the differences is to have a look at shell error output:

set   '[[ \\ -eq 0 ]]' '[ \\ -eq 0 ]'
for    sh in   bash ksh
do     for     exp
       do     "$sh" -c  "$1||$2"
               set "$2" "$1"
done;  done

bash: [[: \: syntax error: operand expected (error token is "\")
bash: line 0: [: \: integer expression expected
bash: line 0: [: \: integer expression expected
bash: [[: \: syntax error: operand expected (error token is "\")
ksh: \: arithmetic syntax error
ksh: [: \: arithmetic syntax error
ksh: \: arithmetic syntax error

The two shells handle the exceptions differently, but the underlying reasons for the differences in both cases for both shells are very similar.

bash directly calls the [[ \\ case a syntax error - in the same way it might for a redirect from a non-existent file, for example - though it goes on from that point (as I believe, incorrectly) to evaluate the other side of the || or expression. bash does give the [[ expression a command name in error output, but note that it doesn't bother reporting the line number on which you call it, as it does for the [ command. bash's [ complains about not receiving what it expects to be an integer expression as an argument, but [[ need not complain in that way because it doesn't really take arguments, and never needs to expect anything at all when it is parsed alongside the expansions themselves.

ksh halts altogether at the [[ syntax error and doesn't bother with [ at all. It writes the same error message for both, but note that [ is assigned a command name there where [[ is just ksh. The [ is only called after the command-line has been successfully parsed and expansions have already occurred - it will do its own little getopts routine and get its own arg[0c] and the rest, but [[ is handled as underlying shell syntax once again.

I consider the bash docs slightly less clear than the ksh version in that they use the terms arg[12] rather than expression regarding integer comparisons, but I think it is done merely because [[, [, and test are all lumped together at that juncture, and the latter two do take arguments whereas the former only ever receives an expression.

In any case, where the integer comparison is not ambiguous in the syntax context, you can basically do any valid math operation mid-expression:

   m=5+5  a[m]=10
[[     m   -eq 10 ]] &&
[[     m++ -eq 10 ]] &&
[[     m-- -gt 10 ]] &&
[[ ${a[m]}  == 10 ]] &&
echo "math evals"

math evals
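
For a smaller contrast between the two forms, here is a sketch that gives the same operand to [[ and to [. It assumes bash is installed, and the output labels are made up for the example.

```shell
#!/bin/sh
# Sketch: assumes bash is available; [[ is not provided by every sh.
result=$(bash -c '
m=5+5
# [[ evaluates the name m as an arithmetic expression.
if [[ m -eq 10 ]]; then echo "[[ : true"; else echo "[[ : false"; fi
# [ receives the string 5+5 as an argument and wants a plain integer.
if [ "$m" -eq 10 ] 2>/dev/null; then echo "[ : true"; else echo "[ : rejected"; fi
')
printf '%s\n' "$result"
```

The [ builtin complains with the integer expression expected message shown earlier (silenced here), which is why it takes the else branch.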
