The general story: separate namespaces
Generally, shells distinguish between variables and functions because they're used in different contexts. In a nutshell, a name is a variable name if it appears after a $, or as an argument to builtins such as export (without -f) and unset (without -f). A name is a function name if it appears as a command (after alias expansion) or as an argument to export -f, unset -f, etc.
Variables can be exported to the environment. The name of the environment variable is the same as the shell variable (and the values are the same too).
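As a minimal sketch (MESSAGE is just an illustrative name), exporting a variable makes the same name/value pair visible to child processes:

```shell
# A shell variable becomes an environment variable once exported;
# child processes then see the same name with the same value.
MESSAGE="hello from the parent"     # ordinary shell variable
export MESSAGE                      # copy it into the environment
sh -c 'printf "%s\n" "$MESSAGE"'    # a child shell prints: hello from the parent
```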
With older bash: confusion due to function export
Bash, unlike most other shells, can also export functions to the environment. Since there's no type indication in the environment, there's no way to recognize whether an entry in the environment is a function or not, other than by analyzing the name or the value of the environment variable.
Older versions of bash stored a function in the environment using the function's name as the name, and something that looks like the function definition as the function's value. For example:
bash-4.1$ foobar () { echo foobar; }
bash-4.1$ export -f foobar
bash-4.1$ env |grep -A1 foobar
foobar=() { echo foobar
}
bash-4.1$
Note that there's no way to distinguish a function whose code is { echo foobar; } from a variable whose value is () { echo foobar␤} (where ␤ is a newline character). This turned out to be a bad design decision.
Sometimes shell scripts get invoked with environment variables whose values are under the control of a potentially hostile entity; CGI scripts are a typical example. Bash's function export/import feature allowed injecting functions that way. For example, executing the script
#!/bin/bash
ls
from a remote request is safe as long as the environment doesn't contain variables with a certain name (such as PATH). But if the request can set the environment variable ls to () { cat /etc/passwd; }, then bash would happily execute cat /etc/passwd, since that's the body of the ls function.
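A quick way to see the behavior is to hand bash an environment variable whose value has the shape of a function definition. This sketch assumes a post-Shellshock bash on PATH; on a pre-Shellshock bash the same environment entry would have been imported as a function named ls:

```shell
# Export a variable whose value looks like an exported function body.
export ls='() { echo pwned; }'
# A patched bash imports it as a plain variable, not a function:
bash -c 'type -t ls'            # prints "file" (ls found in PATH), not "function"
bash -c 'printf "%s\n" "$ls"'   # the string survives only as ordinary data
```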
With newer bash: confusion mostly alleviated
This security vulnerability was discovered by Stéphane Chazelas as one of the aspects of the Shellshock bug. In post-Shellshock versions of bash, exported functions are identified by their name rather than by their content.
bash-4.3$ foobar () { echo foobar; }
bash-4.3$ export -f foobar
bash-4.3$ env |grep -A1 foobar
BASH_FUNC_foobar%%=() { echo foobar
}
There is no security issue now because names like BASH_FUNC_foobar%% are not commonly used as command names, and can be filtered out by interfaces that allow passing environment variables. It's technically possible to have a % character in the name of an environment variable (that's what makes modern bash's exported functions work), but normally people don't do this because shells don't accept % in the name of a variable.
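To see the encoding, and to filter on it, inspect the environment of a child bash after exporting a function. This is a sketch assuming a post-Shellshock bash; foobar is just an illustrative name:

```shell
# Count the BASH_FUNC_* entries that an exported function produces.
bash -c 'foobar() { echo foobar; }
         export -f foobar
         env' | grep -c '^BASH_FUNC_foobar'   # prints 1
# An interface that forwards environments could drop such entries with
# a filter along the lines of: env | grep -v '^BASH_FUNC_'
```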
The sentence in the bash manual refers to the old (pre-Shellshock) behavior. It should be updated or removed. With modern bash versions, there is no ambiguity in the environment if you assume that environment variables won't have a name ending in %%.
I do both, depending on the circumstance. In most cases I declare the functions in the same script that uses them; however, I also have a "toolkit" of scripts with a shared file of variables and functions used by all of them.
To Function or Not to Function
The main purpose of a function is DRY (Don't Repeat Yourself). If some code will be used more than once in your script, it should be in a function.
Using functions for non-repeating code is a matter of preference. Some people (myself included) consider it neater. Additionally, I think it more closely resembles the way "real" programming languages are used.
Google's Shell Style Guide states that if your code contains at least one function, you should use a "main" function as a wrapper for all your code. (Essentially, your script will only be a series of function declarations ending with a single call to main.)
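A minimal sketch of that layout (the function names usage, greet, and main are made up for illustration):

```shell
#!/bin/bash
# The script body is nothing but function declarations plus one call to main.

usage() {
    echo "usage: ${0##*/} [name]"
}

greet() {
    echo "hello, $1"
}

main() {
    greet "${1:-world}"    # default argument keeps the sketch runnable as-is
}

main "$@"
```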
Declaring functions (or writing code) inside of a single script
- Simple (anyone looking at the code can more easily track down what each function does)
- Increases portability (Only one file needs to be moved from system to system)
If you are writing a standalone script to perform a single task, this is probably the way to go.
Declaring functions inside of a separate file
- DRY (Don't repeat yourself)
- Can be easier to manage
If you are creating a set of tools that share a lot of common functions this is likely the way to go.
In my example I have functions that are used by all or at least most of the scripts in the toolkit. This includes some functions that query our inventory management system and return information about servers such as IP address, OS version, etc. This information is useful for many of the tools so it makes sense to declare such functions in one file instead of all of the files individually. Additionally we recently made changes to the inventory management system, so instead of having to change 10+ different files I only had to change the common file to query the new system, the other files still work properly without modification.
One disadvantage of this approach is added complexity. Each file contains the following statement to source the common file:
if [[ -f "${0%/*}/lib/common.sh" ]]; then
    . "${0%/*}/lib/common.sh"
else
    echo "Error! lib/common.sh not found!"
    exit 1
fi
If users grab the toolkit and modify the directory structure or don't grab the entire directory structure the tools will not work as intended because they will be unable to source the common file.
This is about where you put the function definition. If you declare the function before it's called, you can even call it through a variable. Try this:
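A minimal sketch of calling a function through a variable (greet and cmd are made-up names):

```shell
greet() { echo "hello $1"; }   # the definition must come first
cmd=greet                      # variable holding the function's name
"$cmd" world                   # expands to the command greet; prints: hello world
```

The expansion of $cmd in command position is looked up like any other command word, so it finds the already-defined function.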