Bash – Write bash function which operates on list of filenames


I want to define the function cpfromserver in bash so that when I run

$ cpfromserver xxx yyy zzz

the result is the same as if I had typed

$ scp user@remote.server:"/some/location/xxx/xxx.txt /some/location/xxx/xxx.pdf /some/location/yyy/yyy.txt /some/location/yyy/yyy.pdf /some/location/zzz/zzz.txt /some/location/zzz/zzz.pdf" /somewhere/else/

where it works for any number of arguments.

(That is, the function should copy filename.txt and filename.pdf from the directory /some/location/filename/ on the remote.server to the local directory /somewhere/else/ for every filename I specify as an argument to the function. And do it all in a single ssh connection.)

Currently, I have written a function that works for a single argument, and I just loop over it, but this establishes separate ssh connections for each argument, which is undesirable.

My difficulty is that I only know how to use function arguments individually by their position ($1, $2, etc.) — not how to manipulate the whole list.

[Note that I am writing this function as a convenience tool for my own use only, and so I would prioritize my own ease of understanding over handling pathological cases like filenames with quotation marks or linebreaks in them and whatnot. I know that the filenames I will be using this with are well-behaved.]

Best Answer

Try this way:

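cpfromserver () {
    local files='' x   # keep the helper variables out of the caller's shell
    # "$@" expands to all the arguments, each one as a separate word
    for x in "$@"
    do
        files="$files /some/location/$x/$x.txt /some/location/$x/$x.pdf"
    done
    # One remote argument holding every path; the remote shell splits it,
    # so all the files come over a single connection
    scp user@remote.server:"$files" /somewhere/else/
}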
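To see the list-building step in isolation, without contacting a server, here is a minimal dry-run sketch: instead of calling scp it prints the command it would run. `remote_paths` and `cpfromserver_dry` are hypothetical names introduced for illustration, and the host and directories are the placeholders from the question.

```bash
#!/bin/bash
# Hypothetical helper: for each NAME, emit the two remote paths
# /some/location/NAME/NAME.txt and /some/location/NAME/NAME.pdf.
# printf reapplies the '%s ' format to each of its arguments.
remote_paths() {
    local x
    for x in "$@"; do
        printf '%s ' "/some/location/$x/$x.txt" "/some/location/$x/$x.pdf"
    done
}

# Hypothetical dry run: print the scp command instead of executing it.
cpfromserver_dry() {
    local files
    files=$(remote_paths "$@")
    files=${files% }    # drop the single trailing space left by printf
    printf 'scp user@remote.server:"%s" /somewhere/else/\n' "$files"
}

cpfromserver_dry xxx yyy
```

The single-string trick relies on the remote shell splitting the path list into separate paths, which is how scp's legacy protocol behaves. OpenSSH 9.0 and later default to the SFTP protocol, where remote paths are not expanded by a remote shell, so on such systems the function may need `scp -O` (force the legacy protocol).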

Important caveat from comments: "It's worth noting for posterity that this solution definitely won't work for complicated filenames. If a filename contains a space, or a newline, or quotes, this approach will definitely fail."
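A quick local sketch of why the caveat holds: an unquoted expansion word-splits the built string roughly the way the remote shell does. The name "my notes" is a hypothetical example chosen only because it contains a space.

```bash
#!/bin/bash
# Build the path string the same way the answer does, for one
# hypothetical name that contains a space.
files=''
for x in "my notes"; do
    files="$files /some/location/$x/$x.txt"
done

# An unquoted expansion splits on spaces, roughly as the remote
# shell splits the scp path string.
set -- $files
echo "$# paths:"
printf '  %s\n' "$@"
```

One intended path comes out as three fragments (`/some/location/my`, `notes/my`, `notes.txt`), none of which exists on the server.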

Bash – Using a generated list of filenames as argument list — with spaces

Example

Say I have the following sample directory.

$ tree
.
|-- dir1
|   `-- a\ file1.txt
|-- dir2
|   `-- a\ file2.txt
|-- dir3
|   `-- a\ file3.txt
`-- myscript

3 directories, 4 files

Now let's say I have this for ./myscript.

#!/bin/bash

for i in "$@"; do
    echo "file: $i"
done

Now when I run the following command.

$ find . -type f -print0 | xargs -r0 ./myscript 
file: ./dir2/a file2.txt
file: ./dir3/a file3.txt
file: ./dir1/a file1.txt
file: ./myscript

Or when I use the 2nd form like so:

$ find . -type f -exec ./myscript {} +
file: ./dir2/a file2.txt
file: ./dir3/a file3.txt
file: ./dir1/a file1.txt
file: ./myscript

Details

find + xargs

The above 2 methods, though they look different, are essentially the same. The first takes the output from find, which the -print0 switch separates with NUL characters (\0) instead of newlines. xargs -0 is specifically designed to take input that's separated by NULs. That non-standard syntax was introduced by GNU find and xargs but is nowadays also found in a few other implementations, such as most recent BSDs. With GNU xargs, the -r option is needed to avoid calling myscript when find finds nothing; BSD xargs doesn't run the command on empty input in the first place.
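The -r point is easy to check with an empty scratch directory. A minimal sketch, where echo stands in for ./myscript and `mktemp -d` supplies a throwaway directory:

```bash
#!/bin/bash
# An empty scratch directory, so find matches nothing.
dir=$(mktemp -d)

# With -r, xargs does not run echo at all: no output.
find "$dir" -type f -print0 | xargs -r0 echo "ran anyway"

# Without -r, GNU xargs still runs echo once and prints "ran anyway";
# BSD xargs prints nothing here either.
find "$dir" -type f -print0 | xargs -0 echo "ran anyway"

rmdir "$dir"
```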

NOTE: This entire approach hinges on the assumption that the list of results never gets exceedingly long. If it is too long to fit on one command line, xargs kicks off a 2nd invocation of ./myscript with the remainder of the results from find.
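The batching can be seen without producing a huge list by forcing a tiny batch size. In this sketch `-n 2` is an artificial stand-in for hitting the real length limit, and echo stands in for ./myscript:

```bash
#!/bin/bash
# -n 2: at most two arguments per invocation, so xargs has to run
# echo several times, once per batch.
printf '%s\0' one two three four five | xargs -0 -n 2 echo
```

This prints `one two`, `three four` and `five` on three separate lines, one line per invocation, which is why a script fed by xargs shouldn't assume it sees the whole list at once.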

find with +

That's the standard way, though it was only added to the GNU implementation of find relatively recently (2005). The ability to do what we're doing with xargs is literally built into find: find builds the list of files and then passes as many of them as can fit as arguments to the command specified after -exec (note that in this form {} must come last, immediately before the +), running the command several times if needed.
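A minimal sketch contrasting `{} +` with the one-file-per-call `{} \;` form. The inline `sh -c` script is a stand-in for ./myscript that just reports how many arguments it received:

```bash
#!/bin/bash
dir=$(mktemp -d)
touch "$dir/a file1.txt" "$dir/a file2.txt" "$dir/a file3.txt"

# {} + : find packs all three names into a single invocation.
find "$dir" -type f -exec sh -c 'echo "one call, $# args"' sh {} +

# {} \; : find runs the command once per file.
find "$dir" -type f -exec sh -c 'echo "one call, $# args"' sh {} \;

rm -r "$dir"
```

The first find prints one line (`one call, 3 args`); the second prints three lines of `one call, 1 args`. The extra `sh` after the script becomes `$0`, so the filenames land in `$1`, `$2`, and so on.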

Why no quoting?

In the first example we take a shortcut that avoids the quoting issues entirely: NUL characters separate the arguments. A NUL can't appear in a filename, so when xargs -0 splits on them, each filename arrives intact as a single command argument.

In the second example the results never leave find, so it knows where each filename begins and ends and passes each one as a separate argument, thereby avoiding the whole business of quoting them.
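A small sketch of the difference, assuming the directory from `mktemp -d` itself has no spaces in its path: naive word splitting breaks a name apart, while both NUL-delimited xargs and `-exec ... {} +` keep it whole.

```bash
#!/bin/bash
# Scratch directory holding one file whose name contains a space.
dir=$(mktemp -d)
touch "$dir/a file1.txt"

# Naive: the unquoted $(find ...) is split on whitespace,
# so the single filename turns into two words.
set -- $(find "$dir" -type f)
echo "word splitting: $# words"

# NUL-delimited: xargs -0 passes the filename as one argument.
find "$dir" -type f -print0 | xargs -0 sh -c 'echo "xargs -0: $# argument(s)"' sh

# find -exec with +: find hands over the name itself, no splitting involved.
find "$dir" -type f -exec sh -c 'echo "find -exec: $# argument(s)"' sh {} +

rm -r "$dir"
```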

Maximum size of command line?

This question comes up from time to time, so as a bonus I'm adding it to this answer, mainly so I can find it in the future. You can use GNU xargs to see what your environment's limits are, like so:

$ xargs --show-limits
Your environment variables take up 4791 bytes
POSIX upper limit on argument length (this system): 2090313
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2085522
Size of command buffer we are actually using: 131072
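The POSIX getconf utility also reports the underlying system limit, ARG_MAX, on systems without GNU xargs. Note too that GNU xargs keeps reading standard input after printing its --show-limits report; its manual suggests piping the input from /dev/null if you only want the numbers (for example `xargs --show-limits </dev/null`). A minimal sketch of the getconf route:

```bash
#!/bin/bash
# ARG_MAX: maximum combined length, in bytes, of the argument list
# and environment accepted by exec() on this system.
getconf ARG_MAX
```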

Bash Function Decorator

That would be a lot easier with zsh, which has anonymous functions and a special associative array ($functions) holding function definitions. With bash, however, you could do something like:

decorate() {
  eval "
    _inner_$(typeset -f "$1")
    $1"'() {
      echo >&2 "Calling function '"$1"' with $# arguments"
      _inner_'"$1"' "$@"
      local ret=$?
      echo >&2 "Function '"$1"' returned with exit status $ret"
      return "$ret"
    }'
}

f() {
  echo test
  return 12
}
decorate f
f a b

Which would output:

Calling function f with 2 arguments
test
Function f returned with exit status 12

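You can't call decorate twice on the same function, though: the second call would copy the wrapper itself into _inner_f, so the wrapper would end up calling itself endlessly.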
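One way to turn that limitation into a clear error is a guard that refuses to decorate a function whose `_inner_` copy already exists. This is a sketch layered on the bash decorate function above; the guard is an illustrative addition, not part of the original answer.

```bash
#!/bin/bash
# decorate() as above, plus a guard: if _inner_NAME already exists,
# a second pass would copy the wrapper into _inner_NAME and make it
# call itself forever, so refuse instead.
decorate() {
  if declare -F "_inner_$1" >/dev/null; then
    echo >&2 "decorate: $1 is already decorated"
    return 1
  fi
  eval "
    _inner_$(typeset -f "$1")
    $1"'() {
      echo >&2 "Calling function '"$1"' with $# arguments"
      _inner_'"$1"' "$@"
      local ret=$?
      echo >&2 "Function '"$1"' returned with exit status $ret"
      return "$ret"
    }'
}

f() {
  echo test
  return 12
}
decorate f
decorate f      # refused: prints a message on stderr and returns 1
f a b
echo "f returned $?"
```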

With zsh:

decorate()
  functions[$1]='
    echo >&2 "Calling function '$1' with $# arguments"
    () { '$functions[$1]'; } "$@"
    local ret=$?
    echo >&2 "function '$1' returned with status $ret"
    return $ret'
