Gawk: Passing arrays to functions

Tags: array, awk, function

Stuck with GNU awk 3.1.6 and think I've worked around its array bugs but still have what looks like a scope problem in a 600-line awk program. Need to verify understanding of array scope in awk to find my bug.

Given this illustrative awk code…

function foo(ga) {
  ga[1] = "global result"
}

BEGIN {
  garray[1] = "global"
  foo(garray)
  print garray[1]
}

will print…

global result

Since arrays are always passed to functions by reference, then all arrays are always global. There is no way to create a local array. Is this correct? Have been unable to find docs that explicitly say that.

Since I'm debugging, and 3.1.6 itself has known bugs in this area, am trying to determine where awk's bugs leave off and mine begin.

Supplemental: Why does ga[] work inside the function then?

First of all, passing the array to the function with foo(garray) is actually unnecessary: because garray[] is global, the function can simply access it as garray[]. There is no measurable performance penalty in passing it, however, and doing so helps with debugging and error reporting.

When the function is called as foo(garray), ga[] inside the function is a synonym for the global array garray[]. It is not a local copy of garray[] but a reference to it, rather like a symbolic link to a file: the same file (or array) can be accessed under more than one name.
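To make the aliasing concrete, here is a minimal sketch contrasting a scalar argument (copied) with an array argument (aliased). The names alias_demo, change, scalar and arr are made up for illustration and do not come from the question.

```shell
# Hypothetical demo: scalars are passed by value, arrays by reference.
alias_demo() {
  awk '
    # s receives a copy of the scalar; a is another name for the array passed in
    function change(s, a) {
      s = "changed"      # modifies only the local copy
      a[1] = "changed"   # modifies the array owned by the caller
    }
    BEGIN {
      scalar = "original"
      arr[1] = "original"
      change(scalar, arr)
      print "scalar after: " scalar
      print "arr[1] after: " arr[1]
    }
  '
}
alias_demo
```

The scalar should still read "original" after the call, while arr[1] should read "changed".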

Supplemental: Clarification of Glenn Jackman's answer

While arrays created outside a function are global and may be passed to it or just referenced within it, an array that is named in the function's parameter list but not supplied in the call is local to the function and not visible outside it. Modifying Mr. Jackman's example illustrates this…

awk '
    function bar(x,y) {
      split("hello world", y)
      print "x[1] inside: " x[1]
      print "y[1] inside: " y[1]
    }
    BEGIN {
      x[1]="goodbye"
      print "x[1] before: " x[1]
      print "y[1] before: " y[1]
      bar(x)
      print "x[1] after: " x[1]
      print "y[1] after: " y[1]
    }
'
x[1] before: goodbye
y[1] before: 
x[1] inside: goodbye
y[1] inside: hello
x[1] after: goodbye
y[1] after: 

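Note that we are only passing the x[] array (actually, just a reference to it) to bar(). Because y is a parameter of bar() that the call leaves out, the y[] filled by split() is a local array that exists only inside the function; the y[1] printed in BEGIN is a separate, empty global.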

However, if we also pass y in the call, as bar(x,y), without assigning anything to it outside the function, the array filled inside the function is visible after the call:

awk '
    function bar(x,y) {
      split("hello world", y)
      print "x[1] inside: " x[1]
      print "y[1] inside: " y[1]
    }
    BEGIN {
      x[1]="goodbye"
      print "x[1] before: " x[1]
      print "y[1] before: " y[1]
      bar(x,y)
      print "x[1] after: " x[1]
      print "y[1] after: " y[1]
    }
'
x[1] before: goodbye
y[1] before: 
x[1] inside: goodbye
y[1] inside: hello
x[1] after: goodbye
y[1] after: hello

Finally, if we create the y[] array outside the function and pass it with bar(x,y), the split() assignment inside the function replaces that array's elements…

awk '
    function bar(x,y) {
      split("hello world", y)
      print "x[1] inside: " x[1]
      print "y[1] inside: " y[1]
    }
    BEGIN {
      x[1]="goodbye"
      y[1]="howdy"
      print "x[1] before: " x[1]
      print "y[1] before: " y[1]
      bar(x,y)
      print "x[1] after: " x[1]
      print "y[1] after: " y[1]
    }
'
x[1] before: goodbye
y[1] before: howdy
x[1] inside: goodbye
y[1] inside: hello
x[1] after: goodbye
y[1] after: hello

Best Answer

Function parameters are local to the function.

awk '
    function foo(x,y) {y=x*x; print "y in function: "y} 
    BEGIN {foo(2); print "y out of function: " y}
'
y in function: 4
y out of function: 

If you pass fewer values to a function than there are parameters, the extra parameters are just empty. You might sometimes see functions defined like

function foo(a, b, c,            d, e, f) {...

where the parameters after the whitespace are local variables and are not intended to take a value at invocation.

No reason why this can't work for local arrays:

awk '
    function bar(x) {
        split("hello world", x)
        print "in: " x[1]
    }
    BEGIN {
        x[1]="world"
        bar()
        print "out: " x[1]
    }
'
in: hello
out: world
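
The converse is worth checking when debugging scope problems: an array that a function fills without naming it in the parameter list is global. The following is only a sketch; scope_demo, fill, tmp and leaked are hypothetical names chosen for illustration.

```shell
# Hypothetical demo: extra parameters are local, undeclared names are global.
scope_demo() {
  awk '
    # tmp is an extra parameter, so it is local to fill();
    # leaked is not in the parameter list, so it is a global array
    function fill(str,    tmp) {
      split(str, tmp, " ")
      split(str, leaked, " ")
      return tmp[1]
    }
    BEGIN {
      print "returned: " fill("hello world")
      print "tmp[1] outside: [" tmp[1] "]"
      print "leaked[2] outside: [" leaked[2] "]"
    }
  '
}
scope_demo
```

If the local tmp[] really is discarded on return, the second line prints empty brackets, while leaked[2] still holds "world" after the call.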