How do I populate an array with the output of a function that sets a global variable?

Question

I have a shell script which calls a function that behaves differently based on the value of a global variable and whose output is a list of values that I want to store in an array.

I'm running into a problem because when I try to capture the output of the function using any variation of the obvious syntax:

mapfile -t the_array < <( the_function )

my global variable, which is set inside the_function, reverts to its previous value once the_function returns. I understand that this is a known "feature" of capturing the output of a function that has side effects, and I can work around it as shown below, but I'd like to know:

What's the rationale that went into bash to make this workaround necessary?
Is this really the best way to get around the problem?
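Here is a minimal illustration of that behavior. The names `counter` and `bump` are hypothetical. Both `$( ... )` and `< <( ... )` run the function in a child subshell, so only the plain call changes the parent's variable.

```bash
#!/usr/bin/env bash
# Hypothetical names (counter, bump): assignments made inside $( ... ) or
# <( ... ) happen in a child subshell, so the parent shell never sees them.

counter=0

bump() {
    counter=$((counter + 1))
    echo "inside bump: counter=$counter"
}

out=$(bump)                            # command substitution: subshell
echo "after \$(bump): counter=$counter"    # parent still has 0

mapfile -t lines < <(bump)             # process substitution: also a subshell
echo "after < <(bump): counter=$counter"   # parent still has 0

bump                                   # plain call: runs in the current shell
echo "after plain call: counter=$counter"  # parent now has 1
```

If the subshell assignments persisted, `counter` would end at 3. Instead it ends at 1.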

To simplify the problem, consider this case where I want the function to print 5 numbers the first time it's called and not print anything the next time it's called (this is the obvious syntax which doesn't produce the expected output):

$ cat tst1
#!/usr/bin/env bash

the_function() {
    printf '\nENTER: %s(), the_variable=%d\n' "${FUNCNAME[0]}" "$the_variable" >&2

    if (( the_variable == 0 )); then
        seq 5
        the_variable=1
    fi

    printf 'EXIT: %s(), the_variable=%d\n' "${FUNCNAME[0]}" "$the_variable" >&2
}

the_variable=0

mapfile -t arr < <( the_function )
declare -p arr

mapfile -t arr < <( the_function )
declare -p arr

$ ./tst1

ENTER: the_function(), the_variable=0
EXIT: the_function(), the_variable=1
declare -a arr=([0]="1" [1]="2" [2]="3" [3]="4" [4]="5")

ENTER: the_function(), the_variable=0
EXIT: the_function(), the_variable=1
declare -a arr=([0]="1" [1]="2" [2]="3" [3]="4" [4]="5")

That doesn't work, for the reasons stated above. I can work around it by writing the code as follows (this one does produce the expected output):

$ cat tst2
#!/usr/bin/env bash

the_function() {
    local arr_ref=$1

    printf '\nENTER: %s(), the_variable=%d\n' "${FUNCNAME[0]}" "$the_variable" >&2

    if (( the_variable == 0 )); then
        mapfile -t "$arr_ref" < <( seq 5 )
        the_variable=1
    else
        mapfile -t "$arr_ref" < /dev/null
    fi

    printf 'EXIT: %s(), the_variable=%d\n' "${FUNCNAME[0]}" "$the_variable" >&2
}

the_variable=0

the_function arr
declare -p arr

the_function arr
declare -p arr

$ ./tst2

ENTER: the_function(), the_variable=0
EXIT: the_function(), the_variable=1
declare -a arr=([0]="1" [1]="2" [2]="3" [3]="4" [4]="5")

ENTER: the_function(), the_variable=1
EXIT: the_function(), the_variable=1
declare -a arr=()

But while that works, it's obviously horrible code. It requires the lower-level primitive to be more complicated than necessary and tightly coupled to the data structure used to store its output. That makes it not reusable if, for example, we just want the 5 numbers to go to stdout.

So - why do I need to do that and is there a better way?

Charles Duffy · Accepted Answer · 2018-02-19 02:09:35Z


If you don't want parallelization (and thus a subshell whose scope gets lost), the alternative is buffering. Bash doesn't buffer for you, which makes it explicit and visible that storage is being used and where your data gets stored. So:

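tempfile=$(mktemp "${TMPDIR:-/tmp}/the_function_output.XXXXXX") || exit
the_function >"$tempfile"            # runs in the current shell, so its side effects persist
mapfile -t the_array < "$tempfile"   # -t strips the trailing newline from each element
rm -f -- "$tempfile"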
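Combining that pattern with the question's tst1 gives something like the following sketch. The ENTER/EXIT tracing is dropped for brevity, and the `trap` cleanup is an addition that is not in the answer. Because `the_function >"$tempfile"` only redirects stdout and does not start a subshell, the assignment to `the_variable` survives, so the second call yields an empty array.

```bash
#!/usr/bin/env bash
# Sketch: tst1 rewritten with temp-file buffering instead of < <( ... ).

the_function() {
    if (( the_variable == 0 )); then
        seq 5
        the_variable=1
    fi
}

the_variable=0
tempfile=$(mktemp "${TMPDIR:-/tmp}/the_function_output.XXXXXX") || exit
trap 'rm -f -- "$tempfile"' EXIT     # remove the temp file however the script exits

the_function >"$tempfile"            # no subshell: only stdout is redirected
mapfile -t arr <"$tempfile"
declare -p arr                       # arr holds the five numbers 1..5

the_function >"$tempfile"            # '>' truncates the file; nothing is printed this time
mapfile -t arr <"$tempfile"          # mapfile clears arr before reading, so it ends up empty
declare -p arr
```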

To automate this kind of pattern, I'd suggest something like:

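call_and_store_output() {
  local varname tempfile retval

  varname=$1 || return; shift
  tempfile=$(mktemp "${TMPDIR:-/tmp}/cso.XXXXXX") || return
  "$@" >"$tempfile"       # run the command in the current shell, buffering stdout in the file
  retval=$?               # remember the command's exit status
  printf -v "$varname" %s "$(<"$tempfile")"   # copy the buffered output into the caller's variable
  rm -f -- "$tempfile"
  return "$retval"
}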

...thereafter:

call_and_store_output function_output_var the_function
mapfile -t the_array <<<"$function_output_var"
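One caveat with the here-string step: `<<<` always supplies a trailing newline. If the function printed nothing, as on the question's second call, `mapfile` therefore produces a one-element array holding an empty string rather than an empty array. A possible variant runs `mapfile` on the temp file directly, which also skips the intermediate string variable. It is sketched here with a hypothetical helper named `call_and_mapfile` and assumes bash 4 or later for `mapfile`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): run a command in the current
# shell, buffer its stdout in a temp file, then load it into the named array.
# Usage: call_and_mapfile ARRAY_NAME COMMAND [ARGS...]
call_and_mapfile() {
    # Prefixed names lower the risk of shadowing the caller's ARRAY_NAME.
    local _cam_arr=$1 _cam_tmp _cam_rc
    shift
    _cam_tmp=$(mktemp "${TMPDIR:-/tmp}/cam.XXXXXX") || return
    "$@" >"$_cam_tmp"                     # current shell: side effects persist
    _cam_rc=$?
    mapfile -t "$_cam_arr" <"$_cam_tmp"   # an empty file gives an empty array
    rm -f -- "$_cam_tmp"
    return "$_cam_rc"
}

the_function() {
    if (( the_variable == 0 )); then
        seq 5
        the_variable=1
    fi
}

the_variable=0
call_and_mapfile arr the_function
declare -p arr
call_and_mapfile arr the_function
declare -p arr
```

The function stays a plain stdout producer, so it can still be called directly when the numbers should simply go to stdout.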

edited Feb 19, 2018 at 2:09

answered Feb 15, 2018 at 15:06

Charles Duffy


9 Comments

Ed Morton Over a year ago

Thanks Charles. Do you know the rationale for why x=$(y) causes y to run in a parallel subshell, while just calling y on its own (not capturing its output in a variable) apparently doesn't? Is there really no way to just run y and capture its output in a variable without a subshell? Some kind of source y or . y equivalent syntax to run y in the current shell?

Charles Duffy Over a year ago

@EdMorton, ...how would you expect x=$(y) to be implemented if it didn't fork? Right now, you just have a FIFO as the inside's stdout, and the outside process is just responsible for reading from that FIFO until it hits EOF. Otherwise, you'd need to maintain two different sets of flow control / state, and would need to switch over to something similar to the FIFO mechanism whenever an external command is run on the inside track. Keep in mind that POSIX sh was designed long, long before pthreads.
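The mechanism described in that comment can be observed directly. This sketch uses a hypothetical function named `probe`. Inside `$( ... )` the command's stdout is a pipe, read by the parent shell until EOF, and `BASH_SUBSHELL` is incremented there because the command runs in a separate child process.

```bash
#!/usr/bin/env bash
# Hypothetical probe: report whether this process's stdout is a pipe.
probe() {
    if [[ -p /dev/stdout ]]; then
        echo "pipe"
    else
        echo "not a pipe"
    fi
}

inside=$(probe)
echo "stdout inside \$(probe): $inside"                        # prints: stdout inside $(probe): pipe
echo "BASH_SUBSHELL inside \$(...): $(echo "$BASH_SUBSHELL")"   # prints: BASH_SUBSHELL inside $(...): 1
```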

Ed Morton Over a year ago

I'd have expected it to be implemented the same way as calling a function y on its own, or sourcing a script with . script, but I'm not that familiar with the guts of shell implementation. I really was just looking for a concise, consistent way to call a function and save its output in a variable (an array in this case), regardless of what the code inside that function does. So far neither of the options (use a temp file, or tightly couple the function to the caller) is great, so I was hoping for something else I could do.

Charles Duffy Over a year ago

"Implement a temp file" is easy to push into a function; edited to demonstrate.

Charles Duffy Over a year ago

@EdMorton, ...the thing you have with x=$(somescript) that you don't have with . script is the need to capture and store all stdout contents. Something needs to actually be doing a read() from that FIFO. To be doing that call, it needs to be running -- it has to have an active thread of execution. Hence the unavoidability of a separate process unless we shunt content elsewhere to defer the reads.
