I have a shell script which calls a function that behaves differently based on the value of a global variable and whose output is a list of values that I want to store in an array.

I'm running into a problem because when I try to capture the output of the function using any variation of the obvious syntax:

mapfile -t the_array < <( the_function )

my global variable that's being set inside the_function reverts to its previous value once the_function returns. I understand that this is a known "feature" of capturing the output of a function that has side effects, and I can work around it as shown below, but I'd like to know:

  1. What's the rationale that went into bash to make this workaround necessary?
  2. Is this really the best way to get around the problem?

To simplify the problem, consider this case where I want the function to print 5 numbers the first time it's called and not print anything the next time it's called (this is the obvious syntax which doesn't produce the expected output):

$ cat tst1
#!/usr/bin/env bash

the_function() {
    printf '\nENTER: %s(), the_variable=%d\n' "${FUNCNAME[0]}" "$the_variable" >&2

    if (( the_variable == 0 )); then
        seq 5
        the_variable=1
    fi

    printf 'EXIT: %s(), the_variable=%d\n' "${FUNCNAME[0]}" "$the_variable" >&2
}

the_variable=0

mapfile -t arr < <( the_function )
declare -p arr

mapfile -t arr < <( the_function )
declare -p arr

$ ./tst1

ENTER: the_function(), the_variable=0
EXIT: the_function(), the_variable=1
declare -a arr=([0]="1" [1]="2" [2]="3" [3]="4" [4]="5")

ENTER: the_function(), the_variable=0
EXIT: the_function(), the_variable=1
declare -a arr=([0]="1" [1]="2" [2]="3" [3]="4" [4]="5")

That doesn't work for the reasons stated above and I can work around it by writing the code as (this one does produce the expected output):

$ cat tst2
#!/usr/bin/env bash

the_function() {
    local arr_ref=$1

    printf '\nENTER: %s(), the_variable=%d\n' "${FUNCNAME[0]}" "$the_variable" >&2

    if (( the_variable == 0 )); then
        mapfile -t "$arr_ref" < <( seq 5 )
        the_variable=1
    else
        mapfile -t "$arr_ref" < /dev/null
    fi

    printf 'EXIT: %s(), the_variable=%d\n' "${FUNCNAME[0]}" "$the_variable" >&2
}

the_variable=0

the_function arr
declare -p arr

the_function arr
declare -p arr

$ ./tst2

ENTER: the_function(), the_variable=0
EXIT: the_function(), the_variable=1
declare -a arr=([0]="1" [1]="2" [2]="3" [3]="4" [4]="5")

ENTER: the_function(), the_variable=1
EXIT: the_function(), the_variable=1
declare -a arr=()

but while that works, it's obviously horrible code, since it requires the lower-level primitive to be more complicated than necessary and tightly coupled to the data structure being used to store its output (so it's not reusable if a case arises where we just want the 5 numbers to go to stdout, for example).

So - why do I need to do that and is there a better way?

1 Answer

If you don't want parallelization (and thus a subshell whose scope gets lost), the alternative is buffering. Bash not doing that for you makes it explicit and visible that storage is being used, and where your data gets stored. So:

tempfile=$(mktemp "${TMPDIR:-/tmp}/the_function_output.XXXXXX")
the_function >"$tempfile"
mapfile -t the_array < "$tempfile"
rm -f -- "$tempfile"

To automate this kind of pattern, I'd suggest something like:

call_and_store_output() {
  local varname tempfile retval

  varname=$1 || return; shift
  tempfile=$(mktemp "${TMPDIR:-/tmp}/cso.XXXXXX") || return
  "$@" >"$tempfile"   # runs in the current shell, so side effects persist
  retval=$?           # retval is already declared local above
  printf -v "$varname" %s "$(<"$tempfile")"
  rm -f -- "$tempfile"
  return "$retval"
}

...thereafter:

call_and_store_output function_output_var the_function
mapfile -t the_array <<<"$function_output_var"
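
As a rough end-to-end sketch of the pattern above, assuming bash 4 or later (for mapfile) and reusing the_function from the question with its diagnostics removed, the helper could be exercised as follows. The names first_arr and second_arr are only illustrative. One caveat worth hedging against: a here-string always supplies a trailing newline, so feeding an empty variable to mapfile yields one empty element rather than an empty array, which is why the sketch checks for empty output first.

```shell
#!/usr/bin/env bash

# Helper from the answer: run a command in the current shell, buffer its
# stdout in a temporary file, then store that output in the named variable.
call_and_store_output() {
  local varname tempfile retval

  varname=$1 || return; shift
  tempfile=$(mktemp "${TMPDIR:-/tmp}/cso.XXXXXX") || return
  "$@" >"$tempfile"
  retval=$?
  printf -v "$varname" %s "$(<"$tempfile")"
  rm -f -- "$tempfile"
  return "$retval"
}

# Function from the question, minus the ENTER/EXIT diagnostics.
the_function() {
  if (( the_variable == 0 )); then
    seq 5
    the_variable=1
  fi
}

the_variable=0

# First call: the function prints 5 numbers and flips the global.
call_and_store_output function_output_var the_function
mapfile -t first_arr <<<"$function_output_var"
declare -p first_arr the_variable

# Second call: the global kept its new value, so nothing is printed.
call_and_store_output function_output_var the_function
if [[ -n $function_output_var ]]; then
  mapfile -t second_arr <<<"$function_output_var"
else
  second_arr=()   # avoid the single empty element a here-string would give
fi
declare -p second_arr the_variable
```

Because the_function runs in the current shell here, the assignment to the_variable survives between the two calls, at the cost of a temporary file per call.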

9 Comments

Thanks Charles. Do you know the rationale for why x=$(y) causes y to run in a parallel subshell, while just calling y on its own without capturing its output in a variable apparently doesn't? Is there really no way to just run y and capture its output in a variable without a subshell? Some kind of source y or . y equivalent syntax to run y in the current shell?
@EdMorton, ...how would you expect x=$(y) to be implemented if it didn't fork? Right now, you just have a FIFO as the inside's stdout, and the outside process is just responsible for reading from that FIFO until it hits EOF. Otherwise, you'd need to maintain two different sets of flow control / state, and would need to switch over to something similar to the FIFO mechanism whenever an external command is run on the inside track. Keep in mind that POSIX sh was designed long, long before pthreads.
I'd have expected it to be implemented the same way that calling a function y alone is implemented, or the way invoking a shell script by . script is implemented, but I'm not that familiar with the guts of shell implementation. I really was just looking for a concise, consistent way to call a function and save its output in a variable (an array in this case) regardless of what the code inside that function does. So far neither of the options (implement a temp file or tightly couple the function to the caller) is great, so I was hoping for something else I could do.
"Implement a temp file" is easy to push into a function; edited to demonstrate.
@EdMorton, ...the thing you have with x=$(somescript) that you don't have with . script is the need to capture and store all stdout contents. Something needs to actually be doing a read() from that FIFO. To be doing that call, it needs to be running -- it has to have an active thread of execution. Hence the unavoidability of a separate process unless we shunt content elsewhere to defer the reads.
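
The difference discussed in these comments can be observed directly. The following is a small sketch, assuming bash (it relies on BASHPID); the function bump and the variable counter are hypothetical names used only for illustration. It compares the process ID seen inside a command substitution with the parent's, and checks whether an assignment made inside the function survives.

```shell
#!/usr/bin/env bash

counter=0

# Hypothetical function with a side effect: it increments a global
# and prints the ID of the process it is running in.
bump() {
  counter=$(( counter + 1 ))
  echo "$BASHPID"
}

parent_pid=$BASHPID

# Command substitution: bump runs in a forked subshell, so the
# increment happens in the child's copy of counter and is lost.
child_pid=$(bump)
after_substitution=$counter

# Plain call with output redirected: bump runs in the current shell,
# so the increment is visible afterwards.
bump >/dev/null
after_plain_call=$counter

declare -p after_substitution after_plain_call
if [[ $child_pid != "$parent_pid" ]]; then
  echo "command substitution ran in a different process"
fi
```

The redirected plain call keeps the side effect because nothing has to read the output concurrently; capturing the output is what introduces the second process.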