Query an Array column with an indices column

Question

So I have two columns

| col_arr | col_ind | 
|[1, 2, 3]| [0, 2]  |
|[5, 1]   | [1]     |

and I'd like my result to be an extraction of the values in col_arr by col_ind resulting in col_val below:

| col_arr | col_ind | col_val |
|[1, 2, 3]| [0, 2]  | [1, 3]  |
|[5, 1]   | [1]     | [1]     |

What would be the most elegant and efficient way to do this?

My first idea is to use a UDF, but it feels like an overkill

@udf
def sub_select(arr, inds):
    if (arr is not None) and (inds is not None):
        return [arr[ind] for ind in inds]

I was also thinking about the array_position function dynamically with expr. It is not clear to me how to do so in a way that is flexible to the length of col_ind.

F.expr("array_position(col_arr, array_position(col_ind, 0))")

Simplifications and complications:

Assume the length of indices to query is bounded by a small number (ex. the max len of a col_ind list is 5).
What if I have several col_arr columns (col_arr1, col_arr2, col_arr3), and only one col_ind?

Shubham Sharma · Accepted Answer · 2023-11-07 16:55:01Z

2

Apply a transformation function on col_ind and for each index return the corresponding value from col_arr

df = df.withColumn('col_val', F.expr("transform(col_ind, i -> col_arr[i])"))

# df.show()
# +---------+-------+-------+
# |  col_arr|col_ind|col_val|
# +---------+-------+-------+
# |[1, 2, 3]| [0, 2]| [1, 3]|
# |   [5, 1]|    [1]|    [1]|
# +---------+-------+-------+

Bonus: What if I have several col_arr columns (col_arr1, col_arr2, col_arr3), and only one col_ind? Same approach can be used for each column seperately

answered Nov 7, 2023 at 16:55

Shubham Sharma

71.8k6 gold badges26 silver badges58 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Query an Array column with an indices column

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related