This works provided no null values exist in the array passed to a PySpark UDF:

concat_udf = udf(
    lambda con_str, arr: [x + con_str for x in arr], ArrayType(StringType())
)

I don't see how to adapt this with a null / None check using an if. How can the following attempt, which does not work, be corrected?

concat_udf = udf(lambda con_str, arr: [  if x is None: 'XXX' else: x + con_str for x in arr  ], ArrayType(StringType()))

I can find no such example. Using if with transform was not successful either.

+----------+--------------+--------------------+
|      name|knownLanguages|          properties|
+----------+--------------+--------------------+
|     James| [Java, Scala]|[eye -> brown, ha...|
|   Michael|[Spark, Java,]|[eye ->, hair -> ...|
|    Robert|    [CSharp, ]|[eye -> , hair ->...|
|Washington|          null|                null|
| Jefferson|        [1, 2]|                  []|
+----------+--------------+--------------------+

should become

+----------+------------------------+--------------------+
|      name|          knownLanguages|          properties|
+----------+------------------------+--------------------+
|     James|     [JavaXXX, ScalaXXX]|[eye -> brown, ha...|
|   Michael|[SparkXXX, JavaXXX, XXX]|[eye ->, hair -> ...|
|    Robert|        [CSharpXXX, XXX]|[eye -> , hair ->...|
|Washington|                     XXX|                null|
| Jefferson|            [1XXX, 2XXX]|                  []|
+----------+------------------------+--------------------+
1 Answer

Using a conditional expression (Python's ternary operator), I would do something like this:

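from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

# A null array becomes ["XXX"]; a null element inside an array becomes "XXX".
concat_udf = udf(
    lambda con_str, arr: [x + con_str if x is not None else "XXX" for x in arr]
    if arr is not None
    else ["XXX"],
    ArrayType(StringType()),
)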

# OR 

concat_udf = udf(
    lambda con_str, arr: [
        x + con_str if x is not None else "XXX" for x in arr or [None]
    ],
    ArrayType(StringType()),
)
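
If the nested conditional expressions are hard to read inside a lambda, the same logic can be written as a named function and tested in plain Python before handing it to Spark. This is only a sketch: the name `append_suffix` is hypothetical, and it assumes a null array should become `["XXX"]`, as in the first variant above.

```python
def append_suffix(con_str, arr):
    """Append con_str to every element of arr, replacing nulls with "XXX"."""
    # A null array: return a single placeholder element.
    if arr is None:
        return ["XXX"]
    # Inside a comprehension only the expression form works:
    #   <value_if_true> if <condition> else <value_if_false>
    return [x + con_str if x is not None else "XXX" for x in arr]


# Assumed wrapping for Spark (not executed here, requires pyspark):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import ArrayType, StringType
#   concat_udf = udf(append_suffix, ArrayType(StringType()))

print(append_suffix("XXX", ["Spark", "Java", None]))
print(append_suffix("XXX", None))
```

The attempt in the question fails because `if x is None: ... else: ...` is statement syntax, which is not allowed inside a list comprehension.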

5 Comments

PythonException: An exception was thrown from a UDF: 'TypeError: 'NoneType' object is not iterable'
@thebluephantom Which one is supposed to be None: the array or the content of the array? This error means arr is None, but the test case in your question checks x.
Missing the else on the 2nd one. Very finicky, this UDF with PySpark.
Edited, that should clarify.
@thebluephantom Check the edit. It is always the same logic.
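
For anyone hitting the same error, it can be reproduced without Spark, which also shows one difference between the two variants in the answer. A rough sketch, with hypothetical function names: `arr or [None]` treats an empty array like a null one, because an empty list is falsy in Python, while an explicit `is None` check leaves empty arrays empty.

```python
def original(con_str, arr):
    # The version from the question: no null handling at all.
    return [x + con_str for x in arr]


def with_or_fallback(con_str, arr):
    # Second variant: a falsy arr (None or []) is replaced by [None].
    return [x + con_str if x is not None else "XXX" for x in arr or [None]]


def with_explicit_check(con_str, arr):
    # First variant: only a real None is replaced.
    if arr is None:
        return ["XXX"]
    return [x + con_str if x is not None else "XXX" for x in arr]


try:
    original("XXX", None)
    error_message = ""
except TypeError as exc:
    # Iterating over None is what raises the error seen in the UDF.
    error_message = str(exc)

print(error_message)
print(with_or_fallback("XXX", []))     # empty array gets a placeholder
print(with_explicit_check("XXX", []))  # empty array stays empty
```

Which behavior is right depends on whether an empty array should stay empty in the output.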
