The following works, provided the array passed to the PySpark UDF contains no null values:
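from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

# Appends con_str to every element of arr; raises if arr or an element is None.
concat_udf = udf(
    lambda con_str, arr: [x + con_str for x in arr], ArrayType(StringType())
)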
I cannot see how to adapt this to include a null / None check using an if. How can the following attempt, which does not work, be written correctly?
concat_udf = udf(lambda con_str, arr: [ if x is None: 'XXX' else: x + con_str for x in arr ], ArrayType(StringType()))
I can find no such example. Using an if inside transform did not work either.
+----------+--------------+--------------------+
|      name|knownLanguages|          properties|
+----------+--------------+--------------------+
|     James| [Java, Scala]|[eye -> brown, ha...|
|   Michael|[Spark, Java,]|[eye ->, hair -> ...|
|    Robert|    [CSharp, ]|[eye -> , hair ->...|
|Washington|          null|                null|
| Jefferson|        [1, 2]|                  []|
+----------+--------------+--------------------+
should become
+----------+-----------------------+--------------------+
|      name|         knownLanguages|          properties|
+----------+-----------------------+--------------------+
|     James|    [JavaXXX, ScalaXXX]|[eye -> brown, ha...|
|   Michael|[SparkXXX, JavaXXX,XXX]|[eye ->, hair -> ...|
|    Robert|       [CSharpXXX, XXX]|[eye -> , hair ->...|
|Washington|                    XXX|                null|
| Jefferson|           [1XXX, 2XXX]|                  []|
+----------+-----------------------+--------------------+
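
For reference, a minimal sketch of how the None check might be written. Inside a list comprehension the check has to be a conditional expression (`a if condition else b`) placed before the `for`, not an `if ...: ... else: ...` statement. The names `append_suffix`, `placeholder`, and `make_concat_udf` are hypothetical. Returning a one-element array for a null array is an assumption, made so the result still fits `ArrayType(StringType())`.

```python
def append_suffix(con_str, arr, placeholder="XXX"):
    """Append con_str to each element of arr, substituting placeholder for None.

    Assumption: a null array becomes [placeholder], because the UDF's declared
    return type is an array of strings and a bare string would not fit it.
    """
    if arr is None:
        return [placeholder]
    # Conditional expression: the value comes first, then `if ... else ...`.
    return [placeholder if x is None else x + con_str for x in arr]


def make_concat_udf():
    """Wrap append_suffix as a PySpark UDF (hypothetical helper; needs pyspark)."""
    # Imported here so the plain function above can be tried without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    return udf(append_suffix, ArrayType(StringType()))
```

Assuming a DataFrame `df` like the one above, this could be applied with `df.withColumn("knownLanguages", make_concat_udf()(lit("XXX"), col("knownLanguages")))`, where `lit` and `col` come from `pyspark.sql.functions`.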