What's the best (fastest) way to apply a UDF only when a value is neither null nor an empty string?
I've added a simple example.

    from pyspark.sql import functions as F
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # `spark` is an existing SparkSession
    df = spark.createDataFrame(
        [["John Jones"], ["Tracey Smith"], [None], ["Amy Sanders"], [""]]
    ).toDF("Name")

    def upperCase(value):
        return value.upper()

    upperCaseUDF = udf(lambda z: upperCase(z), StringType())

    df.withColumn(
        "Cureated Name",
        F.when(
            ((F.col("Name").isNotNull()) | (F.trim(F.col("name")) != "")),
            upperCaseUDF(F.col("Name")),
        ),
    )

This raises: AttributeError: 'NoneType' object has no attribute 'upper'.
I don't think the when clause works properly (or at least not as I would expect).
I get an error for the Null value.
I expect the UDF not to be executed on a Null value.
My question is not how to handle the Null value, but why the when clause doesn't work as I would expect.
The error comes from `upperCaseUDF = udf(lambda z: upperCase(z), StringType())`, since the column contains a value of `None`. `NoneType` has no attribute `upper()`. You can fix this easily by updating the function `upperCase` to detect a `None` value and return something, else return `value.upper()`.
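
A minimal sketch of the fix described above, under a few assumptions. The names `upper_case` and `add_curated_name` are hypothetical, the input DataFrame is assumed to have a string column `Name`, and the condition uses `&` on the assumption that "neither null nor empty" is what was intended. As far as I know, Spark does not guarantee that a `when` condition is evaluated before a Python UDF is invoked on a row, so the `None` check inside the function is the part that prevents the error; the `when` is kept only to leave empty names as null.

```python
from typing import Optional


def upper_case(value: Optional[str]) -> Optional[str]:
    """Null-safe variant of upperCase: pass None through, upper-case the rest."""
    if value is None:
        return None
    return value.upper()


def add_curated_name(df):
    """Return df with the extra column; df is assumed to have a string column "Name"."""
    # Imported here so upper_case can be used and tested without a Spark install.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    upper_case_udf = F.udf(upper_case, StringType())
    name = F.col("Name")
    # Rows that fail the condition get null, since there is no otherwise().
    return df.withColumn(
        "Cureated Name",
        F.when(name.isNotNull() & (F.trim(name) != ""), upper_case_udf(name)),
    )
```

With the question's DataFrame, `add_curated_name(df).show()` should then run without the `AttributeError`, because the UDF itself tolerates `None` no matter which rows Spark passes to it.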