I'm trying to create a new column in a PySpark DataFrame whose value depends on the contents of another column. The other column contains only integers, and I want the new column to hold either 1 or 0.

import pyspark.sql.functions as F
df2 = df2.withColumn('Industrial', F.when(F.col('CODE') in (1,2,3,4), 1).otherwise(0))

This doesn't work, since it wants just Boolean logic. Is there a workaround for this?

EDIT: This question could still be useful to others, since it creates a new column and does a little more than just an isin() check.

1 Answer

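Use the Column.isin method. Python's in operator always reduces to a plain True/False, so F.col('CODE') in (...) never produces the Boolean column expression that F.when expects, whereas isin does. Passing the values as a list (or as separate arguments) is the documented form.

df2 = df2.withColumn('Industrial', F.when(F.col('CODE').isin([1, 2, 3, 4]), 1).otherwise(0))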
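To see why the in operator fails while isin works, here is a minimal pure-Python sketch. FakeColumn is a hypothetical stand-in written for this illustration, not PySpark's actual Column class. It only mimics two behaviours that a Spark column is assumed to have: == builds an expression instead of returning True/False, and converting an expression to bool raises an error.

```python
# FakeColumn is a hypothetical stand-in for pyspark.sql.Column, used only to
# illustrate the operator behaviour; it is not part of PySpark.
class FakeColumn:
    def __init__(self, expr):
        self.expr = expr

    def __eq__(self, other):
        # Comparison builds a new expression instead of returning True/False.
        return FakeColumn(f"({self.expr} = {other!r})")

    def __bool__(self):
        # An unevaluated expression has no single truth value.
        raise ValueError("Cannot convert column into bool")

    def isin(self, *values):
        # Accept isin(1, 2, 3) or isin([1, 2, 3]).
        if len(values) == 1 and isinstance(values[0], (list, set)):
            values = tuple(values[0])
        return FakeColumn(f"({self.expr} IN {tuple(values)!r})")


code = FakeColumn("CODE")

try:
    # `in` compares each tuple element with == and then calls bool() on the result.
    code in (1, 2, 3, 4)
    in_operator_error = None
except ValueError as exc:
    in_operator_error = str(exc)

# isin builds a membership expression directly; bool() is never involved.
membership = code.isin([1, 2, 3, 4])
print(in_operator_error)
print(membership.expr)
```

The same reasoning is why conditions on columns are combined with &, | and ~ rather than and, or and not: the keyword forms also force a conversion to bool.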