Python / Spark cast multiple variables - columns as double type

Question

I am translating a Scala / Spark deep learning model into Python / PySpark. After reading the DataFrame, all variables are interpreted as string type. I need to cast them to float. Doing this one column at a time is easy; I think it would look like this:

format_number(result['V1'].cast('float'),2).alias('V1')

But there are 31 columns. How can I do this for all of them at once? The columns are "V1" to "V28", plus "Time", "Amount", and "Class".

The Scala solution is:

// Cast all the columns to Double type.
val df = raw.select(((1 to 28).map(i => "V" + i) ++ Array("Time", "Amount", "Class")).map(s => col(s).cast("Double")): _*)

https://github.com/intel-analytics/analytics-zoo/blob/master/apps/fraudDetection/Fraud%20Detction.ipynb

How can I do the same in PySpark?

Alper t. Turker · Accepted Answer · 2017-12-09 03:29:22Z


Use a list comprehension over the DataFrame's columns:

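from pyspark.sql.functions import format_number

# Cast every column to float, then format it to 2 decimal places.
result.select([
    format_number(result[c].cast('float'), 2).alias(c) for c in result.columns
])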
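
Note that format_number returns a string column, so the snippet above yields formatted strings rather than numeric columns. If the goal is numeric double columns, as in the Scala version from the question, a minimal sketch might drop format_number and cast directly. The helper name cast_to_double is hypothetical, and result is assumed to be the DataFrame from the question:

```python
# Column names taken from the question: V1..V28 plus three named columns.
columns = ["V{}".format(i) for i in range(1, 29)] + ["Time", "Amount", "Class"]


def cast_to_double(df, names=columns):
    """Return df restricted to the given columns, each cast to double.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    # Imported here so the name list above can be built without Spark installed.
    from pyspark.sql.functions import col

    # alias(name) keeps the original column name after the cast.
    return df.select([col(name).cast("double").alias(name) for name in names])
```

Calling cast_to_double(result) should then give a DataFrame with those 31 columns typed as double. Passing result.columns as names would cast every column instead of the listed ones.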
