10

I have an input dataframe (ip_df); the data in this dataframe looks like this:

id            col_value
1               10
2               11
3               12

The data type of both id and col_value is String.

I need to get another dataframe (output_df), with the id column as string and the col_value column as decimal(15,4). There is no data transformation, just a data type conversion. Can I do this using PySpark? Any help will be appreciated.

3 Answers

12

Try using the cast method:

from pyspark.sql.types import DecimalType

output_df = ip_df.withColumn("col_value", ip_df["col_value"].cast(DecimalType(15, 4)))
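As an aside, decimal(15,4) means at most 15 digits in total, 4 of them after the decimal point. Python's own decimal module can sketch the quantization such a cast performs (the helper name below is made up for illustration; Spark's actual cast rounds half-up when trimming excess fractional digits):

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical helper illustrating decimal(15,4) semantics:
# up to 15 total digits, 4 of them after the decimal point.
def to_decimal_15_4(s):
    # Quantize to 4 fractional digits, rounding half-up,
    # mirroring how a cast to decimal(15,4) behaves.
    return Decimal(s).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

print(to_decimal_15_4("10"))        # 10.0000
print(to_decimal_15_4("11.56789"))  # 11.5679
```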

3 Comments

It is giving the error: name 'DecimalType' is not defined
You need to import it first: from pyspark.sql.types import DecimalType
5

Try the statement below; you can pass the SQL type name directly to cast as a string, with no import needed:

output_df = ip_df.withColumn("col_value", ip_df["col_value"].cast('decimal(15,4)'))
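A side note on why a fixed-point decimal target (rather than float) matters for this question: binary floats cannot represent most decimal fractions exactly, while Decimal values stay exact in base 10. A quick illustration in plain Python:

```python
from decimal import Decimal

# Binary floats accumulate representation error...
print(0.1 + 0.2)                        # 0.30000000000000004
# ...while decimals stay exact in base 10.
print(Decimal("0.1") + Decimal("0.2"))  # 0.3
```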

Comments

1

You can change multiple column types:

  • Using withColumn()
from pyspark.sql.types import DecimalType, StringType

output_df = ip_df \
  .withColumn("col_value", ip_df["col_value"].cast(DecimalType(15, 4))) \
  .withColumn("id", ip_df["id"].cast(StringType()))
  • Using select()
from pyspark.sql.types import DecimalType, StringType

output_df = ip_df.select(
  ip_df.id.cast(StringType()).alias('id'),
  ip_df.col_value.cast(DecimalType(15, 4)).alias('col_value')
)
  • Using spark.sql()
ip_df.createOrReplaceTempView("ip_df_view")

output_df = spark.sql('''
SELECT
    CAST(id AS STRING) AS id,
    CAST(col_value AS DECIMAL(15,4)) AS col_value
FROM ip_df_view
''')

Comments
