
I have a table that contains a column of numbers like (959, 1189, ...). When I check the column type it is string, so I cast the column to integer. The problem is that once the column becomes integer type, it shows null values that didn't exist before in place of the original values (every number > 999, for example 1232). This is how I'm changing the data type; any help?

```
from pyspark.sql.types import IntegerType

dfnumber2 = dfnumber.withColumn(
    "Offres d'emploi",
    dfnumber["Offres d'emploi"].cast(IntegerType())
)

dfnumber2.printSchema()
```
  • There could be some values that are comma separated (e.g., 300 and 3,000). Instead of overwriting the column, create a new column and filter a few records where the new column is null, then check what the actual values were in the input DataFrame (see the sketch after this comment). You could also try the bigint or double data types. If the column does contain commas, remove them before casting. Commented Aug 17, 2022 at 13:39
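A minimal sketch of that check, assuming the column name from the question and a hypothetical helper column "Offres d'emploi_int": cast into a new column, then look at the original strings wherever the cast came back null.

```
from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType

# Cast into a new column instead of overwriting, so the original strings stay visible.
dfnumber_check = dfnumber.withColumn(
    "Offres d'emploi_int",
    col("Offres d'emploi").cast(IntegerType())
)

# Show the original values for the rows where the cast produced null.
dfnumber_check \
    .filter(col("Offres d'emploi_int").isNull()) \
    .select("Offres d'emploi") \
    .show(10, truncate=False)
```

If those rows turn out to contain characters such as commas or spaces, the cast itself is fine and the strings just need cleaning first.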

1 Answer


The values may be too big for the int type, in which case the cast comes back as null; perhaps try casting to the double type instead:

```
from pyspark.sql.types import DoubleType

dfnumber2 = dfnumber.withColumn(
    "Offres d'emploi",
    dfnumber["Offres d'emploi"].cast(DoubleType())
)

dfnumber2.printSchema()
```
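If the nulls instead come from thousands separators rather than from values overflowing the int range, as the comment above suspects, switching the type alone won't help; here is a sketch (assuming the separator is a comma) that strips them before casting:

```
from pyspark.sql.functions import regexp_replace, col
from pyspark.sql.types import IntegerType

# Remove commas (thousands separators) so "1,232" becomes "1232", then cast.
dfnumber2 = dfnumber.withColumn(
    "Offres d'emploi",
    regexp_replace(col("Offres d'emploi"), ",", "").cast(IntegerType())
)

dfnumber2.printSchema()
```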