
I have a table that contains a column of numbers like (959, 1189, ...). When I check the column type it is string, so I cast the column to integer. The problem is that once the column becomes integer type, it shows null values that didn't exist before in place of the original values (every number > 999, for example 1232). This is how I'm changing the data type; any help?

```
from pyspark.sql.types import IntegerType

dfnumber2 = dfnumber.withColumn(
    "Offres d'emploi",
    dfnumber["Offres d'emploi"].cast(IntegerType())
)

dfnumber2.printSchema()
```
  • There could be some values that are comma separated (e.g., 300 and 3,000). Instead of overwriting the column, create a new column and filter a few records where the new column is null, then check what the actual values were in the input DataFrame (see the sketch after this comment). You could also try the bigint or double data types. If the column does contain commas, remove them before casting. Commented Aug 17, 2022 at 13:39
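A minimal sketch of that check, assuming the column name from the question and a hypothetical helper column "Offres d'emploi_int": cast into a new column, then look at the original strings wherever the cast came back null.

```
from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType

# Cast into a new column instead of overwriting, so the original strings stay visible.
dfnumber_check = dfnumber.withColumn(
    "Offres d'emploi_int",
    col("Offres d'emploi").cast(IntegerType())
)

# Show the original values for the rows where the cast produced null.
dfnumber_check \
    .filter(col("Offres d'emploi_int").isNull()) \
    .select("Offres d'emploi") \
    .show(10, truncate=False)
```

If those rows turn out to contain characters such as commas or spaces, the cast itself is fine and the strings just need cleaning first.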

1 Answer


The values may be too big for the int type, in which case the cast comes back as null; perhaps try casting to the double type instead:

```
from pyspark.sql.types import DoubleType

dfnumber2 = dfnumber.withColumn(
    "Offres d'emploi",
    dfnumber["Offres d'emploi"].cast(DoubleType())
)

dfnumber2.printSchema()
```
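If the nulls instead come from thousands separators rather than from values overflowing the int range, as the comment above suspects, switching the type alone won't help; here is a sketch (assuming the separator is a comma) that strips them before casting:

```
from pyspark.sql.functions import regexp_replace, col
from pyspark.sql.types import IntegerType

# Remove commas (thousands separators) so "1,232" becomes "1232", then cast.
dfnumber2 = dfnumber.withColumn(
    "Offres d'emploi",
    regexp_replace(col("Offres d'emploi"), ",", "").cast(IntegerType())
)

dfnumber2.printSchema()
```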