
I have a column of String datatype in a Spark DataFrame (with the date in yyyy-MM-dd pattern), and I want to display the column value in MM/dd/yyyy pattern.

My data is:

val df = sc.parallelize(Array(
  ("steak", "1990-01-01", "2000-01-01", 150),
  ("steak", "2000-01-02", "2001-01-13", 180),
  ("fish",  "1990-01-01", "2001-01-01", 100)
)).toDF("name", "startDate", "endDate", "price")

df.show()

+-----+----------+----------+-----+
| name| startDate|   endDate|price|
+-----+----------+----------+-----+
|steak|1990-01-01|2000-01-01|  150|
|steak|2000-01-02|2001-01-13|  180|
| fish|1990-01-01|2001-01-01|  100|
+-----+----------+----------+-----+

root
 |-- name: string (nullable = true)
 |-- startDate: string (nullable = true)
 |-- endDate: string (nullable = true)
 |-- price: integer (nullable = false)

I want to show endDate in MM/dd/yyyy pattern. All I am able to do is convert the column from String to DateType:

import org.apache.spark.sql.types.DateType

val df2 = df.select($"endDate".cast(DateType).alias("endDate"))

df2.show()

+----------+
|   endDate|
+----------+
|2000-01-01|
|2001-01-13|
|2001-01-01|
+----------+

df2.printSchema()

root
 |-- endDate: date (nullable = true)

I want to show endDate in MM/dd/yyyy pattern. The only reference I found is this, which doesn't solve the problem.

3 Answers


You can use the date_format function:

  import sqlContext.implicits._
  import org.apache.spark.sql.functions._

  val df = sc.parallelize(Array(
    ("steak", "1990-01-01", "2000-01-01", 150),
    ("steak", "2000-01-02", "2001-01-13", 180),
    ("fish", "1990-01-01", "2001-01-01", 100))).toDF("name", "startDate", "endDate", "price")

  df.show()

  df.select(date_format(col("endDate"), "MM/dd/yyyy")).show

Output :

+-------------------------------+
|date_format(endDate,MM/dd/yyyy)|
+-------------------------------+
|                     01/01/2000|
|                     01/13/2001|
|                     01/01/2001|
+-------------------------------+
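If you want a readable header instead of the auto-generated `date_format(endDate,MM/dd/yyyy)` column name above, the formatted column can be aliased (a minimal sketch, assuming the same `df` and imports as the snippet above):

```scala
// Alias the formatted column so the header reads "endDate"
// rather than the generated expression string.
df.select(date_format(col("endDate"), "MM/dd/yyyy").alias("endDate")).show()
```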

Use the date_format(date, format) function from org.apache.spark.sql.functions (note that in the Scala API the first argument is a Column, not a String):

val df2 = df.select(date_format(col("endDate"), "MM/dd/yyyy").alias("endDate"))


This applies when a DataFrame/Dataset has a string column with a date value in it and we need to change the date format.

For the question asked, the date format can be changed as below:

val df1 = df.withColumn("startDate1", date_format(to_date(col("startDate"), "yyyy-MM-dd"), "MM/dd/yyyy"))

In Spark, the default date format is "yyyy-MM-dd", so this can be rewritten as:

val df1 = df.withColumn("startDate1", date_format(col("startDate"), "MM/dd/yyyy"))

(i) By applying to_date, we change the datatype of this column from string to Date. We also tell to_date that the format in this string column is yyyy-MM-dd, so the column is parsed accordingly. (ii) Next, we apply date_format to produce the format we require, which is MM/dd/yyyy.

When a time component is involved, use to_timestamp instead of to_date. Note that 'MM' represents the month and 'mm' represents minutes.
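The MM/mm distinction can be checked outside Spark with plain java.time, whose pattern letters date_format follows for these fields (a small self-contained sketch; the sample timestamp is made up):

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical timestamp: 13 Jan 2001, 10:05:00
val ts = LocalDateTime.of(2001, 1, 13, 10, 5, 0)

// 'MM' is the month, 'mm' is the minute. Mixing them up silently
// produces wrong output rather than an error.
val monthPattern  = DateTimeFormatter.ofPattern("MM/dd/yyyy")
val minutePattern = DateTimeFormatter.ofPattern("mm/dd/yyyy")

println(ts.format(monthPattern))   // 01/13/2001
println(ts.format(minutePattern))  // 05/13/2001 -- minutes, not month!
```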
