Scala sort and save csv - creating multiple csv files [duplicate]

Task - read a CSV file, add 2 lower-case columns, sort, and save the file. Problem - when sorting is applied, it creates multiple files. Can someone please explain what is happening here?

import org.apache.spark.sql.functions.{col, lower}

var df = spark.read
  .format("csv")
  .option("header", "true")
  .load(i_file)
  .select("Id", "Name", "Address")

df = df.withColumn("x_name", lower(col("Name")))
df = df.withColumn("x_address", lower(col("Address")))
df = df.orderBy("x_name") // <-- this line
df.write.option("header", "true").csv(o_file)

If I remove the orderBy, only 1 file is created.
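A likely explanation, hedged because the question does not state the Spark version: the steps before orderBy (read, select, withColumn) are narrow transformations, so the result keeps the partitioning of the file scan, and a small CSV is typically read as a single partition, giving one part file. orderBy needs a global sort, which Spark plans as a shuffle with range partitioning; the number of ranges is capped by spark.sql.shuffle.partitions (200 by default), and each resulting partition is written as its own part-*.csv file. The sketch below compares partition counts before and after the sort. The object name, the temporary sample file, and disabling spark.sql.adaptive.enabled (so Spark 3.x does not merge the shuffle output) are illustrative assumptions.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

object PartitionCountDemo {
  // Returns (partitions before orderBy, partitions after orderBy) for the question's pipeline.
  def partitionCounts(spark: SparkSession, inFile: String): (Int, Int) = {
    val df = spark.read
      .option("header", "true")
      .csv(inFile)
      .select("Id", "Name", "Address")
      .withColumn("x_name", lower(col("Name")))
      .withColumn("x_address", lower(col("Address")))

    // Narrow transformations: the partitioning of the file scan is kept.
    val before = df.rdd.getNumPartitions
    // orderBy adds a range-partitioned shuffle with at most spark.sql.shuffle.partitions ranges.
    val after = df.orderBy("x_name").rdd.getNumPartitions
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partition-count-demo")
      // Assumption: turn off adaptive execution so Spark 3.x does not merge shuffle partitions.
      .config("spark.sql.adaptive.enabled", "false")
      .getOrCreate()

    // Hypothetical sample input standing in for i_file.
    val csv = Files.createTempFile("people", ".csv")
    val rows = "Id,Name,Address\n1,Bob,Main St\n2,alice,Oak Ave\n3,Carol,Elm Rd\n4,dave,Pine Ln\n"
    Files.write(csv, rows.getBytes(StandardCharsets.UTF_8))

    val (before, after) = partitionCounts(spark, csv.toString)
    println(s"partitions before orderBy: $before, after orderBy: $after")
    println(s"spark.sql.shuffle.partitions = ${spark.conf.get("spark.sql.shuffle.partitions")}")
    spark.stop()
  }
}
```

The exact counts depend on the data and the Spark version, so no expected output is shown; on larger inputs with many distinct keys, the post-sort count can approach the configured shuffle partition value.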

asked Sep 13, 2018 at 13:56 by Eyedia Tech; edited Sep 13, 2018 at 13:57 by Xavier Guihot

Hmm, maybe it does not matter; let Spark store these as partitioned files. That is my understanding!
– Eyedia Tech, Sep 13, 2018 at 13:59
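Letting Spark keep the partitioned output generally works because Spark treats the output directory as one dataset: passing the directory path to spark.read.csv reads all of its part files back into a single DataFrame (files starting with "_" or ".", such as _SUCCESS, are skipped). A minimal sketch follows; the readBack name and the command-line argument are hypothetical, and it re-sorts after reading because Spark does not promise to read part files back in name order.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadPartitionedCsv {
  // Reads every part file under outDir back into one DataFrame.
  def readBack(spark: SparkSession, outDir: String): DataFrame =
    spark.read
      .option("header", "true")
      .csv(outDir)       // a directory path: all part-*.csv files are read together
      .orderBy("x_name") // re-sort, since the read order of part files is not guaranteed

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("read-partitioned-csv")
      .getOrCreate()
    // Hypothetical argument: args(0) = the directory written as o_file.
    readBack(spark, args(0)).show()
    spark.stop()
  }
}
```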
Thanks @Dima, that answers my question. Sorry for the duplicate; not sure why I could not find that one!
– Eyedia Tech, Sep 13, 2018 at 15:53
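If a single sorted file is needed rather than partitioned output, one common approach, sketched here under the assumption that the data fits comfortably in one task, is to gather everything into one partition with repartition(1) and then sort inside it with sortWithinPartitions; with only one partition, that local sort is a total order. The method name writeSortedSingleFile, the command-line arguments, and mode("overwrite") are assumptions added for the example. Spark still writes the output path as a directory, now holding one part-*.csv file.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

object SingleSortedCsv {
  // Runs the question's pipeline but writes the sorted result as a single part file.
  def writeSortedSingleFile(spark: SparkSession, inFile: String, outDir: String): Unit = {
    val df = spark.read
      .option("header", "true")
      .csv(inFile)
      .select("Id", "Name", "Address")
      .withColumn("x_name", lower(col("Name")))
      .withColumn("x_address", lower(col("Address")))

    df.repartition(1)                 // one shuffle that gathers every row into a single partition
      .sortWithinPartitions("x_name") // sorting the only partition gives a total order
      .write
      .mode("overwrite")              // assumption: replace output from earlier runs
      .option("header", "true")
      .csv(outDir)                    // outDir is still a directory, holding one part-*.csv file
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("single-sorted-csv")
      .getOrCreate()
    // Hypothetical arguments: args(0) = input CSV (i_file), args(1) = output directory (o_file).
    writeSortedSingleFile(spark, args(0), args(1))
    spark.stop()
  }
}
```

The trade-off is parallelism: every row passes through a single task, which is fine for small files but can be slow for large ones.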
