I am trying to load a CSV file into Spark from Scala. I see that this can be done with either of the two syntaxes below:

  sqlContext.read.format("csv").options(option).load(path)
  sqlContext.read.options(option).csv(path)

What is the difference between these two and which gives the better performance? Thanks

1 Answer

There's no difference.

So why do both exist?

  • The .format(fmt).load(path) method is a flexible, pluggable API: new formats can be added without recompiling Spark, because you can register aliases for custom Data Source implementations and have Spark use them. "csv" used to be such a custom implementation (shipped outside the packaged Spark binaries), but it is now part of the project.
  • There are shorthand methods for the built-in data sources (csv, parquet, json, ...), which make the code a bit simpler and are checked at compile time.
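To make the relationship between the two styles concrete, here is a toy model in plain Scala. It is not Spark's code: `ReaderSketch`, `Source`, `register` and `Reader` are hypothetical names invented for this sketch, and a "data source" is reduced to a function from a path to some lines.

```scala
// Toy model only: these names are hypothetical and do not exist in Spark.
object ReaderSketch {
  // Stand-in for a data source: given a path, produce some lines.
  type Source = String => Seq[String]

  // Alias -> implementation; new entries can be added at runtime.
  private val registry = scala.collection.mutable.Map.empty[String, Source]

  def register(alias: String, source: Source): Unit =
    registry.update(alias, source)

  // The default format name is arbitrary for this sketch.
  final case class Reader(fmt: String = "parquet") {
    // Generic entry point: the format is a string, resolved only at runtime.
    def format(name: String): Reader = copy(fmt = name)

    def load(path: String): Seq[String] =
      registry.getOrElse(fmt, sys.error(s"Unknown format: $fmt"))(path)

    // Shorthand: an ordinary method, so a typo in its name fails to compile.
    def csv(path: String): Seq[String] = format("csv").load(path)
  }
}
```

In this model a misspelled `format("cvs")` only fails when `load` runs, whereas a misspelled `.cvs(path)` is rejected by the compiler, which is the compile-time check mentioned above.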

In the end, both create a CSV Data Source and use it to load the data, so performance is the same as well.
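If you want to check this yourself, a minimal sketch along the following lines should do. It assumes Spark 2.x or later on the classpath and uses a local SparkSession instead of the question's sqlContext (both expose the same `read`). The object name, the temporary file and the contents of the options map are made up for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object CsvReadComparison {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would already have one.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-read-comparison")
      .getOrCreate()

    // Write a tiny throwaway CSV file so the example is self-contained.
    val path = Files.createTempFile("people", ".csv")
    Files.write(path, "name,age\nalice,30\nbob,25\n".getBytes("UTF-8"))

    // Hypothetical options, standing in for `option` in the question.
    val options = Map("header" -> "true", "inferSchema" -> "true")

    val viaFormat    = spark.read.format("csv").options(options).load(path.toString)
    val viaShorthand = spark.read.options(options).csv(path.toString)

    // Both readers should yield the same schema and the same rows.
    assert(viaFormat.schema == viaShorthand.schema)
    assert(viaFormat.collect().toSeq == viaShorthand.collect().toSeq)

    spark.stop()
  }
}
```

Calling `explain()` on both DataFrames is another quick way to compare the plans Spark builds for each.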

Bottom line, for any supported format, you should opt for the "shorthand" method, e.g. csv(path).
