
How to read CSV files directly into Spark DataFrames without using the Databricks CSV API?
I know there is the Databricks CSV API, but I can't use it.
I know I can use a case class and map the columns by position (cols(0), cols(1), ...), but I have more than 22 columns, so I can't use a case class: case classes are limited to 22 fields. I know I can define the schema with StructType, but writing out 40 columns in a StructType seems like very lengthy code. I am looking for a way to read the file into a DataFrame with a read method, but Spark has no direct support for CSV files, so do we need to parse them ourselves? And how, if we have more than 40 columns?
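A minimal sketch for Spark 1.x, assuming the file has a header line, uses plain commas with no quoted or embedded commas, and that every column can start out as a string: build the StructType from the header at runtime instead of writing 40 StructFields by hand, then turn each line into a Row. The names `CsvWithoutSparkCsv`, `schemaFromHeader` and `readCsv` are hypothetical. (From Spark 2.0 onwards CSV reading is built in, e.g. `spark.read.option("header", "true").option("inferSchema", "true").csv(path)`.)

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object CsvWithoutSparkCsv {
  // One nullable StringType field per header name, so the column count does not matter.
  def schemaFromHeader(header: String): StructType =
    StructType(header.split(",", -1).map(name => StructField(name.trim, StringType, nullable = true)))

  // Reads a headered, comma-separated text file into a DataFrame using only core Spark APIs.
  def readCsv(sc: SparkContext, sqlContext: SQLContext, path: String): DataFrame = {
    val lines: RDD[String] = sc.textFile(path)
    val header = lines.first()
    val schema = schemaFromHeader(header)
    val rows: RDD[Row] = lines
      .filter(_ != header)                      // drop the header (assumes no data line equals it)
      .map(_.split(",", -1))                    // limit -1 keeps trailing empty fields
      .map(fields => Row.fromSeq(fields.toSeq)) // assumes each line has as many fields as the header
    sqlContext.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("csv-without-spark-csv").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    val df = readCsv(sc, sqlContext, args(0)) // args(0): path to the CSV file
    df.printSchema()
    df.show(5)
    sc.stop()
  }
}
```

Columns that should not stay strings can be cast afterwards with `Column.cast`, e.g. `df("age").cast("int")`, which keeps the schema code short even with 40 columns. A naive `split` does not handle quoted fields that contain commas.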

  • What is wrong with the Databricks CSV API? Commented Jul 5, 2016 at 5:19
  • @Himaprasoon, nothing is wrong with the Databricks CSV API. I have to take the Hortonworks HDPCD Spark certification exam, and the exam does not provide the Databricks API; only Spark's built-in API can be used. Commented Jul 5, 2016 at 18:38
  • Was my answer helpful? If not, have you found anything else? Commented Oct 2, 2016 at 8:05

2 Answers


I've also looked into this and ended up writing a Python script to generate Scala code for the parse(line) function and the schema definition. Yes, this can become a lengthy blob of code.
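The answer used a Python script; to keep one language here, this is a sketch of the same idea in Scala. The hypothetical generator `SchemaCodeGen.generate` takes (column name, type tag) pairs, for example built from the CSV header, and emits Scala source for a StructType and a `parse(line)` function to paste into the Spark job. The type tags (`string`, `int`, `double`) and the shape of the emitted code are assumptions, not the answerer's actual script.

```scala
object SchemaCodeGen {
  // Spark type name plus a function turning the raw field expression into a typed one.
  private final case class TypeInfo(sparkType: String, convert: String => String)

  // Hypothetical type tags; extend as needed (e.g. a "long" tag mapping to LongType and .toLong).
  private val types: Map[String, TypeInfo] = Map(
    "string" -> TypeInfo("StringType", expr => expr),
    "int"    -> TypeInfo("IntegerType", expr => s"$expr.toInt"),
    "double" -> TypeInfo("DoubleType", expr => s"$expr.toDouble")
  )

  // Emits Scala source defining `val schema` and `def parse(line: String): Row`.
  def generate(columns: Seq[(String, String)]): String = {
    val fields = columns
      .map { case (name, tag) => s"""  StructField("$name", ${types(tag).sparkType}, nullable = true)""" }
      .mkString(",\n")
    val values = columns.zipWithIndex
      .map { case ((_, tag), i) => types(tag).convert(s"f($i)") }
      .mkString(", ")
    s"""val schema = StructType(Seq(
       |$fields
       |))
       |
       |def parse(line: String): Row = {
       |  val f = line.split(",", -1)
       |  Row($values)
       |}""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    // Example input only; in practice build this list from the CSV header line.
    val columns = Seq("id" -> "int", "name" -> "string", "score" -> "double")
    println(generate(columns))
  }
}
```

The printed code assumes the target job imports `org.apache.spark.sql.Row` and `org.apache.spark.sql.types._`; it can then be used as `sqlContext.createDataFrame(dataLines.map(parse), schema)`, where `dataLines` is the text RDD with the header removed.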

Another option, if your data is not too big: use Python pandas! Start PySpark, read your data into a pandas DataFrame, and then create a Spark DataFrame from it. Save it (e.g., as a Parquet file), and load that Parquet file in Scala Spark.



It seems that from Scala 2.11.x onwards, the 22-field limit on case classes has been lifted. Please have a look at https://issues.scala-lang.org/browse/SI-7296

To work around this limit in Scala versions before 2.11, see my answer, which extends Product and overrides the methods productArity, productElement, and canEqual(that: Any).
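The referenced answer is not part of this page, so this is only a sketch of the technique it names: a plain (non-case) class with more than 22 fields that extends Product and overrides productArity, productElement and canEqual. `WideRecord`, its 23 `String` fields and `fromLine` are hypothetical. On Scala 2.11+ a case class with 23 fields compiles directly, so this is mainly useful on Scala 2.10.

```scala
// Hypothetical 23-column record: a plain class, so the pre-2.11 case-class limit does not apply.
class WideRecord(
    val c1: String, val c2: String, val c3: String, val c4: String, val c5: String,
    val c6: String, val c7: String, val c8: String, val c9: String, val c10: String,
    val c11: String, val c12: String, val c13: String, val c14: String, val c15: String,
    val c16: String, val c17: String, val c18: String, val c19: String, val c20: String,
    val c21: String, val c22: String, val c23: String
) extends Product with Serializable {

  // Field values in constructor order, so productElement(n) returns the n-th column.
  private def values: Vector[Any] = Vector(
    c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12,
    c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23)

  override def productArity: Int = 23

  // Vector.apply throws IndexOutOfBoundsException for n outside 0..22, as Product expects.
  override def productElement(n: Int): Any = values(n)

  override def canEqual(that: Any): Boolean = that.isInstanceOf[WideRecord]
}

object WideRecord {
  // Parses one comma-separated line (no quoted commas) into a WideRecord.
  def fromLine(line: String): WideRecord = {
    val f = line.split(",", -1)
    require(f.length == 23, s"expected 23 fields, got ${f.length}")
    new WideRecord(
      f(0), f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8), f(9), f(10), f(11),
      f(12), f(13), f(14), f(15), f(16), f(17), f(18), f(19), f(20), f(21), f(22))
  }
}
```

In a Spark 1.x job this would be used along the lines of `dataLines.map(WideRecord.fromLine).toDF()` after `import sqlContext.implicits._`; whether your Spark version's reflection-based schema inference accepts a non-case Product class is worth checking before relying on it.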
