
I'm very new to Spark and Scala (about two hours in), and I'm trying to play with a CSV data file. I'm stuck on how to deal with the header row: I've searched the internet for a way to load it or skip it, but I can't figure out how. I'm pasting the code I'm using; please help me.

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

object TaxiCaseOne {

case class NycTaxiData(Vendor_Id: String, PickUpdate: String, Droptime: String, PassengerCount: Int, Distance: Double, PickupLong: String, PickupLat: String, RateCode: Int, Flag: String, DropLong: String, DropLat: String, PaymentMode: String, Fare: Double, SurCharge: Double, Tax: Double, TripAmount: Double, Tolls: Double, TotalAmount: Double)

def mapper(line: String): NycTaxiData = {
  val fields = line.split(',')

  NycTaxiData(fields(0), fields(1), fields(2), fields(3).toInt, fields(4).toDouble, fields(5), fields(6), fields(7).toInt, fields(8), fields(9), fields(10), fields(11), fields(12).toDouble, fields(13).toDouble, fields(14).toDouble, fields(15).toDouble, fields(16).toDouble, fields(17).toDouble)
}

def main(args: Array[String]) {

// Set the log level to only print errors
Logger.getLogger("org").setLevel(Level.ERROR)
 // Use new SparkSession interface in Spark 2.0
val spark = SparkSession
  .builder
  .appName("SparkSQL")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "file:///C:/temp") // Necessary to work around a Windows bug in Spark 2.0.0; omit if you're not on Windows.
  .getOrCreate()
val lines = spark.sparkContext.textFile("../nyc.csv")

val data = lines.map(mapper)

// Infer the schema, and register the DataSet as a table.
import spark.implicits._
val schemaData = data.toDS

schemaData.printSchema()

schemaData.createOrReplaceTempView("data")

// SQL can be run over DataFrames that have been registered as a table
val vendor = spark.sql("SELECT * FROM data WHERE Vendor_Id == 'CMT'")

val results = vendor.collect()

results.foreach(println)

spark.stop()
  }
}
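If you do stick with textFile, one common way to skip the header is to drop the first line of the first partition, since that is where the header lives. Here is a minimal sketch of that drop-the-first-line logic (the skipHeader helper and the sample rows are hypothetical; in Spark you would pass the helper to lines.mapPartitionsWithIndex):

```scala
object HeaderSkipDemo {
  // Drops the first line, but only in partition 0, where the CSV header lives.
  // In Spark: val dataLines = lines.mapPartitionsWithIndex(skipHeader)
  def skipHeader(partitionIndex: Int, rows: Iterator[String]): Iterator[String] =
    if (partitionIndex == 0) rows.drop(1) else rows

  def main(args: Array[String]): Unit = {
    // Hypothetical sample: a header line followed by one data row
    val partition0 = Iterator(
      "Vendor_Id,PickUpdate,Droptime",
      "CMT,2013-01-01 00:00:00,2013-01-01 00:05:00")
    val kept = skipHeader(0, partition0).toList
    println(kept.mkString("\n")) // only the data row remains
  }
}
```

Note that filtering inside map would not work, because mapper still runs on the header line and fields(3).toInt throws on the non-numeric header text; the header has to be removed before mapping.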

1 Answer

If you have a CSV file, you should use Spark's built-in CSV reader (spark.read, available since Spark 2.0) rather than textFile; it can handle the header row for you.

val spark = SparkSession
  .builder
  .appName("SparkSQL")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "file:///C:/temp") // Necessary to work around a Windows bug in Spark 2.0.0; omit if you're not on Windows.
  .getOrCreate()

val df = spark.read
  .option("header", "true") // treat the first line as the header rather than data
  .option("inferSchema", "true") // optional: infer column types instead of reading everything as strings
  .csv("../nyc.csv")

You need the spark-core and spark-sql dependencies on your classpath for this to work.
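For example, in sbt the dependencies might look like this (the version number below is a placeholder; use the one matching your Spark installation):

```scala
// build.sbt -- hypothetical version; pin it to the Spark version you actually run
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.8",
  "org.apache.spark" %% "spark-sql"  % "2.4.8"
)
```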

Hope this helps!
