I have the following three classes and I am getting a

Task not serializable

error. The full stack trace is below.

The first class is a serializable Person:

import java.io.Serializable;

public class Person implements Serializable {

    private String name;
    private int age;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}

This class reads the text file and maps each line to a Person:

import java.io.Serializable;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class sqlTestv2 implements Serializable {

    private int appID;
    private int dataID;
    private JavaSparkContext sc;

    public sqlTestv2(int appID, int dataID, JavaSparkContext sc) {
        this.appID = appID;
        this.dataID = dataID;
        this.sc = sc;
    }

    public JavaRDD<Person> getDataRDD() {
        JavaRDD<String> test = sc.textFile("hdfs://localhost:8020/user/cloudera/people.txt");

        JavaRDD<Person> people = test.map(
                new Function<String, Person>() {
                    public Person call(String line) throws Exception {
                        String[] parts = line.split(",");

                        Person person = new Person();
                        person.setName(parts[0]);
                        person.setAge(Integer.parseInt(parts[1].trim()));

                        return person;
                    }
                });

        return people;
    }
}

And this retrieves the RDD and performs operations on it:

import java.io.Serializable;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class sqlTestv1 implements Serializable {

    public static void main(String[] arg) throws Exception {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("wordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
        sqlTestv2 v2 = new sqlTestv2(1, 1, sc);
        JavaRDD<Person> test = v2.getDataRDD();

        DataFrame schemaPeople = sqlContext.createDataFrame(test, Person.class);
        schemaPeople.registerTempTable("people");

        DataFrame df = sqlContext.sql("SELECT age FROM people");

        df.show();
    }
}

Full error:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:294)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:293)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
    at org.apache.spark.rdd.RDD.map(RDD.scala:293)
    at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:90)
    at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:47)
    at com.oreilly.learningsparkexamples.mini.java.sqlTestv2.getDataRDD(sqlTestv2.java:54)
    at com.oreilly.learningsparkexamples.mini.java.sqlTestv1.main(sqlTestv1.java:41)
Caused by: java.io.NotSerializableException: org.apache.spark.api.java.JavaSparkContext
Serialization stack:
    - object not serializable (class: org.apache.spark.api.java.JavaSparkContext, value: org.apache.spark.api.java.JavaSparkContext@3c3b144b)
    - field (class: com.oreilly.learningsparkexamples.mini.java.sqlTestv2, name: sc, type: class org.apache.spark.api.java.JavaSparkContext)
    - object (class com.oreilly.learningsparkexamples.mini.java.sqlTestv2, com.oreilly.learningsparkexamples.mini.java.sqlTestv2@3752fdda)
    - field (class: com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1, name: this$0, type: class com.oreilly.learningsparkexamples.mini.java.sqlTestv2)
    - object (class com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1, com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1@70c171ec)
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, )
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)

2 Answers


The stack trace tells you the answer. It's the JavaSparkContext that you're passing to sqlTestv2. You should pass the sc into the method, not into the class: the anonymous Function keeps an implicit this$0 reference to its enclosing sqlTestv2 instance (visible in the serialization stack), so Spark tries to serialize the whole object, including the non-serializable context field.
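
A minimal sketch of that fix, assuming the constructor drops the context as well: with no JavaSparkContext field left on the class, the captured enclosing instance serializes cleanly.

import java.io.Serializable;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class sqlTestv2 implements Serializable {

    private int appID;
    private int dataID;
    // no JavaSparkContext field: nothing non-serializable is left on the class

    public sqlTestv2(int appID, int dataID) {
        this.appID = appID;
        this.dataID = dataID;
    }

    // the context is used only on the driver to create the RDD; as a
    // method parameter that call() never references, it is not captured
    // by the closure at all
    public JavaRDD<Person> getDataRDD(JavaSparkContext sc) {
        JavaRDD<String> test = sc.textFile("hdfs://localhost:8020/user/cloudera/people.txt");
        return test.map(new Function<String, Person>() {
            public Person call(String line) throws Exception {
                String[] parts = line.split(",");
                Person person = new Person();
                person.setName(parts[0]);
                person.setAge(Integer.parseInt(parts[1].trim()));
                return person;
            }
        });
    }
}

The call site in main changes accordingly: new sqlTestv2(1, 1) followed by v2.getDataRDD(sc).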



You can add the "transient" modifier to sc, so that it is not serialized.
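
A one-line sketch of that approach: Java serialization skips transient fields, so the enclosing sqlTestv2 instance that the anonymous Function captures now serializes even though it still holds the context. This only works because sc is never used inside call(), the code that actually runs on the executors.

public class sqlTestv2 implements Serializable {

    private int appID;
    private int dataID;
    // transient: skipped during serialization, so shipping the enclosing
    // instance no longer fails; the field is null after deserialization,
    // which is harmless here because call() never touches it
    private transient JavaSparkContext sc;

    // ... rest of the class unchanged
}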

