I have a TSV file in which the first line is the header. I want to create a JavaPairRDD from this file. Currently, I'm doing so with the following code:
TsvParser tsvParser = new TsvParser(new TsvParserSettings());
List<String[]> allRows;
List<String> headerRow;
try (BufferedReader reader = new BufferedReader(new FileReader(myFile))) {
    allRows = tsvParser.parseAll(reader);
    // Remove the header row
    headerRow = Arrays.asList(allRows.remove(0));
}
JavaPairRDD<String, MyObject> myObjectRDD = javaSparkContext
        .parallelize(allRows)
        .mapToPair(row -> new Tuple2<>(row[0], myObjectFromArray(row)));
I was wondering if there was a way to have the javaSparkContext read and process the file directly instead of splitting the operation into two parts.
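The kind of thing I have in mind is roughly the sketch below (untested; it naively splits each line on tabs instead of using TsvParser, assumes myFile is a java.io.File, and assumes no data line is identical to the header line):
JavaRDD<String> lines = javaSparkContext.textFile(myFile.getPath());
// Grab the header line so it can be filtered out of the data
String header = lines.first();
JavaPairRDD<String, MyObject> myObjectRDD = lines
        .filter(line -> !line.equals(header))
        .mapToPair(line -> {
            // Naive tab split; -1 keeps trailing empty fields
            String[] row = line.split("\t", -1);
            return new Tuple2<>(row[0], myObjectFromArray(row));
        });
Is something along these lines the right approach, or is there a cleaner way that still lets Spark read the file itself?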
EDIT: This is not a duplicate of How do I convert csv file to rdd, because I'm looking for an answer in Java, not Scala.