I am new to Scala. How can I read a file from HDFS using Scala (without Spark)? When I searched, I only found examples of writing to HDFS, like this one:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import java.io.PrintWriter

/**
* @author ${user.name}
*/
object App {

//def foo(x : Array[String]) = x.foldLeft("")((a,b) => a + b)

def main(args: Array[String]): Unit = {
println( "Trying to write to HDFS..." )
val conf = new Configuration()
//conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020")
conf.set("fs.defaultFS", "hdfs://192.168.30.147:8020")
val fs= FileSystem.get(conf)
val output = fs.create(new Path("/tmp/mySample.txt"))
val writer = new PrintWriter(output)
try {
    writer.write("this is a test") 
    writer.write("\n")
}
finally {
    writer.close()
    println("Closed!")
}
println("Done!")
}

}

Please help. How can I read or load a file from HDFS using Scala?

  • What did you try so far, e.g. with hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/… ? Commented Jan 11, 2017 at 10:17
  • hard to follow docs here imho Commented Jan 21, 2018 at 17:49
  • For small files, we have elected to copy them from HDFS to the local file system and process them there SEQUENTIALLY. Commented Jan 21, 2018 at 18:12

1 Answer

One of the ways (kinda in functional style) could be like this:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI
import scala.collection.immutable.Stream

val hdfs = FileSystem.get(new URI("hdfs://yourUrl:port/"), new Configuration()) 
val path = new Path("/path/to/file/")
val stream = hdfs.open(path)
def readLines = Stream.cons(stream.readLine, Stream.continually( stream.readLine))

// This example stops at the first null (end of file) and prints each existing line in order
readLines.takeWhile(_ != null).foreach(line => println(line))

You could also take a look at this article, or here and here; these questions look related to yours and contain working (but more Java-like) code examples, if you're interested.
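Note that the snippet above never closes the stream it opens. As a rough sketch of one way to handle that, the helper below reads all lines through scala.io.Source and closes the stream in a finally block. The names HdfsSource and linesOf are made up for illustration, and UTF-8 is an assumption about the file's encoding. It accepts any java.io.InputStream, so it does not depend on Hadoop itself.

```scala
import java.io.InputStream
import scala.io.{Codec, Source}

// Hypothetical helper, not part of Hadoop or the original answer.
object HdfsSource {
  // Reads every line from `in`, then closes it even if reading fails.
  // Assumes the content is UTF-8 encoded text.
  def linesOf(in: InputStream): List[String] = {
    val source = Source.fromInputStream(in)(Codec.UTF8)
    try source.getLines().toList // materialize the lines before closing
    finally source.close()       // also closes the underlying stream
  }
}
```

With the hdfs and path values from the answer, this would be called as HdfsSource.linesOf(hdfs.open(path)), since the object returned by open is an InputStream. Because the whole file is loaded into a List, this is only reasonable for files that fit comfortably in memory.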

3 Comments

What is URI? How do I import this?
I got this working by importing URI with import java.net.URI and setting it to hdfs:///, as my Scala service is running on the same node as the HDFS nameserver host.
The readLine method is now deprecated; you should wrap the stream object instead, e.g. val bufferedReader = new BufferedReader(new InputStreamReader(stream)) (source and example of use: hadoopandspark.wordpress.com/2018/04/23/… )
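The BufferedReader suggestion in the last comment could look roughly like the sketch below. It keeps the takeWhile(_ != null) style of the answer but reads through BufferedReader.readLine, which is not deprecated. The names HdfsLines and readAllLines are hypothetical, and UTF-8 is an assumed encoding; the helper only needs a java.io.InputStream.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

// Hypothetical helper illustrating the comment's suggestion.
object HdfsLines {
  // Wraps `in` in a BufferedReader and collects lines until readLine
  // returns null (end of stream). Assumes UTF-8 text.
  def readAllLines(in: InputStream): List[String] = {
    val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    try Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
    finally reader.close() // closing the reader closes the wrapped stream
  }
}
```

Using the answer's values, the call would be HdfsLines.readAllLines(hdfs.open(path)).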
