Resilient Distributed Datasets (RDDs)
val rdd = sc.parallelize(Seq(1,2,3,4,5))
val rddFromFile = spark.sparkContext.textFile("file:///home/dv6/spark/spark/data/graphx/followers.txt")
val rdd = sc.parallelize(Seq("jack,student","mary,instructor","ann,researcher"))
rdd.collect.foreach(println)
/*
jack,student
mary,instructor
ann,researcher
*/
rdd.map(x=>x.split(",")(0)).collect
//res10: Array[String] = Array(jack, mary, ann)