Return a new DStream of single-element RDDs by aggregating the elements in each RDD of the source DStream using a function func (which takes two arguments and returns one). The function should be associative and commutative so that it can be computed in parallel.
/*
Sum the integers arriving in a file stream. Given a file such as:
x.txt
1 2 3 4 5
6 7 8 9 10
the program prints:
55
*/
import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level, Logger}
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Reduce log noise from Spark internals.
Logger.getLogger("org").setLevel(Level.ERROR)

val spark = SparkSession
  .builder()
  .config("spark.master", "local[2]") // at least 2 threads: one receiver, one processor
  .appName("streaming for book")
  .getOrCreate()

val sc = spark.sparkContext
// Micro-batch interval of 1 second.
val ssc = new StreamingContext(sc, Seconds(1))

// Watch this directory; files moved in after start form each batch.
val dataDirectory = "/tmp/filestream/"
val lines = ssc.textFileStream(dataDirectory)

// Split each line on spaces and parse the tokens as integers.
val nums = lines.flatMap(_.split(" "))
  .filter(_.nonEmpty)
  .map(_.toInt)

// Sum each batch's elements with an associative, commutative function.
val total = nums.reduce((x, y) => x + y)
total.print()

ssc.start()
ssc.awaitTermination()
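Why must `func` be associative and commutative? Spark reduces each partition locally and then combines the partial results, so the final answer must not depend on how the data was split. The sketch below (a standalone illustration, not part of the streaming example) simulates that two-phase reduction on a plain Scala collection:

```scala
// Hypothetical sketch: emulate Spark's per-partition reduce followed by a
// combine step, and check it matches a plain sequential reduce.
object ReduceSketch {
  def main(args: Array[String]): Unit = {
    val nums = (1 to 10).toVector
    val plus = (x: Int, y: Int) => x + y

    // Sequential reduce over the whole collection.
    val sequential = nums.reduce(plus)

    // Simulate two partitions reduced independently, then combined,
    // as a parallel engine would do.
    val (left, right) = nums.splitAt(5)
    val partitioned = plus(left.reduce(plus), right.reduce(plus))

    // Addition is associative and commutative, so both paths agree: 55.
    assert(sequential == partitioned)
    println(sequential)
  }
}
```

A non-commutative function such as subtraction would not give a stable result here, because the grouping of operations would change the answer from run to run.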