Return a new DStream of single-element RDDs by aggregating the elements in each RDD of the source DStream using a function func (which takes two arguments and returns one). The function should be associative and commutative so that it can be computed in parallel.
/*
Sum the integers arriving in a file stream. Given a file such as:
x.txt
1 2 3 4 5
6 7 8 9 10
the program prints:
55
*/
import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level, Logger}
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Reduce log noise from Spark internals.
Logger.getLogger("org").setLevel(Level.ERROR)

val spark = SparkSession
  .builder()
  .config("spark.master", "local[2]") // at least 2 threads: one receiver, one processor
  .appName("streaming for book")
  .getOrCreate()

val sc = spark.sparkContext
// Micro-batch interval of 1 second.
val ssc = new StreamingContext(sc, Seconds(1))

// Watch this directory; files moved in after start form each batch.
val dataDirectory = "/tmp/filestream/"
val lines = ssc.textFileStream(dataDirectory)

// Split each line on spaces and parse the tokens as integers.
val nums = lines.flatMap(_.split(" "))
  .filter(_.nonEmpty)
  .map(_.toInt)

// Sum each batch's elements with an associative, commutative function.
val total = nums.reduce((x, y) => x + y)
total.print()

ssc.start()
ssc.awaitTermination()
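Why must `func` be associative and commutative? Spark reduces each partition locally and then combines the partial results, so the final answer must not depend on how the data was split. The sketch below (a standalone illustration, not part of the streaming example) simulates that two-phase reduction on a plain Scala collection:

```scala
// Hypothetical sketch: emulate Spark's per-partition reduce followed by a
// combine step, and check it matches a plain sequential reduce.
object ReduceSketch {
  def main(args: Array[String]): Unit = {
    val nums = (1 to 10).toVector
    val plus = (x: Int, y: Int) => x + y

    // Sequential reduce over the whole collection.
    val sequential = nums.reduce(plus)

    // Simulate two partitions reduced independently, then combined,
    // as a parallel engine would do.
    val (left, right) = nums.splitAt(5)
    val partitioned = plus(left.reduce(plus), right.reduce(plus))

    // Addition is associative and commutative, so both paths agree: 55.
    assert(sequential == partitioned)
    println(sequential)
  }
}
```

A non-commutative function such as subtraction would not give a stable result here, because the grouping of operations would change the answer from run to run.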