reduce(func)

Return a new DStream of single-element RDDs by aggregating the elements in each RDD of the source DStream using a function func (which takes two arguments and returns one). The function should be associative and commutative so that it can be computed in parallel.
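To see why func must be associative and commutative, note that Spark reduces each partition independently and then combines the partial results. The following plain-Scala sketch (ordinary collections, not Spark, with a hypothetical two-partition split) simulates this: addition gives the same answer either way, while subtraction, which is neither associative nor commutative, does not.

```scala
// Simulate a parallel reduce over two partitions using plain Scala collections.
object ReduceDemo {
  def main(args: Array[String]): Unit = {
    val nums = (1 to 10).toList
    // Pretend the data lives in two partitions.
    val (p1, p2) = nums.splitAt(5)

    // Reduce each partition, then combine the partial results.
    val parallelSum = List(p1, p2).map(_.reduce(_ + _)).reduce(_ + _)
    val serialSum   = nums.reduce(_ + _)
    println(parallelSum == serialSum) // true: + is associative and commutative

    // Subtraction is neither, so the parallel result diverges.
    val parallelDiff = List(p1, p2).map(_.reduce(_ - _)).reduce(_ - _)
    val serialDiff   = nums.reduce(_ - _)
    println(parallelDiff == serialDiff) // false
  }
}
```

This is why a function like `(x, y) => x + y` is safe to pass to `reduce`, while order-sensitive operations are not.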
/*
Find the total of a file stream, e.g. for a file such as:

x.txt
1 2 3 4 5
6 7 8 9 10

it will print:

55
*/

import org.apache.spark._
import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level, Logger}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Silence Spark's INFO/WARN logging so the batch output is readable.
Logger.getLogger("org").setLevel(Level.ERROR)

val spark = SparkSession
  .builder()
  .config("spark.master", "local[2]")
  .appName("streaming for book")
  .getOrCreate()
import spark.implicits._

val sc = spark.sparkContext
// Process the stream in 1-second batches.
val ssc = new StreamingContext(sc, Seconds(1))

// Watch this directory for new text files.
val dataDirectory = "/tmp/filestream/"
val lines = ssc.textFileStream(dataDirectory)

// Split each line into tokens, drop empties, and parse as integers.
val nums = lines.flatMap(_.split(" "))
  .filter(_.nonEmpty)
  .map(_.toInt)

// Sum all numbers in each batch; the function is associative and commutative.
val total = nums.reduce((x, y) => x + y)
total.print()

ssc.start()
ssc.awaitTermination()