repartition(numPartitions)

repartition(numPartitions)

Changes the level of parallelism in this DStream by creating more or fewer partitions. By default, Spark set the number of partitions to number of CPU cores (threads) in the Spark machine for parallelism. Each partition can be processed by one CPU core (thread)
Example:
1
lines.getClass
2
//res15: Class[_ <: org.apache.spark.streaming.dstream.DStream[String]] = class org.apache.spark.streaming.dstream.MappedDStream
3
lines.repartition(3)
4
//res16: org.apache.spark.streaming.dstream.DStream[String] = [email protected]
Copied!
Last modified 1yr ago
Copy link