repartition(numPartitions)

repartition(numPartitions)

Changes the level of parallelism in this DStream by creating more or fewer partitions. By default, Spark set the number of partitions to number of CPU cores (threads) in the Spark machine for parallelism. Each partition can be processed by one CPU core (thread)

Example:

lines.getClass
//res15: Class[_ <: org.apache.spark.streaming.dstream.DStream[String]] = class org.apache.spark.streaming.dstream.MappedDStream
lines.repartition(3)
//res16: org.apache.spark.streaming.dstream.DStream[String] = org.apache.spark.streaming.dstream.TransformedDStream@73fcb174

Last updated