repartition(numPartitions)

Changes the level of parallelism in this DStream by creating more or fewer partitions. By default, Spark set the number of partitions to number of CPU cores (threads) in the Spark machine for parallelism. Each partition can be processed by one CPU core (thread)

Example:

lines.getClass
//res15: Class[_ <: org.apache.spark.streaming.dstream.DStream[String]] = class org.apache.spark.streaming.dstream.MappedDStream
lines.repartition(3)
//res16: org.apache.spark.streaming.dstream.DStream[String] = org.apache.spark.streaming.dstream.TransformedDStream@73fcb174

PreviousScala Tips for updateStateByKey NextDStream Window Operations

Last updated 5 years ago

Was this helpful?