map(func)

map(func)

Return a new DStream by passing each element of the source DStream through a function func.

    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

flatMap(func)

Similar to map, but each input item can be mapped to 0 or more output items.

//split words from line separated by space, such as sentence
    val lines = ssc.socketTextStream("localhost", 29999
      , StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))

There are choises of StorageLevel:

DISK_ONLY = StorageLevel(True, False, False, False, 1)
DISK_ONLY_2 = StorageLevel(True, False, False, False, 2)
MEMORY_AND_DISK = StorageLevel(True, True, False, False, 1)
MEMORY_AND_DISK_2 = StorageLevel(True, True, False, False, 2)
MEMORY_AND_DISK_SER = StorageLevel(True, True, False, False, 1)
MEMORY_AND_DISK_SER_2 = StorageLevel(True, True, False, False, 2)
MEMORY_ONLY = StorageLevel(False, True, False, False, 1)
MEMORY_ONLY_2 = StorageLevel(False, True, False, False, 2)
MEMORY_ONLY_SER = StorageLevel(False, True, False, False, 1)
MEMORY_ONLY_SER_2 = StorageLevel(False, True, False, False, 2)
OFF_HEAP = StorageLevel(True, True, True, False, 1)
(2 means RDD partitions will have replication of 2)

Last updated