foreachRDD(func)

The most generic output operator. It applies a function, func, to each RDD generated from the stream. The function should push the data in each RDD to an external system, for example by saving the RDD to files or by writing it over the network to a database. Note that func is executed in the driver process running the streaming application, and it usually contains RDD actions that force the computation of the streaming RDDs.

Example:

    // Keep only tweets that mention "happy" or "money" (case-insensitive).
    val focus_tweets_collection = tweets_collection.filter { text =>
      val lower = text.toLowerCase
      lower.contains("happy") || lower.contains("money")
    }
      
    // Needed for rdd.toDF; assumes `spark` is the active SparkSession.
    import spark.implicits._

    // For each RDD in the DStream, show the tweets and append them to the
    // Hive table tweets. This is the load step of an ETL job that harvests
    // and stores streaming data from Twitter.
    focus_tweets_collection.foreachRDD { rdd =>
      if (!rdd.isEmpty) {
        val tweetsDataFrame = rdd.toDF("newTweet")
        tweetsDataFrame.show(false)
        // Optional debugging aids:
        // tweetsDataFrame.printSchema()
        // println(tweetsDataFrame.count())

        // Create the in-memory temp view newTweets.
        tweetsDataFrame.createOrReplaceTempView("newTweets")
        // Insert the newly received tweets into the Hive table tweets,
        // prefixed with the current timestamp.
        spark.sql("insert into tweets select from_unixtime(unix_timestamp()), * from newTweets")
      }
    }
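
Because func itself runs in the driver, any per-record work that needs a live connection is usually pushed into an RDD action inside it, so that the connection is opened on the executors instead of being serialized from the driver. A minimal sketch of that pattern follows. It assumes a DStream[String]; `SinkConnection`, `ForeachRDDSketch`, and `writeToSink` are hypothetical names that stand in for a real client (JDBC, HTTP, and so on) and are not part of Spark.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical stand-in for a real external client; not a Spark class.
class SinkConnection {
  def send(record: String): Unit = println(s"sent: $record")
  def close(): Unit = ()
}

object ForeachRDDSketch {
  def writeToSink(stream: DStream[String]): Unit = {
    stream.foreachRDD { (rdd: RDD[String]) =>
      // This block runs in the driver, once per batch.
      rdd.foreachPartition { records =>
        // This block runs on the executors, once per partition, so the
        // connection is created where it is used and never serialized.
        val connection = new SinkConnection()
        try records.foreach(connection.send)
        finally connection.close()
      }
    }
  }
}
```

Opening one connection per partition rather than one per record keeps the connection overhead proportional to the number of partitions in each batch.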
