Passing Functions to Spark

Spark's API relies heavily on passing functions in the driver program to run on the cluster. There are two recommended ways to do this:

Anonymous function syntax, which can be used for short pieces of code (in Python, the equivalent is a lambda function):

val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))
rdd.map(x => x + 1)

Static methods. For example, you can define a function called myFunc as follows:

def myFunc(x: Int): Int = {
  x + 1
}
val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))
rdd.map(myFunc)
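
Putting both approaches together, here is a minimal self-contained sketch. It assumes a local Spark installation; the object name PassingFunctions and the `local[*]` master setting are illustrative choices, not requirements:

```scala
import org.apache.spark.sql.SparkSession

object PassingFunctions {
  // Named function: can be passed to transformations by reference.
  def myFunc(x: Int): Int = x + 1

  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in-process using all available cores.
    val spark = SparkSession.builder()
      .appName("PassingFunctions")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))

    // Anonymous function: applied element-by-element on the executors.
    val viaLambda = rdd.map(x => x + 1).collect() // Array(2, 3, 4, 5, 6)

    // Named method passed by reference: same result.
    val viaMethod = rdd.map(myFunc).collect()     // Array(2, 3, 4, 5, 6)

    println(viaLambda.mkString(", "))
    spark.stop()
  }
}
```

Both calls produce identical results; the anonymous form is convenient for one-off logic, while a named function is easier to test and reuse.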
