Passing Functions to Spark
Spark's API relies heavily on passing functions in the driver program to run on the cluster. There are two recommended ways to do this:
Anonymous function syntax, which can be used for short pieces of code (in Python, this is a lambda function):
val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))
rdd.map(x => x + 1)
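Note that map is lazy, so nothing runs until an action is triggered. As a minimal sketch, assuming sc is the SparkContext provided by spark-shell, you can verify the result with collect(), which runs the job and returns the values to the driver:

// Assuming `sc` is an existing SparkContext, e.g. the one created by spark-shell.
val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))
// map is lazy; collect() triggers the job and gathers the results on the driver.
val incremented = rdd.map(x => x + 1).collect()
// incremented: Array[Int] = Array(2, 3, 4, 5, 6)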
Static methods. For example, you can define a function called myFunc as follows:
def myFunc(x: Int): Int = {
  x + 1
}
val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))
rdd.map(myFunc)
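In a standalone application (rather than the shell), such a method is usually defined in a global singleton object, so that only a reference to the object is shipped to the executors rather than an enclosing class instance. A minimal sketch of that pattern, using a hypothetical MyFunctions object:

// Hypothetical singleton holding the function to ship to the cluster.
object MyFunctions {
  def myFunc(x: Int): Int = x + 1
}

val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))
// Passing a method of a singleton object avoids serializing instance state.
rdd.map(MyFunctions.myFunc)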