Working with Key-Value Pairs
While most Spark operations work on RDDs containing any type of object, a few special operations are only available on RDDs of key-value pairs. The most common are distributed "shuffle" operations, such as grouping or aggregating the elements by key.
In Scala, these operations are automatically available on RDDs containing Tuple2 objects (the built-in tuples in the language, created by simply writing (a, b)). The key-value pair operations are available in the PairRDDFunctions class, which automatically wraps around an RDD of tuples.
For example, the following code uses the reduceByKey operation on key-value pairs to count how many times each line of text occurs in a file:
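A minimal sketch of such a line count is given here, since the example itself is not reproduced in this text. It assumes a local Spark master and uses a small temporary file as stand-in input; the object name LineCount, the application name, and the sample lines are illustrative choices, not part of the original.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object LineCount {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally for illustration; a real job would usually
    // receive its master from spark-submit instead.
    val conf = new SparkConf().setAppName("LineCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical input: a temporary file standing in for real data.
      val path = Files.createTempFile("lines", ".txt")
      Files.write(path, "a\nb\na\n".getBytes("UTF-8"))

      // One RDD element per line of text.
      val lines = sc.textFile(path.toString)
      // Turn each line into a (key, value) pair: the line itself and a count of 1.
      val pairs = lines.map(line => (line, 1))
      // Sum the counts of identical lines. reduceByKey comes from
      // PairRDDFunctions, which is applied implicitly to an RDD of tuples.
      val counts = pairs.reduceByKey((a, b) => a + b)

      // Bring the results back to the driver; the order is not guaranteed.
      counts.collect().foreach { case (line, n) => println(s"$line: $n") }
    } finally {
      sc.stop()
    }
  }
}
```

To get the pairs in alphabetical order, counts.sortByKey() can be called before counts.collect() returns them to the driver program as an array.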