# RDD Operations

### RDDs support two types of operations:

#### Transformations -- create a new RDD from an existing one.

All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program.

This design lets Spark run more efficiently. For example, an RDD created through map and then consumed by a reduce never has to be sent to the driver: only the result of the reduce is returned, rather than the larger mapped RDD.
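One way to observe this laziness is to count how often the function passed to map actually runs. The following is a minimal sketch, not part of the original example: the names `calls`, `numbers`, and `doubled` are made up for illustration, and it assumes a local Spark installation (in spark-shell, `getOrCreate()` simply reuses the existing session).

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session in spark-shell, or starts a local one otherwise.
val spark = SparkSession.builder().master("local[*]").appName("lazy-demo").getOrCreate()
val sc = spark.sparkContext

// Hypothetical accumulator that counts invocations of the map function.
val calls = sc.longAccumulator("mapCalls")

val numbers = sc.parallelize(1 to 5)
// Transformation: only recorded in the lineage, nothing is executed yet.
val doubled = numbers.map { n => calls.add(1); n * 2 }

// No action has run so far, so the map function has not been called.
val callsBeforeAction = calls.sum

// Action: triggers the map and the reduce, and returns one value to the driver.
val total = doubled.reduce(_ + _)

// Now the map function has run once per element (barring task retries).
val callsAfterAction = calls.sum
```

Here `callsBeforeAction` is 0 because defining `doubled` does no work, while `total` is 30 once the reduce has forced the computation. Accumulator updates made inside a transformation can be applied more than once if Spark re-executes a task, so the count is best treated as a diagnostic rather than an exact figure.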

#### Actions -- return a value to the driver program after running a computation on the RDD.

For example, map is a transformation that passes each RDD element through a function and returns a new RDD representing the results. On the other hand, reduce is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program (although there is also a parallel reduceByKey that returns a distributed dataset).
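The difference between reduce and reduceByKey can be sketched as follows. This is an illustrative example under assumptions: the input words and the names `pairs`, `totalCount`, and `countsByKey` are invented here, and a local Spark installation is assumed.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session in spark-shell, or starts a local one otherwise.
val spark = SparkSession.builder().master("local[*]").appName("reduce-demo").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data: one (word, 1) pair per word.
val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))
val pairs = words.map(w => (w, 1))

// reduce is an action: it returns a single value to the driver.
val totalCount: Int = pairs.map(_._2).reduce(_ + _)

// reduceByKey is a transformation: it returns a new distributed RDD[(String, Int)]
// and does not compute anything until an action is called on it.
val countsByKey = pairs.reduceByKey(_ + _)

// collect is an action that brings the per-key results back to the driver.
val counts: Map[String, Int] = countsByKey.collect().toMap
```

With this input, `totalCount` is 6 and `counts` maps a to 3, b to 2, and c to 1. The per-key result stays distributed until `collect()` is called, which is why reduceByKey is classed as a transformation even though it aggregates.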

```
val rddFromFile = sc.textFile("file:///home/dv6/spark/spark/data/graphx/followers.txt")
val lineLengths = rddFromFile.map(s => s.length)
// Keep lineLengths in memory once it has been computed for the first time
lineLengths.persist()
// Without persist(), lineLengths is recomputed every time an action uses it
val totalLength = lineLengths.reduce((a, b) => a + b)
```
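The effect of persist() shows up when more than one action runs on the same RDD. The sketch below is a variation on the example above that swaps the file for a small in-memory list so it runs anywhere; the sample strings and the `calls` accumulator are assumptions added for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session in spark-shell, or starts a local one otherwise.
val spark = SparkSession.builder().master("local[*]").appName("persist-demo").getOrCreate()
val sc = spark.sparkContext

// Hypothetical accumulator that counts how often the length function runs.
val calls = sc.longAccumulator("lengthCalls")

// In-memory stand-in for the lines of a text file.
val lines = sc.parallelize(Seq("spark", "rdd", "lazy"))
val lineLengths = lines.map { s => calls.add(1); s.length }

// Marks the RDD for caching; the data is stored when the first action computes it.
lineLengths.persist()

val totalLength = lineLengths.reduce((a, b) => a + b) // first action: computes and caches
val maxLength = lineLengths.max()                     // second action: can reuse the cache

// Number of times the length function ran across both actions.
val lengthCalls = calls.sum
```

Here `totalLength` is 12 and `maxLength` is 5. With the data cached, `lengthCalls` would typically be 3 (one call per line), whereas removing the `persist()` line would typically double it, since each action would recompute the map.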

