# Package org.apache.spark.graphx

![](/files/-M3A94eq5qr6SyyMtJPZ)

GraphX is the Spark component for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a new `Graph` abstraction: a directed multigraph with properties attached to each vertex and edge. To support graph computation, GraphX exposes a set of fundamental operators (e.g., `subgraph`, `joinVertices`, and `aggregateMessages`) as well as an optimized variant of the Pregel API. In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
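As a rough illustration of the three operators named above, the following sketch builds a tiny graph and applies each one. It assumes a local Spark installation with GraphX on the classpath and is written in script style (for example, pasted into `spark-shell` or run as a Scala script). The vertex ids, edge weights, and the application name are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph}

// Local session for illustration only; app name and master are assumptions.
val spark = SparkSession.builder().appName("graphx-operators-sketch").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Three vertices with an Int property (initially 0) and three weighted edges.
val vertices = sc.parallelize(Seq((1L, 0), (2L, 0), (3L, 0)))
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 3L, 5), Edge(2L, 3L, 2)))
val graph = Graph(vertices, edges)

// subgraph: keep only the edges whose weight is greater than 1.
val heavy = graph.subgraph(epred = triplet => triplet.attr > 1)
val heavyEdgeCount = heavy.edges.count()

// aggregateMessages: every edge sends 1 to its destination and the
// messages are summed, which yields the in-degree of each vertex.
val inDegrees = graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _)

// joinVertices: overwrite the vertex property with the in-degree where one
// exists; vertices that received no message keep their old value.
val withInDegree = graph.joinVertices(inDegrees)((_, _, deg) => deg)
val inDegreeById = withInDegree.vertices.collect().toMap

println(s"edges with weight > 1: $heavyEdgeCount")
println(s"in-degree per vertex: $inDegreeById")

spark.stop()
```

Vertex 1 has no incoming edges, so it never appears in `inDegrees` and `joinVertices` leaves its property at 0.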

The property graph is a directed multigraph with user-defined objects attached to each vertex and edge. A directed multigraph is a directed graph that may contain multiple parallel edges sharing the same source and destination vertex. Support for parallel edges simplifies modeling scenarios where there can be multiple relationships (e.g., co-worker and friend) between the same vertices. Each vertex is keyed by a unique 64-bit long identifier (`VertexId`). GraphX does not impose any ordering constraints on the vertex identifiers. Similarly, each edge has a corresponding source and destination vertex identifier.
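A minimal sketch of such a multigraph, using the co-worker and friend relationships mentioned above, might look like the following. It assumes a local Spark installation with GraphX available and script-style execution; the names, ids, and application name are invented for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Local session for illustration only; app name and master are assumptions.
val spark = SparkSession.builder().appName("graphx-multigraph-sketch").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Vertices are (VertexId, property) pairs; VertexId is an alias for Long.
// The ids are deliberately non-contiguous, since no ordering is required.
val people: RDD[(VertexId, String)] = sc.parallelize(Seq(
  (1L, "alice"),
  (2L, "bob"),
  (7L, "carol")
))

// Two parallel edges from 1 to 2, each carrying a different relationship.
val relationships: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(1L, 2L, "co-worker"),
  Edge(1L, 2L, "friend"),
  Edge(2L, 7L, "friend")
))

val graph: Graph[String, String] = Graph(people, relationships)

// Both 1 -> 2 edges are kept because the graph is a multigraph.
val parallelEdgeCount = graph.edges.filter(e => e.srcId == 1L && e.dstId == 2L).count()
println(s"edges from 1 to 2: $parallelEdgeCount")

spark.stop()
```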

The property graph is parameterized over the vertex (`VD`) and edge (`ED`) types. These are the types of the objects associated with each vertex and each edge, respectively.
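To make the two type parameters concrete, the sketch below declares a `Graph[(String, Int), Double]` and then derives a `Graph[Int, Double]` from it with `mapVertices`. It assumes a local Spark installation with GraphX available and script-style execution; the user data and application name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Local session for illustration only; app name and master are assumptions.
val spark = SparkSession.builder().appName("graphx-types-sketch").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// VD = (String, Int): a (name, age) pair attached to every vertex.
val users: RDD[(VertexId, (String, Int))] = sc.parallelize(Seq(
  (1L, ("alice", 34)),
  (2L, ("bob", 27))
))

// ED = Double: a weight attached to every edge.
val follows: RDD[Edge[Double]] = sc.parallelize(Seq(Edge(1L, 2L, 0.5)))

val graph: Graph[(String, Int), Double] = Graph(users, follows)

// mapVertices can change the vertex type VD; the edge type ED is untouched.
val ages: Graph[Int, Double] = graph.mapVertices((_, attr) => attr._2)
val ageById = ages.vertices.collect().toMap
println(s"age per vertex: $ageById")

spark.stop()
```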

GraphX optimizes the representation of vertex and edge types when they are primitive data types (e.g., int, double), reducing the in-memory footprint by storing them in specialized arrays.

### Classes, objects, and traits included in org.apache.spark.graphx:

[Edge](/data-science-and-apache-spark/edge-class.md)

[EdgeContext](/data-science-and-apache-spark/edgecontext-class.md)

[EdgeDirection](/data-science-and-apache-spark/edgedirection-class.md)

[EdgeRDD](/data-science-and-apache-spark/edgerdd-class.md)

[EdgeTriplet](/data-science-and-apache-spark/edgetriplet-class.md)

[Graph](/data-science-and-apache-spark/graph-class.md)

[GraphLoader](/data-science-and-apache-spark/graphloader-object.md)

[GraphOps](/data-science-and-apache-spark/graphops-class.md)

[GraphXUtils](/data-science-and-apache-spark/graphxutils-object.md)

[PartitionStrategy](/data-science-and-apache-spark/partitionstrategy-trait.md)

[Pregel](/data-science-and-apache-spark/pregel-object.md)

[TripletFields](/data-science-and-apache-spark/tripletfields-class.md)

[VertexRDD](/data-science-and-apache-spark/vertexrdd-class.md)

