PartitionStrategy Trait

trait PartitionStrategy extends Serializable

Represents the way edges are assigned to edge partitions based on their source and destination vertex IDs.

abstract def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID

Returns the partition number for a given edge.

Example:

import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

// Load the edges in canonical order and partition the graph for triangle count
val graph = GraphLoader.edgeListFile(sc, "file:///home/dv6/spark/spark/data/graphx/followers.txt", true)
  .partitionBy(PartitionStrategy.RandomVertexCut)
// Find the triangle count for each vertex
val triCounts = graph.triangleCount().vertices
// Join the triangle counts with the usernames
val users = sc.textFile("file:///home/dv6/spark/spark/data/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}
val triCountByUsername = users.join(triCounts).map { case (id, (username, tc)) =>
  (username, tc)
}
// Print the result
println(triCountByUsername.collect().mkString("\n"))

/*
Output:
(justinbieber,0)
(matei_zaharia,1)
(ladygaga,0)
(BarackObama,0)
(jeresig,1)
(odersky,1)

*/

Reference:

https://github.com/apache/spark/blob/v2.4.5/graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala

Last updated