PartitionStrategy Trait
trait PartitionStrategy extends Serializable
Represents the way edges are assigned to edge partitions based on their source and destination vertex IDs.
abstract def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID
Returns the partition number for a given edge.
Example:
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}
// Load the edges in canonical order and partition the graph for triangle count
val graph = GraphLoader.edgeListFile(sc, "file:///home/dv6/spark/spark/data/graphx/followers.txt", true)
.partitionBy(PartitionStrategy.RandomVertexCut)
// Find the triangle count for each vertex
val triCounts = graph.triangleCount().vertices
// Join the triangle counts with the usernames
val users = sc.textFile("file:///home/dv6/spark/spark/data/graphx/users.txt").map { line =>
val fields = line.split(",")
(fields(0).toLong, fields(1))
}
val triCountByUsername = users.join(triCounts).map { case (id, (username, tc)) =>
(username, tc)
}
// Print the result
println(triCountByUsername.collect().mkString("\n"))
/*
Output:
(justinbieber,0)
(matei_zaharia,1)
(ladygaga,0)
(BarackObama,0)
(jeresig,1)
(odersky,1)
*/
Reference:
Last modified 3yr ago