PartitionStrategy Trait
trait PartitionStrategy extends Serializable
Represents the way edges are assigned to edge partitions based on their source and destination vertex IDs.
1
abstract def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID
Copied!
Returns the partition number for a given edge.
Example:
1
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}
2
​
3
// Load the edges in canonical order and partition the graph for triangle count
4
val graph = GraphLoader.edgeListFile(sc, "file:///home/dv6/spark/spark/data/graphx/followers.txt", true)
5
.partitionBy(PartitionStrategy.RandomVertexCut)
6
// Find the triangle count for each vertex
7
val triCounts = graph.triangleCount().vertices
8
// Join the triangle counts with the usernames
9
val users = sc.textFile("file:///home/dv6/spark/spark/data/graphx/users.txt").map { line =>
10
val fields = line.split(",")
11
(fields(0).toLong, fields(1))
12
}
13
val triCountByUsername = users.join(triCounts).map { case (id, (username, tc)) =>
14
(username, tc)
15
}
16
// Print the result
17
println(triCountByUsername.collect().mkString("\n"))
18
​
19
/*
20
Output:
21
(justinbieber,0)
22
(matei_zaharia,1)
23
(ladygaga,0)
24
(BarackObama,0)
25
(jeresig,1)
26
(odersky,1)
27
​
28
*/
Copied!
Reference:
Last modified 1yr ago
Copy link