GraphX Example 2

PageRank example

Given user data in users.txt and follower data in followers.txt, with the columns shown below, calculate the PageRank of each user with a tolerance of 0.0001, then sort the output by rank.

users.txt (user id, username, name)

1,BarackObama,Barack Obama
2,ladygaga,Goddess of Love
3,jeresig,John Resig
4,justinbieber,Justin Bieber
6,matei_zaharia,Matei Zaharia
7,odersky,Martin Odersky
8,anonsys

followers.txt (follower user id, followed user id; separated by a space)

2 1
4 1
1 2
6 3
7 3
7 6
6 7
3 7
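To make the direction of each edge concrete, here is a minimal plain-Scala sketch (no Spark needed) that parses the followers.txt lines into (follower, followed) pairs and counts followers per user. The object name `EdgeListSketch` and the helper `parseEdges` are hypothetical names chosen for illustration. Skipping lines that start with `#` mirrors the documented behavior of `GraphLoader.edgeListFile`, but this is not that loader's implementation.

```scala
object EdgeListSketch {
  // The eight lines of followers.txt, copied from the listing above.
  val followers: Seq[String] =
    Seq("2 1", "4 1", "1 2", "6 3", "7 3", "7 6", "6 7", "3 7")

  /** Parses "src dst" lines, skipping blank lines and lines starting with '#'.
    * Hypothetical helper for illustration only. */
  def parseEdges(lines: Seq[String]): Seq[(Long, Long)] =
    lines.map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .map { line =>
        val fields = line.split("\\s+")
        (fields(0).toLong, fields(1).toLong) // (follower id, followed id)
      }

  def main(args: Array[String]): Unit = {
    val edges = parseEdges(followers)
    // Group by the second column to count how many followers each user has.
    val followerCounts = edges.groupBy(_._2).map { case (id, es) => id -> es.size }
    followerCounts.toSeq
      .sortBy { case (id, n) => (-n, id) }
      .foreach(println)
  }
}
```

Note that users 5 and 8 never appear in followers.txt, so they are not vertices of the graph built from this file.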
Code:
// Assumes spark-shell, where sc and spark are predefined
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.GraphLoader

// Load the edges as a graph
val graph = GraphLoader.edgeListFile(sc, "file:///home/dv6/spark/spark/data/graphx/followers.txt")

// Run PageRank with a tolerance of 0.0001
val ranks = graph.pageRank(0.0001).vertices

ranks.foreach(println)
/*
Output (user id, rank):
(4,0.15007622780470478)
(6,0.7017164142469724)
(2,1.3907556008752426)
(1,1.4596227918476916)
(3,0.9998520559494657)
(7,1.2979769092759237)
*/

// Join the ranks with the usernames
val users = sc.textFile("file:///home/dv6/spark/spark/data/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}
val ranksByUsername = users.join(ranks).map {
  case (id, (username, rank)) => (username, rank)
}

ranksByUsername.foreach(println)

/*
Output:
(BarackObama,1.4596227918476916)
(jeresig,0.9998520559494657)
(odersky,1.2979769092759237)
(justinbieber,0.15007622780470478)
(matei_zaharia,0.7017164142469724)
(ladygaga,1.3907556008752426)
*/

// Sort by rank, highest first
ranksByUsername.toDF.withColumnRenamed("_1","username")
  .withColumnRenamed("_2","rank")
  .createOrReplaceTempView("ranksByusername")
spark.sql("select * from ranksByusername order by rank desc").show()
/*
Output:
+-------------+-------------------+
|     username|               rank|
+-------------+-------------------+
|  BarackObama| 1.4596227918476916|
|     ladygaga| 1.3907556008752426|
|      odersky| 1.2979769092759237|
|      jeresig| 0.9998520559494657|
|matei_zaharia| 0.7017164142469724|
| justinbieber|0.15007622780470478|
+-------------+-------------------+
*/
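To illustrate what the tolerance controls, here is a minimal plain-Scala sketch of a PageRank iteration on the same eight edges. It is a simplified whole-graph loop, not GraphX's implementation (which is Pregel-based). The names `PageRankSketch`, `pageRank` and `resetProb`, the starting rank of 1.0, and the stopping rule (stop once no rank changes by more than `tol` between iterations) are assumptions made for this example. With a reset probability of 0.15, the ranks it produces should land close to the values shown above, though not digit for digit.

```scala
import scala.annotation.tailrec

object PageRankSketch {
  // Edges from followers.txt as (follower, followed) pairs.
  val edges: Seq[(Long, Long)] = Seq(
    (2L, 1L), (4L, 1L), (1L, 2L), (6L, 3L),
    (7L, 3L), (7L, 6L), (6L, 7L), (3L, 7L)
  )

  /** Repeats rank(v) = resetProb + (1 - resetProb) * sum(rank(u) / outDegree(u))
    * over all edges u -> v, until no rank changes by more than `tol`.
    * Illustrative only; assumes every edge source has at least one out-edge,
    * which holds by construction. */
  def pageRank(edges: Seq[(Long, Long)], tol: Double,
               resetProb: Double = 0.15): Map[Long, Double] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (src, es) => src -> es.size }

    @tailrec
    def iterate(ranks: Map[Long, Double]): Map[Long, Double] = {
      // Each edge passes an equal share of its source's current rank.
      val incoming = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (dst, shares) => dst -> shares.map(_._2).sum }
      val next = vertices.map { v =>
        v -> (resetProb + (1 - resetProb) * incoming.getOrElse(v, 0.0))
      }.toMap
      // Largest per-vertex change in this round.
      val maxChange = vertices.map(v => math.abs(next(v) - ranks(v))).max
      if (maxChange <= tol) next else iterate(next)
    }

    iterate(vertices.map(v => v -> 1.0).toMap)
  }

  def main(args: Array[String]): Unit = {
    pageRank(edges, tol = 0.0001).toSeq
      .sortBy { case (_, rank) => -rank }
      .foreach { case (id, rank) => println(f"$id%d  $rank%.4f") }
  }
}
```

A smaller `tol` means more iterations and ranks closer to the fixed point. User 4 has no followers, so its rank settles at the reset probability, which matches the value of roughly 0.15 in the output above.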