Class PageRank

Object org.apache.spark.graphx.lib.PageRank

PageRank algorithm implementation. Two implementations are provided.

The first implementation uses the standalone Graph interface and runs PageRank for a fixed number of iterations:

 var PR = Array.fill(n)( 1.0 )
 val oldPR = Array.fill(n)( 1.0 )
 for( iter <- 0 until numIter ) {
   swap(oldPR, PR)
   for( i <- 0 until n ) {
     PR[i] = alpha + (1 - alpha) * inNbrs[i].map(j => oldPR[j] / outDeg[j]).sum
   }
 }
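The fixed-iteration pseudocode above can be sketched in plain Scala, without Spark. This is a minimal single-machine illustration, not the GraphX implementation: the object name `FixedIterationPageRank`, its `run` signature, and the edge-list input format are assumptions made for this example.

```scala
// Hypothetical stand-alone sketch of the fixed-iteration update; not GraphX code.
object FixedIterationPageRank {
  // edges: (src, dst) pairs over vertex ids 0 until n (assumed input format)
  def run(n: Int, edges: Seq[(Int, Int)], numIter: Int, alpha: Double = 0.15): Array[Double] = {
    val outDeg = Array.fill(n)(0)                 // out-degree of each vertex
    val inNbrs = Array.fill(n)(List.empty[Int])   // vertices that link to each vertex
    for ((src, dst) <- edges) {
      outDeg(src) += 1
      inNbrs(dst) = src :: inNbrs(dst)
    }

    var pr = Array.fill(n)(1.0)                   // every vertex starts at rank 1.0
    for (_ <- 0 until numIter) {
      val oldPR = pr                              // ranks from the previous iteration
      pr = Array.tabulate(n) { i =>
        alpha + (1 - alpha) * inNbrs(i).map(j => oldPR(j) / outDeg(j)).sum
      }
    }
    pr
  }
}
```

For example, `FixedIterationPageRank.run(2, Seq((0, 1)), numIter = 5)` leaves vertex 0, which has no inlinks, at a rank of alpha.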

The second implementation uses the Pregel interface and runs PageRank until convergence:

 var PR = Array.fill(n)( 1.0 )
 val oldPR = Array.fill(n)( 0.0 )
 while( max(abs(PR - oldPR)) > tol ) {
   swap(oldPR, PR)
   for( i <- 0 until n if abs(PR[i] - oldPR[i]) > tol ) {
     PR[i] = alpha + (1 - alpha) * inNbrs[i].map(j => oldPR[j] / outDeg[j]).sum
   }
 }
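The convergence-based variant can be sketched the same way. This is a simplified single-machine illustration, not the Pregel-based code: it recomputes every vertex in each round rather than only those whose rank is still changing, and it stops once the largest per-vertex change is at most tol. The object name `ConvergentPageRank` and its `run` signature are assumptions made for this example.

```scala
// Hypothetical stand-alone sketch of iterating until convergence; not GraphX code.
object ConvergentPageRank {
  // edges: (src, dst) pairs over vertex ids 0 until n (assumed input format)
  def run(n: Int, edges: Seq[(Int, Int)], tol: Double, alpha: Double = 0.15): Array[Double] = {
    val outDeg = Array.fill(n)(0)                 // out-degree of each vertex
    val inNbrs = Array.fill(n)(List.empty[Int])   // vertices that link to each vertex
    for ((src, dst) <- edges) {
      outDeg(src) += 1
      inNbrs(dst) = src :: inNbrs(dst)
    }

    var pr = Array.fill(n)(1.0)
    var maxDelta = Double.MaxValue                // forces at least one round
    while (maxDelta > tol) {
      val oldPR = pr
      pr = Array.tabulate(n) { i =>
        alpha + (1 - alpha) * inNbrs(i).map(j => oldPR(j) / outDeg(j)).sum
      }
      // Largest absolute change of any vertex in this round (0.0 for an empty graph).
      maxDelta = if (n == 0) 0.0 else (0 until n).map(i => math.abs(pr(i) - oldPR(i))).max
    }
    pr
  }
}
```

Updating all vertices each round keeps the sketch short; skipping vertices that have already settled, as the pseudocode does, saves work on large graphs.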

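alpha is the random reset probability (typically 0.15), inNbrs[i] is the set of neighbors that link to i, and outDeg[j] is the out-degree of vertex j. numIter is the number of iterations in the first implementation, and tol is the convergence tolerance in the second.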

Note: This is not the "normalized" PageRank; as a consequence, pages that have no inlinks will have a PageRank of alpha.

Methods:
