# Packages

• package spark

Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions.

Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.

Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower level interfaces. These are subject to change or removal in minor releases.

• package graphx

ALPHA COMPONENT GraphX is a graph processing framework built on top of Spark.
• package lib

Various analytics functions for graphs.
• ConnectedComponents
• LabelPropagation
• PageRank
• SVDPlusPlus
• ShortestPaths
• StronglyConnectedComponents
• TriangleCount
• package util

Collections of utilities used by graphx.

# lib 

#### package lib

Various analytics functions for graphs.

Source: package.scala
Linear Supertypes: AnyRef, Any

### Value Members

1. object ConnectedComponents

Connected components algorithm.

2. object LabelPropagation

Label Propagation algorithm.

3. object PageRank extends Logging

PageRank algorithm implementation. There are two implementations of PageRank.

The first implementation uses the standalone Graph interface and runs PageRank for a fixed number of iterations:

```scala
var PR = Array.fill(n)( 1.0 )
val oldPR = Array.fill(n)( 1.0 )
for( iter <- 0 until numIter ) {
  swap(oldPR, PR)
  for( i <- 0 until n ) {
    PR[i] = alpha + (1 - alpha) * inNbrs[i].map(j => oldPR[j] / outDeg[j]).sum
  }
}
```

The second implementation uses the Pregel interface and runs PageRank until convergence:

```scala
var PR = Array.fill(n)( 1.0 )
val oldPR = Array.fill(n)( 0.0 )
while( max(abs(PR - oldPR)) > tol ) {
  swap(oldPR, PR)
  for( i <- 0 until n if abs(PR[i] - oldPR[i]) > tol ) {
    PR[i] = alpha + (1 - alpha) * inNbrs[i].map(j => oldPR[j] / outDeg[j]).sum
  }
}
```

alpha is the random reset probability (typically 0.15), inNbrs[i] is the set of neighbors that link to i, and outDeg[j] is the out-degree of vertex j.

Note

This is not the "normalized" PageRank and as a consequence pages that have no inlinks will have a PageRank of alpha.
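As a concrete illustration, the fixed-iteration pseudocode above can be translated into a small self-contained Scala sketch with no Spark dependency. The names `inNbrs`, `outDeg`, `alpha`, and `numIter` mirror the pseudocode; `PageRankSketch` and the tiny graphs used below are made-up examples, not part of the GraphX API.

```scala
// Plain-Scala sketch of the fixed-iteration PageRank pseudocode above.
// inNbrs(i) lists the vertices linking to i; outDeg(j) is j's out-degree.
object PageRankSketch {
  def run(inNbrs: Array[Array[Int]], outDeg: Array[Int],
          alpha: Double, numIter: Int): Array[Double] = {
    val n = inNbrs.length
    var pr = Array.fill(n)(1.0)
    for (_ <- 0 until numIter) {
      val oldPr = pr
      pr = Array.tabulate(n) { i =>
        // random reset plus damped contributions from in-neighbors
        alpha + (1 - alpha) * inNbrs(i).map(j => oldPr(j) / outDeg(j)).sum
      }
    }
    pr
  }
}
```

On a 3-cycle every rank stays at 1.0, while a vertex with no inlinks settles at alpha, matching the note above about the non-normalized formulation.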

4. object SVDPlusPlus

Implementation of SVD++ algorithm.

5. object ShortestPaths extends Serializable

Computes shortest paths to the given set of landmark vertices, returning a graph where each vertex attribute is a map containing the shortest-path distance to each reachable landmark.
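To make the result shape concrete, here is a plain-Scala sketch that produces the same kind of per-vertex landmark-distance maps with one BFS per landmark. It is not the GraphX implementation (which runs over Pregel); `ShortestPathsSketch`, the adjacency-list input, and the undirected-edge simplification are all assumptions for illustration.

```scala
// Plain-Scala sketch: shortest hop distances from every vertex to a set of
// landmarks, returned as Map(vertex -> Map(landmark -> distance)).
// Landmarks unreachable from a vertex are simply absent from its map.
object ShortestPathsSketch {
  // Level-by-level BFS from src; edges are treated as undirected here.
  private def bfs(nbrs: Map[Int, Set[Int]], src: Int): Map[Int, Int] = {
    var dist = Map(src -> 0)
    var frontier = Set(src)
    var d = 0
    while (frontier.nonEmpty) {
      d += 1
      frontier = frontier.flatMap(nbrs).filterNot(dist.contains)
      dist ++= frontier.map(_ -> d)
    }
    dist
  }

  def run(adj: Map[Int, Seq[Int]], landmarks: Seq[Int]): Map[Int, Map[Int, Int]] = {
    val vertices: Set[Int] = adj.keySet ++ adj.values.flatten
    // Symmetrize the adjacency list into undirected neighbor sets.
    val nbrs: Map[Int, Set[Int]] = vertices.map { v =>
      v -> (adj.getOrElse(v, Nil).toSet ++ adj.collect { case (u, out) if out.contains(v) => u })
    }.toMap
    val fromLandmark = landmarks.map(l => l -> bfs(nbrs, l)).toMap
    // Invert: for each vertex, collect its distance to each reachable landmark.
    vertices.map { v =>
      v -> landmarks.flatMap(l => fromLandmark(l).get(v).map(l -> _)).toMap
    }.toMap
  }
}
```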

6. object StronglyConnectedComponents

Strongly connected components algorithm implementation.

7. object TriangleCount

Compute the number of triangles passing through each vertex.

The algorithm is relatively straightforward and can be computed in three steps:

• Compute the set of neighbors for each vertex.
• For each edge compute the intersection of the sets and send the count to both vertices.
• Compute the sum at each vertex and divide by two since each triangle is counted twice.
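The three steps above can be sketched in plain Scala on a small, already-clean edge list (no self edges, no duplicates); GraphX performs each step as a distributed operation, but the logic is the same. `TriangleCountSketch` is a made-up name for illustration, not the GraphX object.

```scala
// Plain-Scala sketch of the three-step triangle count described above.
object TriangleCountSketch {
  def run(edges: Seq[(Int, Int)]): Map[Int, Int] = {
    // Step 1: neighbor set for each vertex (edges treated as undirected).
    val nbrs: Map[Int, Set[Int]] =
      edges.flatMap { case (a, b) => Seq(a -> b, b -> a) }
        .groupBy(_._1).map { case (v, es) => v -> es.map(_._2).toSet }
    // Step 2: per edge, intersect the endpoint neighbor sets and credit
    // the common-neighbor count to both endpoints.
    val credits = edges.flatMap { case (a, b) =>
      val common = (nbrs(a) & nbrs(b)).size
      Seq(a -> common, b -> common)
    }
    // Step 3: sum per vertex and halve, since each triangle is counted twice.
    credits.groupBy(_._1).map { case (v, cs) => v -> cs.map(_._2).sum / 2 }
  }
}
```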

There are two implementations. The default TriangleCount.run implementation first removes self cycles and canonicalizes the graph to ensure that the following conditions hold:

• There are no self edges
• All edges are oriented (src is greater than dst)
• There are no duplicate edges

However, the canonicalization procedure is costly as it requires repartitioning the graph. If the input data is already in "canonical form" with self cycles removed, then TriangleCount.runPreCanonicalized should be used instead:

```scala
val canonicalGraph = graph.mapEdges(e => 1).removeSelfEdges().canonicalizeEdges()
val counts = TriangleCount.runPreCanonicalized(canonicalGraph).vertices
```