package graphx
ALPHA COMPONENT GraphX is a graph processing framework built on top of Spark.
Source: package.scala
Type Members
-
case class
Edge[ED](srcId: VertexId = 0, dstId: VertexId = 0, attr: ED = null.asInstanceOf[ED]) extends Serializable with Product
A single directed edge consisting of a source id, target id, and the data associated with the edge.
- ED
type of the edge attribute
- srcId
The vertex id of the source vertex
- dstId
The vertex id of the target vertex
- attr
The attribute associated with the edge
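As a small illustration of the parameters above, the following sketch builds one edge and reads it back. The vertex ids and the "follows" label are made up for the example; only the GraphX artifact on the classpath is assumed, and no SparkContext is needed.

```scala
import org.apache.spark.graphx.{Edge, VertexId}

// Hypothetical vertex ids chosen for illustration.
val alice: VertexId = 1L
val bob: VertexId = 2L

// A directed edge from alice to bob carrying a String attribute (ED = String).
val follows: Edge[String] = Edge(alice, bob, "follows")

// Edge is a case class, so copy and pattern matching are available.
val reversed: Edge[String] = follows.copy(srcId = follows.dstId, dstId = follows.srcId)
val Edge(src, dst, label) = follows
```

Because the edge is directed, swapping srcId and dstId produces a different edge rather than an equal one.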
-
abstract
class
EdgeContext[VD, ED, A] extends AnyRef
Represents an edge along with its neighboring vertices and allows sending messages along the edge. Used in Graph#aggregateMessages.
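One way to see how an EdgeContext is used is to count in-degrees with Graph#aggregateMessages. This is a minimal sketch, assuming a local Spark master and a toy three-edge graph; the app name, the edge data, and the helper name sendOne are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeContext, Graph, TripletFields, VertexId}

// Assumption: a throwaway local context for the example.
val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("edge-context-sketch"))

// Hypothetical toy graph: 1 -> 2, 1 -> 3, 2 -> 3.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 3L, 1), Edge(2L, 3L, 1)))
val graph: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0)

// Each edge sends the message 1 to its destination vertex.
def sendOne(ctx: EdgeContext[Int, Int, Int]): Unit = ctx.sendToDst(1)

// Messages arriving at the same vertex are summed. TripletFields.None states
// that sendOne reads neither the vertex attributes nor the edge attribute.
val inDegrees: Map[VertexId, Int] =
  graph.aggregateMessages[Int](sendOne, _ + _, TripletFields.None).collect().toMap

sc.stop()
```

Vertices that receive no message (vertex 1 here) do not appear in the result.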
-
class
EdgeDirection extends Serializable
The direction of a directed edge relative to a vertex.
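The phrase "relative to a vertex" can be made concrete with Edge#relativeDirection. The sketch below uses a made-up edge and needs no SparkContext.

```scala
import org.apache.spark.graphx.{Edge, EdgeDirection}

// Hypothetical edge 1 -> 2.
val e = Edge(1L, 2L, "a")

// Seen from its source, the edge points out; seen from its destination, it points in.
val seenFromSrc: EdgeDirection = e.relativeDirection(1L)
val seenFromDst: EdgeDirection = e.relativeDirection(2L)

// reverse swaps In and Out.
val flipped: EdgeDirection = EdgeDirection.Out.reverse
```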
-
abstract
class
EdgeRDD[ED] extends RDD[Edge[ED]]
EdgeRDD[ED] extends RDD[Edge[ED]] by storing the edges in columnar format on each partition for performance. It may additionally store the vertex attributes associated with each edge to provide the triplet view. Shipping of the vertex attributes is managed by impl.ReplicatedVertexView.
-
class
EdgeTriplet[VD, ED] extends Edge[ED]
An edge triplet represents an edge along with the vertex attributes of its neighboring vertices.
- VD
the type of the vertex attribute.
- ED
the type of the edge attribute
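A triplet view can be sketched with Graph#triplets, which exposes srcAttr and dstAttr next to the edge attribute. The names and the single edge below are illustrative, and a local Spark master is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Assumption: a throwaway local context for the example.
val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("edge-triplet-sketch"))

// Hypothetical data: two named vertices and one labelled edge between them.
val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
val edges = sc.parallelize(Seq(Edge(1L, 2L, "follows")))
val graph: Graph[String, String] = Graph(vertices, edges)

// Each triplet carries the edge attribute plus both endpoint attributes.
val sentences: Seq[String] =
  graph.triplets.map(t => s"${t.srcAttr} ${t.attr} ${t.dstAttr}").collect().toSeq

sc.stop()
```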
-
abstract
class
Graph[VD, ED] extends Serializable
The Graph abstractly represents a graph with arbitrary objects associated with vertices and edges. The graph provides basic operations to access and manipulate the data associated with vertices and edges as well as the underlying structure. Like Spark RDDs, the graph is a functional data-structure in which mutating operations return new graphs.
- VD
the vertex attribute type
- ED
the edge attribute type
- Note
GraphOps contains additional convenience operations and graph algorithms.
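The functional behaviour described above can be sketched as follows: mapVertices returns a new graph and leaves the original untouched. The vertex names are placeholders and a local Spark master is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Assumption: a throwaway local context for the example.
val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("graph-sketch"))

// Hypothetical two-vertex graph.
val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1)))
val graph: Graph[String, Int] = Graph(vertices, edges)

// A "mutating" operation yields a new Graph; `graph` itself is not changed.
val upper: Graph[String, Int] = graph.mapVertices((id, name) => name.toUpperCase)

val before: Map[VertexId, String] = graph.vertices.collect().toMap
val after: Map[VertexId, String] = upper.vertices.collect().toMap

sc.stop()
```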
-
class
GraphOps[VD, ED] extends Serializable
Contains additional functionality for Graph. All operations are expressed in terms of the efficient GraphX API. This class is implicitly constructed for each Graph object.
- VD
the vertex attribute type
- ED
the edge attribute type
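Because GraphOps is constructed implicitly, its members can be called directly on a Graph. A minimal sketch, assuming a local Spark master and a toy edge list:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexId}

// Assumption: a throwaway local context for the example.
val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("graph-ops-sketch"))

// Hypothetical edge list: 1 -> 2, 1 -> 3, 2 -> 3; every vertex gets attribute 0.
val rawEdges = sc.parallelize(Seq((1L, 2L), (1L, 3L), (2L, 3L)))
val graph: Graph[Int, Int] = Graph.fromEdgeTuples(rawEdges, defaultValue = 0)

// numVertices and outDegrees are defined on GraphOps, not on Graph;
// the implicit conversion makes them available here.
val vertexCount: Long = graph.numVertices
val outDegrees: Map[VertexId, Int] = graph.outDegrees.collect().toMap

sc.stop()
```

Vertex 3 has no outgoing edges, so it is absent from outDegrees rather than mapped to 0.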
-
type
PartitionID = Int
Integer identifier of a graph partition. Must be less than 2^30.
-
trait
PartitionStrategy extends Serializable
Represents the way edges are assigned to edge partitions based on their source and destination vertex IDs.
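To illustrate how a strategy maps vertex IDs to a partition, the sketch below calls getPartition directly on two built-in strategies. The vertex ids and the partition count of 8 are arbitrary choices for the example; no SparkContext is needed.

```scala
import org.apache.spark.graphx.{PartitionID, PartitionStrategy}

// Hypothetical number of edge partitions.
val numParts = 8

// The partition chosen for the edge 1 -> 2 under a 2D strategy.
val pid2D: PartitionID = PartitionStrategy.EdgePartition2D.getPartition(1L, 2L, numParts)

// CanonicalRandomVertexCut hashes the ids in a canonical order, so both
// directions between the same pair of vertices land in the same partition.
val forward = PartitionStrategy.CanonicalRandomVertexCut.getPartition(1L, 2L, numParts)
val backward = PartitionStrategy.CanonicalRandomVertexCut.getPartition(2L, 1L, numParts)
```

On an actual graph a strategy is typically applied with graph.partitionBy(strategy).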
- class TripletFields extends Serializable
-
type
VertexId = Long
A 64-bit vertex identifier that uniquely identifies a vertex within a graph. It does not need to follow any ordering or any constraints other than uniqueness.
-
abstract
class
VertexRDD[VD] extends RDD[(VertexId, VD)]
Extends RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by pre-indexing the entries for fast, efficient joins. Two VertexRDDs with the same index can be joined efficiently. All operations except reindex preserve the index. To construct a VertexRDD, use the VertexRDD object.
Additionally, stores routing information to enable joining the vertex attributes with an EdgeRDD.
- VD
the vertex attribute associated with each vertex in the set.
Example:
Construct a VertexRDD from a plain RDD:
    // Construct an initial vertex set
    val someData: RDD[(VertexId, SomeType)] = loadData(someFile)
    val vset = VertexRDD(someData)
    // If there were redundant values in someData we would use a reduceFunc
    val vset2 = VertexRDD(someData, reduceFunc)
    // Finally we can use the VertexRDD to index another dataset
    val otherData: RDD[(VertexId, OtherType)] = loadData(otherFile)
    val vset3 = vset2.innerJoin(otherData) { (vid, a, b) => b }
    // Now we can construct very fast joins between the two sets.
    // leftJoin takes a combining function and passes the right side as an Option.
    val vset4: VertexRDD[(SomeType, Option[OtherType])] =
      vset.leftJoin(vset3) { (vid, a, bOpt) => (a, bOpt) }
Value Members
- object Edge extends Serializable
- object EdgeContext
-
object
EdgeDirection extends Serializable
A set of EdgeDirections.
- object EdgeRDD extends Serializable
-
object
Graph extends Serializable
The Graph object contains a collection of routines used to construct graphs from RDDs.
-
object
GraphLoader extends Logging
Provides utilities for loading Graphs from files.
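A minimal sketch of loading a graph with GraphLoader.edgeListFile, assuming a local Spark master and a small edge-list file written to a temporary path. The file contents are invented for the example: one "srcId dstId" pair per line, with lines starting with # treated as comments.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, GraphLoader}

// Assumption: a throwaway local context for the example.
val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("graph-loader-sketch"))

// Write a hypothetical edge list: 1 -> 2 and 2 -> 3.
val path = Files.createTempFile("edges", ".txt")
Files.write(path, "# src dst\n1 2\n2 3\n".getBytes("UTF-8"))

// edgeListFile returns a Graph[Int, Int]; vertices are created from the edge endpoints.
val graph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, path.toString)
val edgeCount: Long = graph.numEdges
val vertexCount: Long = graph.numVertices

sc.stop()
Files.deleteIfExists(path)
```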
- object GraphXUtils
-
object
PartitionStrategy extends Serializable
Collection of built-in PartitionStrategy implementations.
-
object
Pregel extends Logging
Implements a Pregel-like bulk-synchronous message-passing API.
Unlike the original Pregel API, the GraphX Pregel API factors the sendMessage computation over edges, enables the message sending computation to read both vertex attributes, and constrains messages to the graph structure. These changes allow for substantially more efficient distributed execution while also exposing greater flexibility for graph-based computation.
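The points above (messages computed per edge, with access to both endpoint attributes) can be sketched with single-source shortest paths. This is an illustrative example rather than part of the API: the edge weights, the source vertex, and the local Spark master are all assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, Pregel, VertexId}

// Assumption: a throwaway local context for the example.
val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("pregel-sssp-sketch"))

// Hypothetical weighted edges: 1 -> 2 (1.0), 2 -> 3 (2.0), 1 -> 3 (5.0).
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0)))
val sourceId: VertexId = 1L

// Distance 0 at the source, infinity everywhere else.
val initial: Graph[Double, Double] = Graph.fromEdges(edges, 0.0)
  .mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)

val sssp = Pregel(initial, Double.PositiveInfinity)(
  // Vertex program: keep the smaller of the current and the incoming distance.
  (id, dist, newDist) => math.min(dist, newDist),
  // Send along an edge only if it improves the destination; reads both endpoints.
  triplet =>
    if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
      Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
    else Iterator.empty,
  // Combine messages bound for the same vertex.
  (a, b) => math.min(a, b))

val distances: Map[VertexId, Double] = sssp.vertices.collect().toMap

sc.stop()
```

With no maxIterations given, the computation runs until no more messages are sent.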
Example:
We can use the Pregel abstraction to implement PageRank:
    // graph, resetProb and numIter are assumed to be in scope.
    val pagerankGraph: Graph[Double, Double] = graph
      // Associate the degree with each vertex
      .outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
      // Set the weight on the edges based on the degree
      .mapTriplets(e => 1.0 / e.srcAttr)
      // Set the vertex attributes to the initial pagerank values
      .mapVertices((id, attr) => 1.0)

    def vertexProgram(id: VertexId, attr: Double, msgSum: Double): Double =
      resetProb + (1.0 - resetProb) * msgSum
    // sendMsg takes only the triplet
    def sendMessage(edge: EdgeTriplet[Double, Double]): Iterator[(VertexId, Double)] =
      Iterator((edge.dstId, edge.srcAttr * edge.attr))
    def messageCombiner(a: Double, b: Double): Double = a + b
    val initialMessage = 0.0
    // Execute Pregel for a fixed number of iterations.
    Pregel(pagerankGraph, initialMessage, numIter)(
      vertexProgram, sendMessage, messageCombiner)
-
object
VertexRDD extends Serializable
The VertexRDD singleton is used to construct VertexRDDs.