Package org.apache.spark.graphx.impl
Class GraphImpl<VD,ED>
Object
org.apache.spark.graphx.Graph<VD,ED>
org.apache.spark.graphx.impl.GraphImpl<VD,ED>
- All Implemented Interfaces:
Serializable
An implementation of
Graph
to support computation on graphs.
Graphs are represented using two RDDs: vertices
, which contains vertex attributes and the
routing information for shipping vertex attributes to edge partitions, and
replicatedVertexView
, which contains edges and the vertex attributes mentioned by each edge.
- See Also:
-
Method Summary
Modifier and TypeMethodDescription<A> VertexRDD<A>
aggregateMessagesWithActiveSet
(scala.Function1<EdgeContext<VD, ED, A>, scala.runtime.BoxedUnit> sendMsg, scala.Function2<A, A, A> mergeMsg, TripletFields tripletFields, scala.Option<scala.Tuple2<VertexRDD<?>, EdgeDirection>> activeSetOpt, scala.reflect.ClassTag<A> evidence$10) static <VD,
ED> GraphImpl<VD, ED> apply
(VertexRDD<VD> vertices, EdgeRDD<ED> edges, scala.reflect.ClassTag<VD> evidence$19, scala.reflect.ClassTag<ED> evidence$20) Create a graph from a VertexRDD and an EdgeRDD with arbitrary replicated vertices.static <VD,
ED> GraphImpl<VD, ED> apply
(RDD<Edge<ED>> edges, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$13, scala.reflect.ClassTag<ED> evidence$14) Create a graph from edges, setting referenced vertices todefaultVertexAttr
.static <VD,
ED> GraphImpl<VD, ED> apply
(RDD<scala.Tuple2<Object, VD>> vertices, RDD<Edge<ED>> edges, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$17, scala.reflect.ClassTag<ED> evidence$18) Create a graph from vertices and edges, setting missing vertices todefaultVertexAttr
.cache()
Caches the vertices and edges associated with this graph at the previously-specified target storage levels, which default toMEMORY_ONLY
.void
Mark this Graph for checkpointing.edges()
An RDD containing the edges and their associated attributes.static <VD,
ED> GraphImpl<VD, ED> fromEdgePartitions
(RDD<scala.Tuple2<Object, org.apache.spark.graphx.impl.EdgePartition<ED, VD>>> edgePartitions, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$15, scala.reflect.ClassTag<ED> evidence$16) Create a graph from EdgePartitions, setting referenced vertices todefaultVertexAttr
.static <VD,
ED> GraphImpl<VD, ED> fromExistingRDDs
(VertexRDD<VD> vertices, EdgeRDD<ED> edges, scala.reflect.ClassTag<VD> evidence$21, scala.reflect.ClassTag<ED> evidence$22) Create a graph from a VertexRDD and an EdgeRDD with the same replicated vertex type as the vertices.scala.collection.immutable.Seq<String>
Gets the name of the files to which this Graph was checkpointed.groupEdges
(scala.Function2<ED, ED, ED> merge) Merges multiple edges between two vertices into a single edge.boolean
Return whether this Graph has been checkpointed or not.mapEdges
(scala.Function2<Object, scala.collection.Iterator<Edge<ED>>, scala.collection.Iterator<ED2>> f, scala.reflect.ClassTag<ED2> evidence$6) Transforms each edge attribute using the map function, passing it a whole partition at a time.mapTriplets
(scala.Function2<Object, scala.collection.Iterator<EdgeTriplet<VD, ED>>, scala.collection.Iterator<ED2>> f, TripletFields tripletFields, scala.reflect.ClassTag<ED2> evidence$7) Transforms each edge attribute a partition at a time using the map function, passing it the adjacent vertex attributes as well.mapVertices
(scala.Function2<Object, VD, VD2> f, scala.reflect.ClassTag<VD2> evidence$5, scala.$eq$colon$eq<VD, VD2> eq) Transforms each vertex attribute in the graph using the map function.mask
(Graph<VD2, ED2> other, scala.reflect.ClassTag<VD2> evidence$8, scala.reflect.ClassTag<ED2> evidence$9) Restricts the graph to only the vertices and edges that are also inother
, but keeps the attributes from this graph.outerJoinVertices
(RDD<scala.Tuple2<Object, U>> other, scala.Function3<Object, VD, scala.Option<U>, VD2> updateF, scala.reflect.ClassTag<U> evidence$11, scala.reflect.ClassTag<VD2> evidence$12, scala.$eq$colon$eq<VD, VD2> eq) Joins the vertices with entries in thetable
RDD and merges the results usingmapFunc
.partitionBy
(PartitionStrategy partitionStrategy) Repartitions the edges in the graph according topartitionStrategy
.partitionBy
(PartitionStrategy partitionStrategy, int numPartitions) Repartitions the edges in the graph according topartitionStrategy
.persist
(StorageLevel newLevel) Caches the vertices and edges associated with this graph at the specified storage level, ignoring any target storage levels previously set.reverse()
Reverses all edges in the graph.Restricts the graph to only the vertices and edges satisfying the predicates.RDD<EdgeTriplet<VD,
ED>> triplets()
An RDD containing the edge triplets, which are edges along with the vertex data associated with the adjacent vertices.unpersist
(boolean blocking) Uncaches both vertices and edges of this graph.unpersistVertices
(boolean blocking) Uncaches only the vertices of this graph, leaving the edges alone.vertices()
An RDD containing the vertices and their associated attributes.Methods inherited from class org.apache.spark.graphx.Graph
aggregateMessages, fromEdges, fromEdgeTuples, graphToGraphOps, mapEdges, mapTriplets, mapTriplets, ops
-
Method Details
-
apply
public static <VD,ED> GraphImpl<VD,ED> apply(RDD<Edge<ED>> edges, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$13, scala.reflect.ClassTag<ED> evidence$14) Create a graph from edges, setting referenced vertices todefaultVertexAttr
.- Parameters:
edges
- (undocumented)defaultVertexAttr
- (undocumented)edgeStorageLevel
- (undocumented)vertexStorageLevel
- (undocumented)evidence$13
- (undocumented)evidence$14
- (undocumented)- Returns:
- (undocumented)
-
fromEdgePartitions
public static <VD,ED> GraphImpl<VD,ED> fromEdgePartitions(RDD<scala.Tuple2<Object, org.apache.spark.graphx.impl.EdgePartition<ED, VD>>> edgePartitions, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$15, scala.reflect.ClassTag<ED> evidence$16) Create a graph from EdgePartitions, setting referenced vertices todefaultVertexAttr
.- Parameters:
edgePartitions
- (undocumented)defaultVertexAttr
- (undocumented)edgeStorageLevel
- (undocumented)vertexStorageLevel
- (undocumented)evidence$15
- (undocumented)evidence$16
- (undocumented)- Returns:
- (undocumented)
-
apply
public static <VD,ED> GraphImpl<VD,ED> apply(RDD<scala.Tuple2<Object, VD>> vertices, RDD<Edge<ED>> edges, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$17, scala.reflect.ClassTag<ED> evidence$18) Create a graph from vertices and edges, setting missing vertices todefaultVertexAttr
.- Parameters:
vertices
- (undocumented)edges
- (undocumented)defaultVertexAttr
- (undocumented)edgeStorageLevel
- (undocumented)vertexStorageLevel
- (undocumented)evidence$17
- (undocumented)evidence$18
- (undocumented)- Returns:
- (undocumented)
-
apply
public static <VD,ED> GraphImpl<VD,ED> apply(VertexRDD<VD> vertices, EdgeRDD<ED> edges, scala.reflect.ClassTag<VD> evidence$19, scala.reflect.ClassTag<ED> evidence$20) Create a graph from a VertexRDD and an EdgeRDD with arbitrary replicated vertices. The VertexRDD must already be set up for efficient joins with the EdgeRDD by callingVertexRDD.withEdges
or an appropriate VertexRDD constructor.- Parameters:
vertices
- (undocumented)edges
- (undocumented)evidence$19
- (undocumented)evidence$20
- (undocumented)- Returns:
- (undocumented)
-
fromExistingRDDs
public static <VD,ED> GraphImpl<VD,ED> fromExistingRDDs(VertexRDD<VD> vertices, EdgeRDD<ED> edges, scala.reflect.ClassTag<VD> evidence$21, scala.reflect.ClassTag<ED> evidence$22) Create a graph from a VertexRDD and an EdgeRDD with the same replicated vertex type as the vertices. The VertexRDD must already be set up for efficient joins with the EdgeRDD by callingVertexRDD.withEdges
or an appropriate VertexRDD constructor.- Parameters:
vertices
- (undocumented)edges
- (undocumented)evidence$21
- (undocumented)evidence$22
- (undocumented)- Returns:
- (undocumented)
-
vertices
Description copied from class:Graph
An RDD containing the vertices and their associated attributes. -
replicatedVertexView
-
edges
Description copied from class:Graph
An RDD containing the edges and their associated attributes. The entries in the RDD contain just the source id and target id along with the edge data. -
triplets
Description copied from class:Graph
An RDD containing the edge triplets, which are edges along with the vertex data associated with the adjacent vertices. The caller should useGraph.edges()
if the vertex data are not needed, i.e. if only the edge data and adjacent vertex ids are needed. -
persist
Description copied from class:Graph
Caches the vertices and edges associated with this graph at the specified storage level, ignoring any target storage levels previously set. -
cache
Description copied from class:Graph
Caches the vertices and edges associated with this graph at the previously-specified target storage levels, which default toMEMORY_ONLY
. This is used to pin a graph in memory enabling multiple queries to reuse the same construction process. -
checkpoint
public void checkpoint()Description copied from class:Graph
Mark this Graph for checkpointing. It will be saved to a file inside the checkpoint directory set with SparkContext.setCheckpointDir() and all references to its parent RDDs will be removed. It is strongly recommended that this Graph is persisted in memory, otherwise saving it on a file will require recomputation.- Specified by:
checkpoint
in classGraph<VD,
ED>
-
isCheckpointed
public boolean isCheckpointed()Description copied from class:Graph
Return whether this Graph has been checkpointed or not. This returns true iff both the vertices RDD and edges RDD have been checkpointed.- Specified by:
isCheckpointed
in classGraph<VD,
ED> - Returns:
- (undocumented)
-
getCheckpointFiles
Description copied from class:Graph
Gets the name of the files to which this Graph was checkpointed. (The vertices RDD and edges RDD are checkpointed separately.)- Specified by:
getCheckpointFiles
in classGraph<VD,
ED> - Returns:
- (undocumented)
-
unpersist
Description copied from class:Graph
Uncaches both vertices and edges of this graph. This is useful in iterative algorithms that build a new graph in each iteration. -
unpersistVertices
Description copied from class:Graph
Uncaches only the vertices of this graph, leaving the edges alone. This is useful in iterative algorithms that modify the vertex attributes but reuse the edges. This method can be used to uncache the vertex attributes of previous iterations once they are no longer needed, improving GC performance.- Specified by:
unpersistVertices
in classGraph<VD,
ED> - Parameters:
blocking
- Whether to block until all data is unpersisted (default: false)- Returns:
- (undocumented)
-
partitionBy
Description copied from class:Graph
Repartitions the edges in the graph according topartitionStrategy
.- Specified by:
partitionBy
in classGraph<VD,
ED> - Parameters:
partitionStrategy
- the partitioning strategy to use when partitioning the edges in the graph.- Returns:
- (undocumented)
-
partitionBy
Description copied from class:Graph
Repartitions the edges in the graph according topartitionStrategy
.- Specified by:
partitionBy
in classGraph<VD,
ED> - Parameters:
partitionStrategy
- the partitioning strategy to use when partitioning the edges in the graph.numPartitions
- the number of edge partitions in the new graph.- Returns:
- (undocumented)
-
reverse
Description copied from class:Graph
Reverses all edges in the graph. If this graph contains an edge from a to b then the returned graph contains an edge from b to a. -
mapVertices
public <VD2> Graph<VD2,ED> mapVertices(scala.Function2<Object, VD, VD2> f, scala.reflect.ClassTag<VD2> evidence$5, scala.$eq$colon$eq<VD, VD2> eq) Description copied from class:Graph
Transforms each vertex attribute in the graph using the map function.- Specified by:
mapVertices
in classGraph<VD,
ED> - Parameters:
f
- the function from a vertex object to a new vertex valueevidence$5
- (undocumented)eq
- (undocumented)- Returns:
- (undocumented)
-
mapEdges
public <ED2> Graph<VD,ED2> mapEdges(scala.Function2<Object, scala.collection.Iterator<Edge<ED>>, scala.collection.Iterator<ED2>> f, scala.reflect.ClassTag<ED2> evidence$6) Description copied from class:Graph
Transforms each edge attribute using the map function, passing it a whole partition at a time. The map function is given an iterator over edges within a logical partition as well as the partition's ID, and it should return a new iterator over the new values of each edge. The new iterator's elements must correspond one-to-one with the old iterator's elements. If adjacent vertex values are desired, usemapTriplets
. -
mapTriplets
public <ED2> Graph<VD,ED2> mapTriplets(scala.Function2<Object, scala.collection.Iterator<EdgeTriplet<VD, ED>>, scala.collection.Iterator<ED2>> f, TripletFields tripletFields, scala.reflect.ClassTag<ED2> evidence$7) Description copied from class:Graph
Transforms each edge attribute a partition at a time using the map function, passing it the adjacent vertex attributes as well. The map function is given an iterator over edge triplets within a logical partition and should yield a new iterator over the new values of each edge in the order in which they are provided. If adjacent vertex values are not required, consider usingmapEdges
instead.- Specified by:
mapTriplets
in classGraph<VD,
ED> - Parameters:
f
- the iterator transformtripletFields
- which fields should be included in the edge triplet passed to the map function. If not all fields are needed, specifying this can improve performance.evidence$7
- (undocumented)- Returns:
- (undocumented)
-
subgraph
public Graph<VD,ED> subgraph(scala.Function1<EdgeTriplet<VD, ED>, Object> epred, scala.Function2<Object, VD, Object> vpred) Description copied from class:Graph
Restricts the graph to only the vertices and edges satisfying the predicates. The resulting subgraph satisfiesV' = {v : for all v in V where vpred(v)} E' = {(u,v): for all (u,v) in E where epred((u,v)) && vpred(u) && vpred(v)}
- Specified by:
subgraph
in classGraph<VD,
ED> - Parameters:
epred
- the edge predicate, which takes a triplet and evaluates to true if the edge is to remain in the subgraph. Note that only edges where both vertices satisfy the vertex predicate are considered.vpred
- the vertex predicate, which takes a vertex object and evaluates to true if the vertex is to be included in the subgraph- Returns:
- the subgraph containing only the vertices and edges that satisfy the predicates
-
mask
public <VD2,ED2> Graph<VD,ED> mask(Graph<VD2, ED2> other, scala.reflect.ClassTag<VD2> evidence$8, scala.reflect.ClassTag<ED2> evidence$9) Description copied from class:Graph
Restricts the graph to only the vertices and edges that are also inother
, but keeps the attributes from this graph. -
groupEdges
Description copied from class:Graph
Merges multiple edges between two vertices into a single edge. For correct results, the graph must have been partitioned usingpartitionBy
.- Specified by:
groupEdges
in classGraph<VD,
ED> - Parameters:
merge
- the user-supplied commutative associative function to merge edge attributes for duplicate edges.- Returns:
- The resulting graph with a single edge for each (source, dest) vertex pair.
-
aggregateMessagesWithActiveSet
public <A> VertexRDD<A> aggregateMessagesWithActiveSet(scala.Function1<EdgeContext<VD, ED, A>, scala.runtime.BoxedUnit> sendMsg, scala.Function2<A, A, A> mergeMsg, TripletFields tripletFields, scala.Option<scala.Tuple2<VertexRDD<?>, EdgeDirection>> activeSetOpt, scala.reflect.ClassTag<A> evidence$10) -
outerJoinVertices
public <U,VD2> Graph<VD2,ED> outerJoinVertices(RDD<scala.Tuple2<Object, U>> other, scala.Function3<Object, VD, scala.Option<U>, VD2> updateF, scala.reflect.ClassTag<U> evidence$11, scala.reflect.ClassTag<VD2> evidence$12, scala.$eq$colon$eq<VD, VD2> eq) Description copied from class:Graph
Joins the vertices with entries in thetable
RDD and merges the results usingmapFunc
. The input table should contain at most one entry for each vertex. If no entry inother
is provided for a particular vertex in the graph, the map function receivesNone
.- Specified by:
outerJoinVertices
in classGraph<VD,
ED> - Parameters:
other
- the table to join with the vertices in the graph. The table should contain at most one entry for each vertex.updateF
- the function used to compute the new vertex values. The map function is invoked for all vertices, even those that do not have a corresponding entry in the table.evidence$11
- (undocumented)evidence$12
- (undocumented)eq
- (undocumented)- Returns:
- (undocumented)
-