org.apache.spark.graphx
Class Graph<VD,ED>

Object
  extended by org.apache.spark.graphx.Graph<VD,ED>
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
GraphImpl

public abstract class Graph<VD,ED>
extends Object
implements scala.Serializable

The Graph abstractly represents a graph with arbitrary objects associated with vertices and edges. The graph provides basic operations to access and manipulate the data associated with vertices and edges as well as the underlying structure. Like Spark RDDs, the graph is a functional data-structure in which mutating operations return new graphs.

See Also:
Serialized Form

Method Summary
<A> VertexRDD<A>
aggregateMessages(scala.Function1<EdgeContext<VD,ED,A>,scala.runtime.BoxedUnit> sendMsg, scala.Function2<A,A,A> mergeMsg, TripletFields tripletFields, scala.reflect.ClassTag<A> evidence$12)
          Aggregates values from the neighboring edges and vertices of each vertex.
static
<VD,ED> Graph<VD,ED>
apply(RDD<scala.Tuple2<Object,VD>> vertices, RDD<Edge<ED>> edges, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$19, scala.reflect.ClassTag<ED> evidence$20)
          Construct a graph from a collection of vertices and edges with attributes.
abstract  Graph<VD,ED> cache()
          Caches the vertices and edges associated with this graph at the previously-specified target storage levels, which default to MEMORY_ONLY.
abstract  void checkpoint()
          Mark this Graph for checkpointing.
abstract  EdgeRDD<ED> edges()
          An RDD containing the edges and their associated attributes.
static
<VD,ED> Graph<VD,ED>
fromEdges(RDD<Edge<ED>> edges, VD defaultValue, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$17, scala.reflect.ClassTag<ED> evidence$18)
          Construct a graph from a collection of edges.
static
<VD> Graph<VD,Object>
fromEdgeTuples(RDD<scala.Tuple2<Object,Object>> rawEdges, VD defaultValue, scala.Option<PartitionStrategy> uniqueEdges, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$16)
          Construct a graph from a collection of edges encoded as vertex id pairs.
abstract  scala.collection.Seq<String> getCheckpointFiles()
          Gets the names of the files to which this Graph was checkpointed.
static
<VD,ED> GraphOps<VD,ED>
graphToGraphOps(Graph<VD,ED> g, scala.reflect.ClassTag<VD> evidence$21, scala.reflect.ClassTag<ED> evidence$22)
          Implicitly extracts the GraphOps member from a graph.
abstract  Graph<VD,ED> groupEdges(scala.Function2<ED,ED,ED> merge)
          Merges multiple edges between two vertices into a single edge.
abstract  boolean isCheckpointed()
          Return whether this Graph has been checkpointed or not.
<ED2> Graph<VD,ED2>
mapEdges(scala.Function1<Edge<ED>,ED2> map, scala.reflect.ClassTag<ED2> evidence$4)
          Transforms each edge attribute in the graph using the map function.
abstract
<ED2> Graph<VD,ED2>
mapEdges(scala.Function2<Object,scala.collection.Iterator<Edge<ED>>,scala.collection.Iterator<ED2>> map, scala.reflect.ClassTag<ED2> evidence$5)
          Transforms each edge attribute using the map function, passing it a whole partition at a time.
abstract
<A> VertexRDD<A>
mapReduceTriplets(scala.Function1<EdgeTriplet<VD,ED>,scala.collection.Iterator<scala.Tuple2<Object,A>>> mapFunc, scala.Function2<A,A,A> reduceFunc, scala.Option<scala.Tuple2<VertexRDD<?>,EdgeDirection>> activeSetOpt, scala.reflect.ClassTag<A> evidence$11)
          Aggregates values from the neighboring edges and vertices of each vertex.
<ED2> Graph<VD,ED2>
mapTriplets(scala.Function1<EdgeTriplet<VD,ED>,ED2> map, scala.reflect.ClassTag<ED2> evidence$6)
          Transforms each edge attribute using the map function, passing it the adjacent vertex attributes as well.
<ED2> Graph<VD,ED2>
mapTriplets(scala.Function1<EdgeTriplet<VD,ED>,ED2> map, TripletFields tripletFields, scala.reflect.ClassTag<ED2> evidence$7)
          Transforms each edge attribute using the map function, passing it the adjacent vertex attributes as well.
abstract
<ED2> Graph<VD,ED2>
mapTriplets(scala.Function2<Object,scala.collection.Iterator<EdgeTriplet<VD,ED>>,scala.collection.Iterator<ED2>> map, TripletFields tripletFields, scala.reflect.ClassTag<ED2> evidence$8)
          Transforms each edge attribute a partition at a time using the map function, passing it the adjacent vertex attributes as well.
abstract
<VD2> Graph<VD2,ED>
mapVertices(scala.Function2<Object,VD,VD2> map, scala.reflect.ClassTag<VD2> evidence$3, scala.Predef.$eq$colon$eq<VD,VD2> eq)
          Transforms each vertex attribute in the graph using the map function.
abstract
<VD2,ED2> Graph<VD,ED>
mask(Graph<VD2,ED2> other, scala.reflect.ClassTag<VD2> evidence$9, scala.reflect.ClassTag<ED2> evidence$10)
          Restricts the graph to only the vertices and edges that are also in other, but keeps the attributes from this graph.
 GraphOps<VD,ED> ops()
          The associated GraphOps object.
abstract
<U,VD2> Graph<VD2,ED>
outerJoinVertices(RDD<scala.Tuple2<Object,U>> other, scala.Function3<Object,VD,scala.Option<U>,VD2> mapFunc, scala.reflect.ClassTag<U> evidence$14, scala.reflect.ClassTag<VD2> evidence$15, scala.Predef.$eq$colon$eq<VD,VD2> eq)
          Joins the vertices with entries in the table RDD and merges the results using mapFunc.
abstract  Graph<VD,ED> partitionBy(PartitionStrategy partitionStrategy)
          Repartitions the edges in the graph according to partitionStrategy.
abstract  Graph<VD,ED> partitionBy(PartitionStrategy partitionStrategy, int numPartitions)
          Repartitions the edges in the graph according to partitionStrategy.
abstract  Graph<VD,ED> persist(StorageLevel newLevel)
          Caches the vertices and edges associated with this graph at the specified storage level, ignoring any target storage levels previously set.
abstract  Graph<VD,ED> reverse()
          Reverses all edges in the graph.
abstract  Graph<VD,ED> subgraph(scala.Function1<EdgeTriplet<VD,ED>,Object> epred, scala.Function2<Object,VD,Object> vpred)
          Restricts the graph to only the vertices and edges satisfying the predicates.
abstract  RDD<EdgeTriplet<VD,ED>> triplets()
          An RDD containing the edge triplets, which are edges along with the vertex data associated with the adjacent vertices.
abstract  Graph<VD,ED> unpersist(boolean blocking)
          Uncaches both vertices and edges of this graph.
abstract  Graph<VD,ED> unpersistVertices(boolean blocking)
          Uncaches only the vertices of this graph, leaving the edges alone.
abstract  VertexRDD<VD> vertices()
          An RDD containing the vertices and their associated attributes.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

fromEdgeTuples

public static <VD> Graph<VD,Object> fromEdgeTuples(RDD<scala.Tuple2<Object,Object>> rawEdges,
                                                   VD defaultValue,
                                                   scala.Option<PartitionStrategy> uniqueEdges,
                                                   StorageLevel edgeStorageLevel,
                                                   StorageLevel vertexStorageLevel,
                                                   scala.reflect.ClassTag<VD> evidence$16)
Construct a graph from a collection of edges encoded as vertex id pairs.

Parameters:
rawEdges - a collection of edges in (src, dst) form
defaultValue - the vertex attributes with which to create vertices referenced by the edges
uniqueEdges - if a PartitionStrategy is provided, identical edges are combined into a single edge whose attribute is the sum of their attributes, that is, the number of duplicates. If None, duplicate edges are kept as separate edges.
edgeStorageLevel - the desired storage level at which to cache the edges if necessary
vertexStorageLevel - the desired storage level at which to cache the vertices if necessary

evidence$16 - (undocumented)
Returns:
a graph with edge attributes containing either the count of duplicate edges or 1 (if uniqueEdges is None) and vertex attributes set to defaultValue.

fromEdges

public static <VD,ED> Graph<VD,ED> fromEdges(RDD<Edge<ED>> edges,
                                             VD defaultValue,
                                             StorageLevel edgeStorageLevel,
                                             StorageLevel vertexStorageLevel,
                                             scala.reflect.ClassTag<VD> evidence$17,
                                             scala.reflect.ClassTag<ED> evidence$18)
Construct a graph from a collection of edges.

Parameters:
edges - the RDD containing the set of edges in the graph
defaultValue - the default vertex attribute to use for each vertex
edgeStorageLevel - the desired storage level at which to cache the edges if necessary
vertexStorageLevel - the desired storage level at which to cache the vertices if necessary

evidence$17 - (undocumented)
evidence$18 - (undocumented)
Returns:
a graph with edge attributes described by edges and vertices given by all vertices in edges with value defaultValue

apply

public static <VD,ED> Graph<VD,ED> apply(RDD<scala.Tuple2<Object,VD>> vertices,
                                         RDD<Edge<ED>> edges,
                                         VD defaultVertexAttr,
                                         StorageLevel edgeStorageLevel,
                                         StorageLevel vertexStorageLevel,
                                         scala.reflect.ClassTag<VD> evidence$19,
                                         scala.reflect.ClassTag<ED> evidence$20)
Construct a graph from a collection of vertices and edges with attributes. If a vertex appears more than once in the input, one of its entries is picked arbitrarily. Vertices found in the edge collection but not in the input vertices are assigned the default attribute.

Parameters:
vertices - the "set" of vertices and their attributes
edges - the collection of edges in the graph
defaultVertexAttr - the default vertex attribute to use for vertices that are mentioned in edges but not in vertices
edgeStorageLevel - the desired storage level at which to cache the edges if necessary
vertexStorageLevel - the desired storage level at which to cache the vertices if necessary
evidence$19 - (undocumented)
evidence$20 - (undocumented)
Returns:
(undocumented)

graphToGraphOps

public static <VD,ED> GraphOps<VD,ED> graphToGraphOps(Graph<VD,ED> g,
                                                      scala.reflect.ClassTag<VD> evidence$21,
                                                      scala.reflect.ClassTag<ED> evidence$22)
Implicitly extracts the GraphOps member from a graph.

To improve modularity the Graph type only contains a small set of basic operations. All the convenience operations are defined in the GraphOps class which may be shared across multiple graph implementations.

Parameters:
g - (undocumented)
evidence$21 - (undocumented)
evidence$22 - (undocumented)
Returns:
(undocumented)

vertices

public abstract VertexRDD<VD> vertices()
An RDD containing the vertices and their associated attributes.

Returns:
an RDD containing the vertices in this graph

edges

public abstract EdgeRDD<ED> edges()
An RDD containing the edges and their associated attributes. The entries in the RDD contain just the source id and target id along with the edge data.

Returns:
an RDD containing the edges in this graph

See Also:
Edge for the edge type, triplets() to get an RDD which contains all the edges along with their vertex data.


triplets

public abstract RDD<EdgeTriplet<VD,ED>> triplets()
An RDD containing the edge triplets, which are edges along with the vertex data associated with the adjacent vertices. The caller should use edges if the vertex data are not needed, i.e. if only the edge data and adjacent vertex ids are needed.

Returns:
an RDD containing edge triplets


persist

public abstract Graph<VD,ED> persist(StorageLevel newLevel)
Caches the vertices and edges associated with this graph at the specified storage level, ignoring any target storage levels previously set.

Parameters:
newLevel - the level at which to cache the graph.

Returns:
A reference to this graph for convenience.

cache

public abstract Graph<VD,ED> cache()
Caches the vertices and edges associated with this graph at the previously-specified target storage levels, which default to MEMORY_ONLY. Use this to pin a graph in memory so that multiple queries reuse the constructed graph instead of repeating the construction process.

Returns:
(undocumented)

checkpoint

public abstract void checkpoint()
Mark this Graph for checkpointing. It will be saved to a file inside the checkpoint directory set with SparkContext.setCheckpointDir() and all references to its parent RDDs will be removed. It is strongly recommended that this Graph be persisted in memory; otherwise, saving it to a file will require recomputation.


isCheckpointed

public abstract boolean isCheckpointed()
Return whether this Graph has been checkpointed or not. This returns true iff both the vertices RDD and edges RDD have been checkpointed.

Returns:
(undocumented)

getCheckpointFiles

public abstract scala.collection.Seq<String> getCheckpointFiles()
Gets the names of the files to which this Graph was checkpointed. (The vertices RDD and edges RDD are checkpointed separately.)

Returns:
(undocumented)

unpersist

public abstract Graph<VD,ED> unpersist(boolean blocking)
Uncaches both vertices and edges of this graph. This is useful in iterative algorithms that build a new graph in each iteration.

Parameters:
blocking - (undocumented)
Returns:
(undocumented)

unpersistVertices

public abstract Graph<VD,ED> unpersistVertices(boolean blocking)
Uncaches only the vertices of this graph, leaving the edges alone. This is useful in iterative algorithms that modify the vertex attributes but reuse the edges. This method can be used to uncache the vertex attributes of previous iterations once they are no longer needed, improving GC performance.

Parameters:
blocking - (undocumented)
Returns:
(undocumented)

partitionBy

public abstract Graph<VD,ED> partitionBy(PartitionStrategy partitionStrategy)
Repartitions the edges in the graph according to partitionStrategy.

Parameters:
partitionStrategy - the partitioning strategy to use when partitioning the edges in the graph.
Returns:
(undocumented)

partitionBy

public abstract Graph<VD,ED> partitionBy(PartitionStrategy partitionStrategy,
                                         int numPartitions)
Repartitions the edges in the graph according to partitionStrategy.

Parameters:
partitionStrategy - the partitioning strategy to use when partitioning the edges in the graph.
numPartitions - the number of edge partitions in the new graph.
Returns:
(undocumented)

mapVertices

public abstract <VD2> Graph<VD2,ED> mapVertices(scala.Function2<Object,VD,VD2> map,
                                                scala.reflect.ClassTag<VD2> evidence$3,
                                                scala.Predef.$eq$colon$eq<VD,VD2> eq)
Transforms each vertex attribute in the graph using the map function.

Parameters:
map - the function from a vertex ID and its attribute to a new vertex value

evidence$3 - (undocumented)
eq - (undocumented)
Returns:
(undocumented)

mapEdges

public <ED2> Graph<VD,ED2> mapEdges(scala.Function1<Edge<ED>,ED2> map,
                                    scala.reflect.ClassTag<ED2> evidence$4)
Transforms each edge attribute in the graph using the map function. The map function is not passed the vertex value for the vertices adjacent to the edge. If vertex values are desired, use mapTriplets.

Parameters:
map - the function from an edge object to a new edge value.

evidence$4 - (undocumented)
Returns:
(undocumented)

mapEdges

public abstract <ED2> Graph<VD,ED2> mapEdges(scala.Function2<Object,scala.collection.Iterator<Edge<ED>>,scala.collection.Iterator<ED2>> map,
                                             scala.reflect.ClassTag<ED2> evidence$5)
Transforms each edge attribute using the map function, passing it a whole partition at a time. The map function is given an iterator over edges within a logical partition as well as the partition's ID, and it should return a new iterator over the new values of each edge. The new iterator's elements must correspond one-to-one with the old iterator's elements. If adjacent vertex values are desired, use mapTriplets.

Parameters:
map - a function that takes a partition id and an iterator over all the edges in the partition, and must return an iterator over the new values for each edge in the order of the input iterator

evidence$5 - (undocumented)
Returns:
(undocumented)

mapTriplets

public <ED2> Graph<VD,ED2> mapTriplets(scala.Function1<EdgeTriplet<VD,ED>,ED2> map,
                                       scala.reflect.ClassTag<ED2> evidence$6)
Transforms each edge attribute using the map function, passing it the adjacent vertex attributes as well. If adjacent vertex values are not required, consider using mapEdges instead.

Parameters:
map - the function from an edge triplet to a new edge value.

evidence$6 - (undocumented)
Returns:
(undocumented)

mapTriplets

public <ED2> Graph<VD,ED2> mapTriplets(scala.Function1<EdgeTriplet<VD,ED>,ED2> map,
                                       TripletFields tripletFields,
                                       scala.reflect.ClassTag<ED2> evidence$7)
Transforms each edge attribute using the map function, passing it the adjacent vertex attributes as well. If adjacent vertex values are not required, consider using mapEdges instead.

Parameters:
map - the function from an edge triplet to a new edge value.
tripletFields - which fields should be included in the edge triplet passed to the map function. If not all fields are needed, specifying this can improve performance.

evidence$7 - (undocumented)
Returns:
(undocumented)

mapTriplets

public abstract <ED2> Graph<VD,ED2> mapTriplets(scala.Function2<Object,scala.collection.Iterator<EdgeTriplet<VD,ED>>,scala.collection.Iterator<ED2>> map,
                                                TripletFields tripletFields,
                                                scala.reflect.ClassTag<ED2> evidence$8)
Transforms each edge attribute a partition at a time using the map function, passing it the adjacent vertex attributes as well. The map function is given an iterator over edge triplets within a logical partition and should yield a new iterator over the new values of each edge in the order in which they are provided. If adjacent vertex values are not required, consider using mapEdges instead.

Parameters:
map - the iterator transform
tripletFields - which fields should be included in the edge triplet passed to the map function. If not all fields are needed, specifying this can improve performance.

evidence$8 - (undocumented)
Returns:
(undocumented)

reverse

public abstract Graph<VD,ED> reverse()
Reverses all edges in the graph. If this graph contains an edge from a to b then the returned graph contains an edge from b to a.

Returns:
(undocumented)

subgraph

public abstract Graph<VD,ED> subgraph(scala.Function1<EdgeTriplet<VD,ED>,Object> epred,
                                      scala.Function2<Object,VD,Object> vpred)
Restricts the graph to only the vertices and edges satisfying the predicates. The resulting subgraph satisfies


 V' = {v : v in V and vpred(v)}
 E' = {(u,v) : (u,v) in E and epred((u,v)) && vpred(u) && vpred(v)}
 

Parameters:
epred - the edge predicate, which takes a triplet and evaluates to true if the edge is to remain in the subgraph. Note that only edges where both vertices satisfy the vertex predicate are considered.

vpred - the vertex predicate, which takes a vertex ID and its attribute and evaluates to true if the vertex is to be included in the subgraph

Returns:
the subgraph containing only the vertices and edges that satisfy the predicates
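
The set definitions above can be modeled on ordinary collections. The following is a minimal plain-Java sketch of that rule, not GraphX code: SubgraphModel, Edge and Triplet are hypothetical names introduced for illustration, and it assumes string vertex attributes, integer edge attributes, and that every edge endpoint has an entry in the vertex map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Plain-Java model of the subgraph rule. This is NOT GraphX code:
// SubgraphModel, Edge and Triplet are hypothetical stand-ins.
final class SubgraphModel {
    static final class Edge {
        final long src;
        final long dst;
        final int attr;
        Edge(long src, long dst, int attr) { this.src = src; this.dst = dst; this.attr = attr; }
    }

    // An edge together with the attributes of its two endpoints.
    static final class Triplet {
        final Edge edge;
        final String srcAttr;
        final String dstAttr;
        Triplet(Edge edge, String srcAttr, String dstAttr) {
            this.edge = edge; this.srcAttr = srcAttr; this.dstAttr = dstAttr;
        }
    }

    // V' = {v : v in V and vpred(v)}
    static Map<Long, String> filterVertices(Map<Long, String> vertices,
                                            BiPredicate<Long, String> vpred) {
        Map<Long, String> kept = new LinkedHashMap<>();
        for (Map.Entry<Long, String> v : vertices.entrySet()) {
            if (vpred.test(v.getKey(), v.getValue())) {
                kept.put(v.getKey(), v.getValue());
            }
        }
        return kept;
    }

    // E' = {(u,v) : (u,v) in E and epred((u,v)) && vpred(u) && vpred(v)}
    // Assumes every edge endpoint has an entry in `vertices`.
    static List<Edge> filterEdges(Map<Long, String> vertices, List<Edge> edges,
                                  Predicate<Triplet> epred,
                                  BiPredicate<Long, String> vpred) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            String srcAttr = vertices.get(e.src);
            String dstAttr = vertices.get(e.dst);
            boolean endpointsKept = vpred.test(e.src, srcAttr) && vpred.test(e.dst, dstAttr);
            if (endpointsKept && epred.test(new Triplet(e, srcAttr, dstAttr))) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Long, String> vertices = new LinkedHashMap<>();
        vertices.put(1L, "a");
        vertices.put(2L, "b");
        vertices.put(3L, "c");
        List<Edge> edges = Arrays.asList(new Edge(1L, 2L, 5), new Edge(2L, 3L, 7), new Edge(1L, 3L, 1));
        BiPredicate<Long, String> vpred = (id, attr) -> id != 3L; // drop vertex 3
        Predicate<Triplet> epred = t -> t.edge.attr > 0;          // accepts every sample edge
        System.out.println(filterVertices(vertices, vpred));                   // {1=a, 2=b}
        System.out.println(filterEdges(vertices, edges, epred, vpred).size()); // 1
    }
}
```

In this sample the edge predicate accepts every edge, yet only the edge from 1 to 2 survives, because the other two edges touch vertex 3, which the vertex predicate rejects.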

mask

public abstract <VD2,ED2> Graph<VD,ED> mask(Graph<VD2,ED2> other,
                                            scala.reflect.ClassTag<VD2> evidence$9,
                                            scala.reflect.ClassTag<ED2> evidence$10)
Restricts the graph to only the vertices and edges that are also in other, but keeps the attributes from this graph.

Parameters:
other - the graph to project this graph onto
evidence$9 - (undocumented)
evidence$10 - (undocumented)
Returns:
a graph with vertices and edges that exist in both the current graph and other, with vertex and edge data from the current graph
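
The "keeps the attributes from this graph" part of the description can be illustrated on plain maps. This is a minimal plain-Java sketch, not GraphX code: MaskModel and retainShared are hypothetical names, and the model assumes vertices keyed by ID and at most one edge per key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Plain-Java model of mask. This is NOT GraphX code; MaskModel and
// retainShared are hypothetical names used only for illustration.
final class MaskModel {
    // Keeps the entries of `mine` whose key also occurs in `otherKeys`.
    // The values always come from `mine`, never from the other graph.
    static <K, V> Map<K, V> retainShared(Map<K, V> mine, Set<K> otherKeys) {
        Map<K, V> kept = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : mine.entrySet()) {
            if (otherKeys.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Long, String> thisVertices = new LinkedHashMap<>();
        thisVertices.put(1L, "alice");
        thisVertices.put(2L, "bob");
        thisVertices.put(3L, "carol");

        // The other graph carries different attributes, which are ignored.
        Map<Long, Integer> otherVertices = new LinkedHashMap<>();
        otherVertices.put(2L, 0);
        otherVertices.put(3L, 0);
        otherVertices.put(4L, 0);

        System.out.println(retainShared(thisVertices, otherVertices.keySet())); // {2=bob, 3=carol}
    }
}
```

The same helper can model the edge side by keying edges on their (source, destination) pair.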

groupEdges

public abstract Graph<VD,ED> groupEdges(scala.Function2<ED,ED,ED> merge)
Merges multiple edges between two vertices into a single edge. For correct results, the graph must have been partitioned using partitionBy.

Parameters:
merge - the user-supplied commutative associative function to merge edge attributes for duplicate edges.

Returns:
The resulting graph with a single edge for each (source, dest) vertex pair.
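
The merge step can be modeled as a fold over edges that share the same (source, destination) pair. The following is a minimal plain-Java sketch, not GraphX code: GroupEdgesModel and Edge are hypothetical names, edge attributes are assumed to be integers, and all edges are treated as living in a single partition, so the partitionBy requirement stated above does not show up in the model.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java model of groupEdges. This is NOT GraphX code; GroupEdgesModel
// and Edge are hypothetical names used only for illustration.
final class GroupEdgesModel {
    static final class Edge {
        final long src;
        final long dst;
        final int attr;
        Edge(long src, long dst, int attr) { this.src = src; this.dst = dst; this.attr = attr; }
    }

    // Folds all edges with the same (src, dst) pair into one attribute using `merge`.
    // Direction matters: (1, 2) and (2, 1) are different keys.
    static Map<List<Long>, Integer> groupEdges(List<Edge> edges, BinaryOperator<Integer> merge) {
        Map<List<Long>, Integer> merged = new LinkedHashMap<>();
        for (Edge e : edges) {
            merged.merge(Arrays.asList(e.src, e.dst), e.attr, merge);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
            new Edge(1L, 2L, 1), new Edge(1L, 2L, 1), new Edge(2L, 1L, 1), new Edge(1L, 2L, 1));
        // Summing attributes of 1 counts the duplicates of each edge.
        System.out.println(groupEdges(edges, Integer::sum)); // {[1, 2]=3, [2, 1]=1}
    }
}
```

Because the fold order is not fixed, the merge function should be commutative and associative, as the parameter description requires.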

mapReduceTriplets

public abstract <A> VertexRDD<A> mapReduceTriplets(scala.Function1<EdgeTriplet<VD,ED>,scala.collection.Iterator<scala.Tuple2<Object,A>>> mapFunc,
                                                   scala.Function2<A,A,A> reduceFunc,
                                                   scala.Option<scala.Tuple2<VertexRDD<?>,EdgeDirection>> activeSetOpt,
                                                   scala.reflect.ClassTag<A> evidence$11)
Aggregates values from the neighboring edges and vertices of each vertex. The user-supplied mapFunc function is invoked on each edge of the graph, generating 0 or more "messages" to be "sent" to either vertex in the edge. The reduceFunc is then used to combine the output of the map phase destined for each vertex.

This function is deprecated in 1.2.0 because of SPARK-3936. Use aggregateMessages instead.

Parameters:
mapFunc - the user-defined map function which returns 0 or more messages to neighboring vertices

reduceFunc - the user-defined reduce function which should be commutative and associative and is used to combine the output of the map phase

activeSetOpt - an efficient way to run the aggregation on a subset of the edges if desired. This is done by specifying a set of "active" vertices and an edge direction. mapFunc will then run only on edges connected to active vertices in the specified direction. If the direction is In, mapFunc will only be run on edges with destination in the active set. If the direction is Out, mapFunc will only be run on edges originating from vertices in the active set. If the direction is Either, mapFunc will be run on edges with *either* vertex in the active set. If the direction is Both, mapFunc will be run on edges with *both* vertices in the active set. The active set must have the same index as the graph's vertices.

evidence$11 - (undocumented)
Returns:
(undocumented)

aggregateMessages

public <A> VertexRDD<A> aggregateMessages(scala.Function1<EdgeContext<VD,ED,A>,scala.runtime.BoxedUnit> sendMsg,
                                          scala.Function2<A,A,A> mergeMsg,
                                          TripletFields tripletFields,
                                          scala.reflect.ClassTag<A> evidence$12)
Aggregates values from the neighboring edges and vertices of each vertex. The user-supplied sendMsg function is invoked on each edge of the graph, generating 0 or more messages to be sent to either vertex in the edge. The mergeMsg function is then used to combine all messages destined to the same vertex.

Parameters:
sendMsg - runs on each edge, sending messages to neighboring vertices using the EdgeContext.
mergeMsg - used to combine messages from sendMsg destined to the same vertex. This combiner should be commutative and associative.
tripletFields - which fields should be included in the EdgeContext passed to the sendMsg function. If not all fields are needed, specifying this can improve performance.

evidence$12 - (undocumented)
Returns:
(undocumented)
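
The send-then-merge flow described above can be modeled without Spark. The following is a minimal plain-Java sketch, not GraphX code: AggregateMessagesModel, Edge and Context are hypothetical names (Context loosely stands in for EdgeContext), messages are assumed to be integers, and tripletFields is left out of the model.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Consumer;

// Plain-Java model of aggregateMessages. This is NOT GraphX code;
// AggregateMessagesModel, Edge and Context are hypothetical stand-ins.
final class AggregateMessagesModel {
    static final class Edge {
        final long src;
        final long dst;
        final int attr;
        Edge(long src, long dst, int attr) { this.src = src; this.dst = dst; this.attr = attr; }
    }

    // Handed to sendMsg for each edge; merges every sent message into the target's inbox.
    static final class Context {
        final Edge edge;
        private final Map<Long, Integer> inbox;
        private final BinaryOperator<Integer> mergeMsg;
        Context(Edge edge, Map<Long, Integer> inbox, BinaryOperator<Integer> mergeMsg) {
            this.edge = edge; this.inbox = inbox; this.mergeMsg = mergeMsg;
        }
        void sendToSrc(int msg) { inbox.merge(edge.src, msg, mergeMsg); }
        void sendToDst(int msg) { inbox.merge(edge.dst, msg, mergeMsg); }
    }

    // Runs sendMsg once per edge and returns the merged message per vertex ID.
    // In this model, vertices that receive no message have no entry in the result.
    static Map<Long, Integer> aggregateMessages(List<Edge> edges,
                                                Consumer<Context> sendMsg,
                                                BinaryOperator<Integer> mergeMsg) {
        Map<Long, Integer> inbox = new TreeMap<>();
        for (Edge e : edges) {
            sendMsg.accept(new Context(e, inbox, mergeMsg));
        }
        return inbox;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(new Edge(1L, 2L, 0), new Edge(1L, 3L, 0), new Edge(2L, 3L, 0));
        // In-degree: each edge sends 1 to its destination, and messages are summed.
        System.out.println(aggregateMessages(edges, ctx -> ctx.sendToDst(1), Integer::sum)); // {2=1, 3=2}
    }
}
```

The model merges eagerly as messages arrive, which is only equivalent to merging in any other order when mergeMsg is commutative and associative.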

outerJoinVertices

public abstract <U,VD2> Graph<VD2,ED> outerJoinVertices(RDD<scala.Tuple2<Object,U>> other,
                                                        scala.Function3<Object,VD,scala.Option<U>,VD2> mapFunc,
                                                        scala.reflect.ClassTag<U> evidence$14,
                                                        scala.reflect.ClassTag<VD2> evidence$15,
                                                        scala.Predef.$eq$colon$eq<VD,VD2> eq)
Joins the vertices with entries in the table RDD and merges the results using mapFunc. The input table should contain at most one entry for each vertex. If no entry in other is provided for a particular vertex in the graph, the map function receives None.

Parameters:
other - the table to join with the vertices in the graph. The table should contain at most one entry for each vertex.
mapFunc - the function used to compute the new vertex values. The map function is invoked for all vertices, even those that do not have a corresponding entry in the table.

evidence$14 - (undocumented)
evidence$15 - (undocumented)
eq - (undocumented)
Returns:
(undocumented)
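
The join behavior described above, where mapFunc runs for every vertex and sees None when the table has no entry, can be modeled with plain maps. This is a minimal plain-Java sketch, not GraphX code: OuterJoinModel and JoinFunc are hypothetical names, Optional stands in for scala.Option, and using a Map for the table builds in the "at most one entry for each vertex" assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Plain-Java model of outerJoinVertices. This is NOT GraphX code;
// OuterJoinModel and JoinFunc are hypothetical names used only for illustration.
final class OuterJoinModel {
    @FunctionalInterface
    interface JoinFunc<VD, U, VD2> {
        VD2 apply(long id, VD attr, Optional<U> entry);
    }

    // Calls mapFunc for every vertex; `entry` is empty when the table has no row for it.
    // Table rows whose ID matches no vertex are ignored.
    static <VD, U, VD2> Map<Long, VD2> outerJoinVertices(Map<Long, VD> vertices,
                                                         Map<Long, U> table,
                                                         JoinFunc<VD, U, VD2> mapFunc) {
        Map<Long, VD2> joined = new LinkedHashMap<>();
        for (Map.Entry<Long, VD> v : vertices.entrySet()) {
            Optional<U> entry = Optional.ofNullable(table.get(v.getKey()));
            joined.put(v.getKey(), mapFunc.apply(v.getKey(), v.getValue(), entry));
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<Long, String> vertices = new LinkedHashMap<>();
        vertices.put(1L, "a");
        vertices.put(2L, "b");
        Map<Long, Integer> table = new LinkedHashMap<>();
        table.put(1L, 10);
        table.put(3L, 30); // no vertex 3, so this row is ignored

        Map<Long, String> joined =
            outerJoinVertices(vertices, table, (id, attr, entry) -> attr + entry.orElse(0));
        System.out.println(joined); // {1=a10, 2=b0}
    }
}
```

Vertex 2 has no table entry, so its new value is built from the fallback chosen inside the map function rather than being dropped from the result.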

ops

public GraphOps<VD,ED> ops()
The associated GraphOps object.

Returns:
(undocumented)