Interface GraphOperations
- All Known Implementing Classes:
DataflowGraph
public interface GraphOperations
-
Method Summary
Modifier and TypeMethodDescriptionscala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
dfsInternal
(org.apache.spark.sql.catalyst.TableIdentifier startDestination, boolean downstream, boolean stopAtMaterializationPoints) Performs a DFS starting fromstartNode
and returns the set of nodes (datasets) reached.scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
downstreamFlows
(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier) Returns the set of flows reachable from `flowIdentifier` via output (child) edges.scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
FlowNode> A map from flow identifier to `FlowNode`, which contains the input/output nodes.scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
reachabilitySet
(org.apache.spark.sql.catalyst.TableIdentifier destinationIdentifier, boolean downstream) Returns all datasets that can be reached fromdestinationIdentifier
.scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> reachabilitySet
(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers, boolean downstream) An implementation of DFS that takes in a sequence of start nodes and returns the "reachability set" of nodes from the start nodes.scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
upstreamDatasets
(org.apache.spark.sql.catalyst.TableIdentifier datasetIdentifier) Returns the set of datasets reachable from `datasetIdentifier` via input (parent) edges.scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> upstreamDatasets
(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers) Traverses the graph upstream starting from the specifieddatasetIdentifiers
to return the reachable nodes.scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
upstreamFlows
(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier) Returns the set of flows reachable from `flowIdentifier` via input (parent) edges.
-
Method Details
-
dfsInternal
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> dfsInternal(org.apache.spark.sql.catalyst.TableIdentifier startDestination, boolean downstream, boolean stopAtMaterializationPoints) Performs a DFS starting fromstartNode
and returns the set of nodes (datasets) reached.- Parameters:
startDestination
- The identifier of the node to start from.downstream
- if true, traverse output edges (search downstream) if false, traverse input edges (search upstream).stopAtMaterializationPoints
- If true, stop when we reach a materialization point (table). If false, keep going until the end.- Returns:
- (undocumented)
-
downstreamFlows
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> downstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier) Returns the set of flows reachable from `flowIdentifier` via output (child) edges. -
flowNodes
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,FlowNode> flowNodes()A map from flow identifier to `FlowNode`, which contains the input/output nodes. -
reachabilitySet
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> reachabilitySet(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers, boolean downstream) An implementation of DFS that takes in a sequence of start nodes and returns the "reachability set" of nodes from the start nodes.- Parameters:
downstream
- Walks the graph via the input edges if true, otherwise via the output edges.datasetIdentifiers
- (undocumented)- Returns:
- A map from visited nodes to its origin[s] in
datasetIdentifiers
, e.g. Let graph = a -> b c -> d (partitioned graph)reachabilitySet(Seq("a", "c"), downstream = true) -> ["a" -> ["a"], "b" -> ["a"], "c" -> ["c"], "d" -> ["c"}
-
reachabilitySet
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> reachabilitySet(org.apache.spark.sql.catalyst.TableIdentifier destinationIdentifier, boolean downstream) Returns all datasets that can be reached fromdestinationIdentifier
.- Parameters:
destinationIdentifier
- (undocumented)downstream
- (undocumented)- Returns:
- (undocumented)
-
upstreamDatasets
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> upstreamDatasets(org.apache.spark.sql.catalyst.TableIdentifier datasetIdentifier) Returns the set of datasets reachable from `datasetIdentifier` via input (parent) edges. -
upstreamDatasets
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> upstreamDatasets(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers) Traverses the graph upstream starting from the specifieddatasetIdentifiers
to return the reachable nodes. The return map's keyset consists of all datasets reachable fromdatasetIdentifiers
. For each entry in the response map, the value of that element refers to which ofdatasetIdentifiers
was able to reach the key. If multiple ofdatasetIdentifiers
could reach that key, one is picked arbitrarily.- Parameters:
datasetIdentifiers
- (undocumented)- Returns:
- (undocumented)
-
upstreamFlows
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> upstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier) Returns the set of flows reachable from `flowIdentifier` via input (parent) edges.
-