All Known Implementing Classes:: DataflowGraph

public interface GraphOperations

Method Summary

Modifier and Type

Method

Description

scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>

dfsInternal(org.apache.spark.sql.catalyst.TableIdentifier startDestination, boolean downstream, boolean stopAtMaterializationPoints)

Performs a DFS starting from startNode and returns the set of nodes (datasets) reached.

scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>

downstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)

Returns the set of flows reachable from `flowIdentifier` via output (child) edges.

scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,FlowNode>

flowNodes()

A map from flow identifier to `FlowNode`, which contains the input/output nodes.

scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>

reachabilitySet(org.apache.spark.sql.catalyst.TableIdentifier destinationIdentifier, boolean downstream)

Returns all datasets that can be reached from destinationIdentifier.

scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>>

reachabilitySet(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers, boolean downstream)

An implementation of DFS that takes in a sequence of start nodes and returns the "reachability set" of nodes from the start nodes.

scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>

upstreamDatasets(org.apache.spark.sql.catalyst.TableIdentifier datasetIdentifier)

Returns the set of datasets reachable from `datasetIdentifier` via input (parent) edges.

scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>>

upstreamDatasets(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers)

Traverses the graph upstream starting from the specified datasetIdentifiers to return the reachable nodes.

scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>

upstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)

Returns the set of flows reachable from `flowIdentifier` via input (parent) edges.

Method Details
- dfsInternal
  
  scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> dfsInternal(org.apache.spark.sql.catalyst.TableIdentifier startDestination, boolean downstream, boolean stopAtMaterializationPoints)
  
  Performs a DFS starting from startNode and returns the set of nodes (datasets) reached.
  
  Parameters:
  
  startDestination - The identifier of the node to start from.
  
  downstream - if true, traverse output edges (search downstream) if false, traverse input edges (search upstream).
  
  stopAtMaterializationPoints - If true, stop when we reach a materialization point (table). If false, keep going until the end.
  
  Returns:
  
  (undocumented)
- downstreamFlows
  
  scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> downstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)
  
  Returns the set of flows reachable from `flowIdentifier` via output (child) edges.
- flowNodes
  
  scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,FlowNode> flowNodes()
  
  A map from flow identifier to `FlowNode`, which contains the input/output nodes.
- reachabilitySet
  
  scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> reachabilitySet(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers, boolean downstream)
  
  An implementation of DFS that takes in a sequence of start nodes and returns the "reachability set" of nodes from the start nodes.
  
  Parameters:
  
  downstream - Walks the graph via the input edges if true, otherwise via the output edges.
  
  datasetIdentifiers - (undocumented)
  
  Returns:
  
  A map from visited nodes to its origin[s] in datasetIdentifiers, e.g. Let graph = a -> b c -> d (partitioned graph)
  reachabilitySet(Seq("a", "c"), downstream = true) -> ["a" -> ["a"], "b" -> ["a"], "c" -> ["c"], "d" -> ["c"}
- reachabilitySet
  
  scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> reachabilitySet(org.apache.spark.sql.catalyst.TableIdentifier destinationIdentifier, boolean downstream)
  
  Returns all datasets that can be reached from destinationIdentifier.
  
  Parameters:
  
  destinationIdentifier - (undocumented)
  
  downstream - (undocumented)
  
  Returns:
  
  (undocumented)
- upstreamDatasets
  
  scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> upstreamDatasets(org.apache.spark.sql.catalyst.TableIdentifier datasetIdentifier)
  
  Returns the set of datasets reachable from `datasetIdentifier` via input (parent) edges.
- upstreamDatasets
  
  scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> upstreamDatasets(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers)
  
  Traverses the graph upstream starting from the specified datasetIdentifiers to return the reachable nodes. The return map's keyset consists of all datasets reachable from datasetIdentifiers. For each entry in the response map, the value of that element refers to which of datasetIdentifiers was able to reach the key. If multiple of datasetIdentifiers could reach that key, one is picked arbitrarily.
  
  Parameters:
  
  datasetIdentifiers - (undocumented)
  
  Returns:
  
  (undocumented)
- upstreamFlows
  
  scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> upstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)
  
  Returns the set of flows reachable from `flowIdentifier` via input (parent) edges.

Interface GraphOperations

Method Summary

Method Details

dfsInternal

downstreamFlows

flowNodes

reachabilitySet

reachabilitySet

upstreamDatasets

upstreamDatasets

upstreamFlows