Interface GraphOperations

All Known Implementing Classes:
DataflowGraph

public interface GraphOperations
  • Method Summary

    Modifier and Type
    Method
    Description
    scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
    dfsInternal(org.apache.spark.sql.catalyst.TableIdentifier startDestination, boolean downstream, boolean stopAtMaterializationPoints)
    Performs a DFS starting from startNode and returns the set of nodes (datasets) reached.
    scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
    downstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)
    Returns the set of flows reachable from `flowIdentifier` via output (child) edges.
    scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,FlowNode>
    A map from flow identifier to `FlowNode`, which contains the input/output nodes.
    scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
    reachabilitySet(org.apache.spark.sql.catalyst.TableIdentifier destinationIdentifier, boolean downstream)
    Returns all datasets that can be reached from destinationIdentifier.
    scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>>
    reachabilitySet(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers, boolean downstream)
    An implementation of DFS that takes in a sequence of start nodes and returns the "reachability set" of nodes from the start nodes.
    scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
    upstreamDatasets(org.apache.spark.sql.catalyst.TableIdentifier datasetIdentifier)
    Returns the set of datasets reachable from `datasetIdentifier` via input (parent) edges.
    scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>>
    upstreamDatasets(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers)
    Traverses the graph upstream starting from the specified datasetIdentifiers to return the reachable nodes.
    scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
    upstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)
    Returns the set of flows reachable from `flowIdentifier` via input (parent) edges.
  • Method Details

    • dfsInternal

      scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> dfsInternal(org.apache.spark.sql.catalyst.TableIdentifier startDestination, boolean downstream, boolean stopAtMaterializationPoints)
      Performs a DFS starting from startNode and returns the set of nodes (datasets) reached.
      Parameters:
      startDestination - The identifier of the node to start from.
      downstream - if true, traverse output edges (search downstream) if false, traverse input edges (search upstream).
      stopAtMaterializationPoints - If true, stop when we reach a materialization point (table). If false, keep going until the end.
      Returns:
      (undocumented)
    • downstreamFlows

      scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> downstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)
      Returns the set of flows reachable from `flowIdentifier` via output (child) edges.
    • flowNodes

      scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,FlowNode> flowNodes()
      A map from flow identifier to `FlowNode`, which contains the input/output nodes.
    • reachabilitySet

      scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> reachabilitySet(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers, boolean downstream)
      An implementation of DFS that takes in a sequence of start nodes and returns the "reachability set" of nodes from the start nodes.

      Parameters:
      downstream - Walks the graph via the input edges if true, otherwise via the output edges.
      datasetIdentifiers - (undocumented)
      Returns:
      A map from visited nodes to its origin[s] in datasetIdentifiers, e.g. Let graph = a -> b c -> d (partitioned graph)

      reachabilitySet(Seq("a", "c"), downstream = true) -> ["a" -> ["a"], "b" -> ["a"], "c" -> ["c"], "d" -> ["c"}

    • reachabilitySet

      scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> reachabilitySet(org.apache.spark.sql.catalyst.TableIdentifier destinationIdentifier, boolean downstream)
      Returns all datasets that can be reached from destinationIdentifier.
      Parameters:
      destinationIdentifier - (undocumented)
      downstream - (undocumented)
      Returns:
      (undocumented)
    • upstreamDatasets

      scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> upstreamDatasets(org.apache.spark.sql.catalyst.TableIdentifier datasetIdentifier)
      Returns the set of datasets reachable from `datasetIdentifier` via input (parent) edges.
    • upstreamDatasets

      scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>> upstreamDatasets(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.TableIdentifier> datasetIdentifiers)
      Traverses the graph upstream starting from the specified datasetIdentifiers to return the reachable nodes. The return map's keyset consists of all datasets reachable from datasetIdentifiers. For each entry in the response map, the value of that element refers to which of datasetIdentifiers was able to reach the key. If multiple of datasetIdentifiers could reach that key, one is picked arbitrarily.
      Parameters:
      datasetIdentifiers - (undocumented)
      Returns:
      (undocumented)
    • upstreamFlows

      scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> upstreamFlows(org.apache.spark.sql.catalyst.TableIdentifier flowIdentifier)
      Returns the set of flows reachable from `flowIdentifier` via input (parent) edges.