Class DataflowGraph

Object
org.apache.spark.sql.pipelines.graph.DataflowGraph
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, GraphOperations, GraphValidations, scala.Equals, scala.Product

public class DataflowGraph extends Object implements GraphOperations, GraphValidations, scala.Product, Serializable
DataflowGraph represents the core graph structure for Spark declarative pipelines. It manages the relationships between logical flows, tables, and views, providing operations for graph traversal, validation, and transformation.
See Also:
  • Constructor Details

    • DataflowGraph

      public DataflowGraph(scala.collection.immutable.Seq<Flow> flows, scala.collection.immutable.Seq<Table> tables, scala.collection.immutable.Seq<View> views)
  • Method Details

    • flowNodes

      public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,FlowNode> flowNodes()
      Description copied from interface: GraphOperations
      A map from flow identifier to `FlowNode`, which contains the input/output nodes.
      Specified by:
      flowNodes in interface GraphOperations
    • flows

      public scala.collection.immutable.Seq<Flow> flows()
    • tables

      public scala.collection.immutable.Seq<Table> tables()
    • views

      public scala.collection.immutable.Seq<View> views()
    • output

      public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,Output> output()
    • materializedFlows

      public scala.collection.immutable.Seq<ResolvedFlow> materializedFlows()
    • materializedFlowIdentifiers

      public scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> materializedFlowIdentifiers()
      The identifiers of materializedFlows().
    • table

      public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,Table> table()
    • flow

      public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,Flow> flow()
    • view

      public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,View> view()
    • persistedViews

      public scala.collection.immutable.Seq<PersistedView> persistedViews()
    • inputIdentifiers

      public scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> inputIdentifiers()
    • flowsTo

      public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Seq<Flow>> flowsTo()
    • resolvedFlows

      public scala.collection.immutable.Seq<ResolvedFlow> resolvedFlows()
    • resolvedFlow

      public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,ResolvedFlow> resolvedFlow()
    • resolutionFailedFlows

      public scala.collection.immutable.Seq<ResolutionFailedFlow> resolutionFailedFlows()
    • resolutionFailedFlow

      public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,ResolutionFailedFlow> resolutionFailedFlow()
    • inferredSchema

      public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,StructType> inferredSchema()
    • validate

      public DataflowGraph validate()
      Ensure that the DataflowGraph is valid and throws errors if not.
    • resolved

      public boolean resolved()
      Returns true iff all Flows are successfully analyzed.
    • resolve

      public DataflowGraph resolve()