Class DataflowGraphTransformer

Object
org.apache.spark.sql.pipelines.graph.DataflowGraphTransformer
All Implemented Interfaces:
AutoCloseable

public class DataflowGraphTransformer extends Object implements AutoCloseable
Resolves the DataflowGraph by processing each node in the graph. This class exposes visitor functionality to resolve and analyze graph nodes. Only simple visitor operations are exposed for transforming the different entities of the graph; for advanced transformations, there is also a mechanism to walk the graph entity by entity.

Assumptions: 1. Each output has at least one flow to it. 2. Each flow may or may not have a destination table. If a flow does not have a destination table, the destination is a temporary view.

The graph is structured so that flows, tables, and sinks are all graph elements (nodes). In addition to the transformation functions for each of these entities, the class exposes a way to walk over the graph.

The constructor is private, as all usages should go through DataflowGraphTransformer.withDataflowGraphTransformer.

Parameter graph: any DataflowGraph.

  • Constructor Details

    • DataflowGraphTransformer

      public DataflowGraphTransformer(DataflowGraph graph)
  • Method Details

    • withDataflowGraphTransformer

      public static <T> T withDataflowGraphTransformer(DataflowGraph graph, scala.Function1<DataflowGraphTransformer,T> f)
      AutoCloseable wrapper around DataflowGraphTransformer that ensures the transformer is closed without clients needing to remember to close it. It takes the same arguments as the DataflowGraphTransformer constructor and exposes the DataflowGraphTransformer instance within the scope of the supplied function.
      Parameters:
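      graph - the DataflowGraph passed to the DataflowGraphTransformer constructor
      f - function that receives the DataflowGraphTransformer instance
      Returns:
      the value returned by f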
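
      The wrapper described above follows the general "loan" pattern: create a resource, lend it to a caller-supplied function, and close it afterwards. The sketch below illustrates that pattern in plain Java and does not use the Spark classes. SketchTransformer and withSketchTransformer are hypothetical stand-ins invented for illustration, and the real implementation may differ.

```java
import java.util.function.Function;

public class LoanPatternSketch {
    /** Hypothetical stand-in for a closeable transformer; records whether close() ran. */
    static final class SketchTransformer implements AutoCloseable {
        final String graphName;
        boolean closed = false;

        SketchTransformer(String graphName) {
            this.graphName = graphName;
        }

        String describe() {
            return "transformer for " + graphName;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Creates the resource, lends it to f, and closes it even if f throws. */
    static <T> T withSketchTransformer(String graphName, Function<SketchTransformer, T> f) {
        try (SketchTransformer transformer = new SketchTransformer(graphName)) {
            return f.apply(transformer);
        }
    }

    public static void main(String[] args) {
        // Keep a reference so we can check the state after the scope ends.
        SketchTransformer[] seen = new SketchTransformer[1];
        String result = withSketchTransformer("exampleGraph", t -> {
            seen[0] = t;
            return t.describe();
        });
        System.out.println(result);
        System.out.println("closed after scope: " + seen[0].closed);
    }
}
```

      The caller never calls close() directly; try-with-resources closes the instance when the supplied function returns or throws.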
    • transformTables

      public DataflowGraphTransformer transformTables(scala.Function1<Table,Table> transformer)
    • transformDownNodes

      public DataflowGraphTransformer transformDownNodes(scala.Function2<GraphElement,scala.collection.immutable.Seq<GraphElement>,scala.collection.immutable.Seq<GraphElement>> transformer, boolean disableParallelism)
      Example graph: [Flow1, Flow2] -> ST -> Flow3 -> MV.
      Order of processing: Flow1, Flow2, ST, Flow3, MV.
      Parameters:
      transformer - function that transforms any graph entity, with the shape transformer(nodeToTransform: GraphElement, upstreamNodes: Seq[GraphElement]) => transformedNodes: Seq[GraphElement]
      disableParallelism - (undocumented)
      Returns:
      this
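
      The processing order in the example (every node visited after all of its upstream nodes) can be reproduced with a simple topological traversal. The sketch below is a sequential, standalone illustration using Kahn's algorithm over node names. It is not the Spark implementation: visitDown, exampleGraph, and the use of strings in place of GraphElement are assumptions made for illustration, and the visitor returns nothing, whereas the documented transformer returns a sequence of transformed nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class TransformDownSketch {
    /**
     * Visits each node only after all of its upstream nodes (Kahn's algorithm).
     * Assumes every upstream name also appears as a key of upstreamOf.
     */
    static List<String> visitDown(Map<String, List<String>> upstreamOf,
                                  BiConsumer<String, List<String>> visitor) {
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> downstreamOf = new LinkedHashMap<>();
        for (String node : upstreamOf.keySet()) {
            downstreamOf.put(node, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> entry : upstreamOf.entrySet()) {
            // Number of upstream nodes that still have to be visited.
            pending.put(entry.getKey(), entry.getValue().size());
            for (String upstream : entry.getValue()) {
                downstreamOf.get(upstream).add(entry.getKey());
            }
        }

        // Nodes with no upstream dependencies are ready immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            visitor.accept(node, upstreamOf.get(node));
            order.add(node);
            for (String downstream : downstreamOf.get(node)) {
                // A downstream node becomes ready once its last upstream is done.
                if (pending.merge(downstream, -1, Integer::sum) == 0) {
                    ready.add(downstream);
                }
            }
        }
        return order;
    }

    /** The example graph from the text: [Flow1, Flow2] -> ST -> Flow3 -> MV. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("Flow1", List.of());
        graph.put("Flow2", List.of());
        graph.put("ST", List.of("Flow1", "Flow2"));
        graph.put("Flow3", List.of("ST"));
        graph.put("MV", List.of("Flow3"));
        return graph;
    }

    public static void main(String[] args) {
        List<String> order = visitDown(exampleGraph(),
                (node, upstream) -> System.out.println(node + " <- " + upstream));
        System.out.println(order); // [Flow1, Flow2, ST, Flow3, MV]
    }
}
```

      In this sketch Flow1 and Flow2 have no dependency on each other, so their relative order comes only from insertion order. An implementation that processes nodes in parallel could handle them concurrently.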
    • getDataflowGraph

      public DataflowGraph getDataflowGraph()
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable