Packages

case class DataflowGraph(flows: Seq[Flow], tables: Seq[Table], views: Seq[View]) extends GraphOperations with GraphValidations with Product with Serializable

DataflowGraph represents the core graph structure for Spark declarative pipelines. It manages the relationships between logical flows, tables, and views, providing operations for graph traversal, validation, and transformation.

Source
DataflowGraph.scala
Linear Supertypes
Serializable, Product, Equals, GraphValidations, Logging, GraphOperations, AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. DataflowGraph
  2. Serializable
  3. Product
  4. Equals
  5. GraphValidations
  6. Logging
  7. GraphOperations
  8. AnyRef
  9. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Instance Constructors

  1. new DataflowGraph(flows: Seq[Flow], tables: Seq[Table], views: Seq[View])

Type Members

  1. implicit class LogStringContext extends AnyRef
    Definition Classes
    Logging

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  6. def dfsInternal(startDestination: TableIdentifier, downstream: Boolean, stopAtMaterializationPoints: Boolean = false): Set[TableIdentifier]

    Performs a DFS starting from startNode and returns the set of nodes (datasets) reached.

    Performs a DFS starting from startNode and returns the set of nodes (datasets) reached.

    startDestination

    The identifier of the node to start from.

    downstream

    if true, traverse output edges (search downstream) if false, traverse input edges (search upstream).

    stopAtMaterializationPoints

    If true, stop when we reach a materialization point (table). If false, keep going until the end.

    Attributes
    protected
    Definition Classes
    GraphOperations
  7. def downstreamFlows(flowIdentifier: TableIdentifier): Set[TableIdentifier]

    Returns the set of flows reachable from flowIdentifier via output (child) edges.

    Returns the set of flows reachable from flowIdentifier via output (child) edges.

    Definition Classes
    GraphOperations
  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. lazy val flow: Map[TableIdentifier, Flow]

    Map of Flows by their identifier

  10. lazy val flowNodes: Map[TableIdentifier, FlowNode]

    A map from flow identifier to FlowNode, which contains the input/output nodes.

    A map from flow identifier to FlowNode, which contains the input/output nodes.

    Definition Classes
    GraphOperations
  11. val flows: Seq[Flow]
  12. lazy val flowsTo: Map[TableIdentifier, Seq[Flow]]

    The Flows that write to a given destination.

  13. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  14. lazy val inferredSchema: Map[TableIdentifier, StructType]

    A map of the inferred schema of each table, computed by merging the analyzed schemas of all flows writing to that table.

  15. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  16. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  17. lazy val inputIdentifiers: Set[TableIdentifier]

    All the Inputs in the current DataflowGraph.

  18. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  19. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  20. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  21. def logBasedOnLevel(level: Level)(f: => MessageWithContext): Unit
    Attributes
    protected
    Definition Classes
    Logging
  22. def logDebug(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  23. def logDebug(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  24. def logDebug(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  25. def logDebug(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  26. def logError(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  27. def logError(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  28. def logError(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. def logError(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def logInfo(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  31. def logInfo(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  32. def logInfo(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  33. def logInfo(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  34. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  35. def logTrace(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  36. def logTrace(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  37. def logTrace(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  38. def logTrace(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  39. def logWarning(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  40. def logWarning(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  41. def logWarning(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  42. def logWarning(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  43. val materializedFlowIdentifiers: Set[TableIdentifier]

    The identifiers of materializedFlows.

  44. lazy val materializedFlows: Seq[ResolvedFlow]

    Flows in this graph that need to get planned and potentially executed when executing the graph.

    Flows in this graph that need to get planned and potentially executed when executing the graph. Flows that write to logical views are excluded.

  45. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  46. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  47. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  48. lazy val output: Map[TableIdentifier, Output]

    Map of Outputs by their identifiers

  49. lazy val persistedViews: Seq[PersistedView]

    The PersistedViews of the graph

  50. def productElementNames: Iterator[String]
    Definition Classes
    Product
  51. def reanalyzeFlow(srcFlow: Flow): ResolvedFlow

    Used to reanalyze the flow's DF for a given table.

    Used to reanalyze the flow's DF for a given table. This is done by finding all upstream flows (until a table is reached) for the specified source and reanalyzing all upstream flows.

    srcFlow

    The flow that writes into the table that we will start from when finding upstream flows

    returns

    The reanalyzed flow

    Attributes
    protected[graph]
  52. lazy val resolutionFailedFlow: Map[TableIdentifier, ResolutionFailedFlow]
  53. lazy val resolutionFailedFlows: Seq[ResolutionFailedFlow]
  54. def resolve(): DataflowGraph
  55. def resolved: Boolean

    Returns true iff all Flows are successfully analyzed.

  56. lazy val resolvedFlow: Map[TableIdentifier, ResolvedFlow]
  57. lazy val resolvedFlows: Seq[ResolvedFlow]
  58. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  59. lazy val table: Map[TableIdentifier, Table]

    Map of Tables by their identifiers

  60. val tables: Seq[Table]
  61. def upstreamDatasets(datasetIdentifiers: Seq[TableIdentifier]): Map[TableIdentifier, Set[TableIdentifier]]

    Traverses the graph upstream starting from the specified datasetIdentifiers to return the reachable nodes.

    Traverses the graph upstream starting from the specified datasetIdentifiers to return the reachable nodes. The return map's keyset consists of all datasets reachable from datasetIdentifiers. For each entry in the response map, the value of that element refers to which of datasetIdentifiers was able to reach the key. If multiple of datasetIdentifiers could reach that key, one is picked arbitrarily.

    Definition Classes
    GraphOperations
  62. def upstreamDatasets(datasetIdentifier: TableIdentifier): Set[TableIdentifier]

    Returns the set of datasets reachable from datasetIdentifier via input (parent) edges.

    Returns the set of datasets reachable from datasetIdentifier via input (parent) edges.

    Definition Classes
    GraphOperations
  63. def upstreamFlows(flowIdentifier: TableIdentifier): Set[TableIdentifier]

    Returns the set of flows reachable from flowIdentifier via input (parent) edges.

    Returns the set of flows reachable from flowIdentifier via input (parent) edges.

    Definition Classes
    GraphOperations
  64. def validate(): DataflowGraph

    Ensure that the DataflowGraph is valid and throws errors if not.

  65. def validateFlowStreamingness(): Unit

    Validate that each resolved flow is correctly either a streaming flow or non-streaming flow, depending on the flow type (ex.

    Validate that each resolved flow is correctly either a streaming flow or non-streaming flow, depending on the flow type (ex. once flow vs non-once flow) and the dataset type the flow writes to (ex. streaming table vs materialized view).

    Attributes
    protected[graph]
    Definition Classes
    GraphValidations
  66. def validateGraphIsTopologicallySorted(): Unit

    Throws an exception if the flows in this graph are not topologically sorted.

    Throws an exception if the flows in this graph are not topologically sorted.

    Attributes
    protected[graph]
    Definition Classes
    GraphValidations
  67. def validateMultiQueryTables(): Map[TableIdentifier, Seq[Flow]]

    Validate multi query table correctness.

    Validate multi query table correctness.

    Attributes
    protected[pipelines]
    Definition Classes
    GraphValidations
  68. def validatePersistedViewSources(): Unit

    Validates that persisted views don't read from invalid sources

    Validates that persisted views don't read from invalid sources

    Attributes
    protected[graph]
    Definition Classes
    GraphValidations
  69. def validateSuccessfulFlowAnalysis(): Unit

    Validates that all flows are resolved.

    Validates that all flows are resolved. If there are unresolved flows, detects a possible cyclic dependency and throw the appropriate exception.

    Attributes
    protected
    Definition Classes
    GraphValidations
  70. def validateTablesAreResettable(tables: Seq[Table]): Unit

    Validate that all specified tables are resettable.

    Validate that all specified tables are resettable.

    Attributes
    protected
    Definition Classes
    GraphValidations
  71. def validateTablesAreResettable(): Unit

    Validate that all tables are resettable.

    Validate that all tables are resettable. This is a best-effort check that will only catch upstream tables that are resettable but have a non-resettable downstream dependency.

    Attributes
    protected
    Definition Classes
    GraphValidations
  72. def validateUserSpecifiedSchemas(): Unit
    Attributes
    protected
    Definition Classes
    GraphValidations
  73. lazy val view: Map[TableIdentifier, View]

    Map of Views by their identifiers

  74. val views: Seq[View]
  75. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  76. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  77. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  78. def withLogContext(context: Map[String, String])(body: => Unit): Unit
    Attributes
    protected
    Definition Classes
    Logging

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)

Inherited from Serializable

Inherited from Product

Inherited from Equals

Inherited from GraphValidations

Inherited from Logging

Inherited from GraphOperations

Inherited from AnyRef

Inherited from Any

Ungrouped