case class DataflowGraph(flows: Seq[Flow], tables: Seq[Table], views: Seq[View]) extends GraphOperations with GraphValidations with Product with Serializable
DataflowGraph represents the core graph structure for Spark declarative pipelines. It manages the relationships between logical flows, tables, and views, providing operations for graph traversal, validation, and transformation.
- Source
- DataflowGraph.scala
- Linear Supertypes
- Serializable, Product, Equals, GraphValidations, Logging, GraphOperations, AnyRef, Any
Type Members
- implicit class LogStringContext extends AnyRef
- Definition Classes
- Logging
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- def dfsInternal(startDestination: TableIdentifier, downstream: Boolean, stopAtMaterializationPoints: Boolean = false): Set[TableIdentifier]
Performs a DFS starting from startDestination and returns the set of nodes (datasets) reached.
- startDestination
The identifier of the node to start from.
- downstream
If true, traverse output edges (search downstream); if false, traverse input edges (search upstream).
- stopAtMaterializationPoints
If true, stop when we reach a materialization point (table). If false, keep going until the end.
- Attributes
- protected
- Definition Classes
- GraphOperations
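The traversal described for dfsInternal can be sketched on a toy graph. This is a minimal illustration, not the Spark implementation: ToyGraph, its string identifiers, and the choice to include the start node in the result are all assumptions made for the example.

```scala
object DfsSketch {
  // Hypothetical model: dataset names are plain strings rather than TableIdentifier.
  // Each edge points from an input dataset to the dataset that reads from it.
  final case class ToyGraph(edges: Seq[(String, String)], tables: Set[String]) {
    private val outputs: Map[String, Set[String]] =
      edges.groupBy(_._1).map { case (src, es) => src -> es.map(_._2).toSet }
    private val inputs: Map[String, Set[String]] =
      edges.groupBy(_._2).map { case (dst, es) => dst -> es.map(_._1).toSet }

    def dfs(
        start: String,
        downstream: Boolean,
        stopAtMaterializationPoints: Boolean = false): Set[String] = {
      val neighbors = if (downstream) outputs else inputs
      val visited = scala.collection.mutable.Set.empty[String]
      def visit(node: String): Unit = {
        // add returns true only the first time a node is seen.
        if (visited.add(node)) {
          // Assumption: a table other than the start node ends the walk on that branch.
          val stopHere =
            stopAtMaterializationPoints && node != start && tables.contains(node)
          if (!stopHere) neighbors.getOrElse(node, Set.empty[String]).foreach(visit)
        }
      }
      visit(start)
      visited.toSet
    }
  }

  // raw (table) -> cleaned_view -> summary_table (table) -> report_view
  val example: ToyGraph = ToyGraph(
    edges = Seq(
      "raw" -> "cleaned_view",
      "cleaned_view" -> "summary_table",
      "summary_table" -> "report_view"),
    tables = Set("raw", "summary_table"))
}
```

In this sketch, walking downstream from raw with stopAtMaterializationPoints set to true reaches summary_table but does not continue on to report_view.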
- def downstreamFlows(flowIdentifier: TableIdentifier): Set[TableIdentifier]
Returns the set of flows reachable from flowIdentifier via output (child) edges.
- Definition Classes
- GraphOperations
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- lazy val flow: Map[TableIdentifier, Flow]
Map of Flows by their identifier
- lazy val flowNodes: Map[TableIdentifier, FlowNode]
A map from flow identifier to FlowNode, which contains the input/output nodes.
- Definition Classes
- GraphOperations
- val flows: Seq[Flow]
- lazy val flowsTo: Map[TableIdentifier, Seq[Flow]]
The Flows that write to a given destination.
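A destination-keyed index of this shape can be built by grouping flows on the dataset they write to. The sketch below is only an illustration: ToyFlow and its fields are hypothetical stand-ins, and whether the real flowsTo is computed this way is an assumption.

```scala
object FlowsToSketch {
  // Hypothetical stand-in for a flow: its own identifier plus the destination it writes to.
  final case class ToyFlow(identifier: String, destination: String)

  // Group flows by destination; groupBy keeps the original order inside each group.
  def flowsTo(flows: Seq[ToyFlow]): Map[String, Seq[ToyFlow]] =
    flows.groupBy(_.destination)

  val flows: Seq[ToyFlow] = Seq(
    ToyFlow("orders_backfill", "orders"),
    ToyFlow("orders_stream", "orders"),
    ToyFlow("customers", "customers"))

  // Two flows write to "orders", one flow writes to "customers".
  val byDestination: Map[String, Seq[ToyFlow]] = flowsTo(flows)
}
```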
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- lazy val inferredSchema: Map[TableIdentifier, StructType]
A map of the inferred schema of each table, computed by merging the analyzed schemas of all flows writing to that table.
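The idea of merging per-flow schemas into one schema per table can be sketched without Spark. The merge rules below (union of fields by name, conflicting types rejected) are an assumption for illustration; this page does not state the rules the real StructType merge applies, and ToySchema is a hypothetical simplification.

```scala
object SchemaMergeSketch {
  // Hypothetical schema: ordered (fieldName, typeName) pairs instead of Spark's StructType.
  type ToySchema = Seq[(String, String)]

  // Assumed rule: keep the left fields, append new right fields, reject type conflicts.
  def merge(left: ToySchema, right: ToySchema): Either[String, ToySchema] = {
    val leftTypes = left.toMap
    val conflict = right.collectFirst {
      case (name, tpe) if leftTypes.get(name).exists(_ != tpe) =>
        s"Field $name has conflicting types ${leftTypes(name)} and $tpe"
    }
    conflict match {
      case Some(message) => Left(message)
      case None =>
        Right(left ++ right.filterNot { case (name, _) => leftTypes.contains(name) })
    }
  }

  // Input: (tableName, schema of one flow writing to that table) pairs.
  def inferred(flowSchemas: Seq[(String, ToySchema)]): Map[String, Either[String, ToySchema]] =
    flowSchemas.groupBy(_._1).map { case (table, entries) =>
      val schemas = entries.map(_._2) // non-empty: groupBy never yields empty groups
      val start: Either[String, ToySchema] = Right(schemas.head)
      table -> schemas.tail.foldLeft(start)((acc, next) => acc.flatMap(merge(_, next)))
    }
}
```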
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- lazy val inputIdentifiers: Set[TableIdentifier]
All the Inputs in the current DataflowGraph.
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logBasedOnLevel(level: Level)(f: => MessageWithContext): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- val materializedFlowIdentifiers: Set[TableIdentifier]
The identifiers of materializedFlows.
- lazy val materializedFlows: Seq[ResolvedFlow]
Flows in this graph that need to get planned and potentially executed when executing the graph. Flows that write to logical views are excluded.
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- lazy val output: Map[TableIdentifier, Output]
Map of Outputs by their identifiers
- lazy val persistedViews: Seq[PersistedView]
The PersistedViews of the graph
- def productElementNames: Iterator[String]
- Definition Classes
- Product
- def reanalyzeFlow(srcFlow: Flow): ResolvedFlow
Reanalyzes the flow's DataFrame for a given table. This is done by finding all upstream flows (until a table is reached) for the specified source and reanalyzing them.
- srcFlow
The flow that writes into the table that we will start from when finding upstream flows
- returns
The reanalyzed flow
- Attributes
- protected[graph]
- lazy val resolutionFailedFlow: Map[TableIdentifier, ResolutionFailedFlow]
- lazy val resolutionFailedFlows: Seq[ResolutionFailedFlow]
- def resolve(): DataflowGraph
- def resolved: Boolean
Returns true iff all Flows are successfully analyzed.
- lazy val resolvedFlow: Map[TableIdentifier, ResolvedFlow]
- lazy val resolvedFlows: Seq[ResolvedFlow]
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- lazy val table: Map[TableIdentifier, Table]
Map of Tables by their identifiers
- val tables: Seq[Table]
- def upstreamDatasets(datasetIdentifiers: Seq[TableIdentifier]): Map[TableIdentifier, Set[TableIdentifier]]
Traverses the graph upstream starting from the specified datasetIdentifiers to return the reachable nodes. The return map's keyset consists of all datasets reachable from datasetIdentifiers. For each entry in the response map, the value of that element refers to which of datasetIdentifiers was able to reach the key. If multiple of datasetIdentifiers could reach that key, one is picked arbitrarily.
- Definition Classes
- GraphOperations
- def upstreamDatasets(datasetIdentifier: TableIdentifier): Set[TableIdentifier]
Returns the set of datasets reachable from datasetIdentifier via input (parent) edges.
- Definition Classes
- GraphOperations
- def upstreamFlows(flowIdentifier: TableIdentifier): Set[TableIdentifier]
Returns the set of flows reachable from flowIdentifier via input (parent) edges.
- Definition Classes
- GraphOperations
- def validate(): DataflowGraph
Ensures that the DataflowGraph is valid and throws an error if it is not.
- def validateFlowStreamingness(): Unit
Validates that each resolved flow is correctly either a streaming flow or a non-streaming flow, depending on the flow type (e.g. once flow vs non-once flow) and the dataset type the flow writes to (e.g. streaming table vs materialized view).
- Attributes
- protected[graph]
- Definition Classes
- GraphValidations
- def validateGraphIsTopologicallySorted(): Unit
Throws an exception if the flows in this graph are not topologically sorted.
- Attributes
- protected[graph]
- Definition Classes
- GraphValidations
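A topological-order check of this kind can be sketched as a single pass over the flows that remembers which destinations have already appeared. ToyFlow, the treatment of unknown inputs as external sources, and the exception type are assumptions for illustration, not the behavior of the real method.

```scala
object TopoSortCheckSketch {
  // Hypothetical stand-in: a flow writes one destination and reads zero or more inputs.
  final case class ToyFlow(destination: String, inputs: Set[String])

  // Returns the first flow that reads a dataset whose writer appears only later, if any.
  def firstOutOfOrder(flows: Seq[ToyFlow]): Option[ToyFlow] = {
    val definedInGraph = flows.map(_.destination).toSet
    val seen = scala.collection.mutable.Set.empty[String]
    flows.find { flow =>
      // Inputs not written by any flow are treated as external sources (assumption).
      val violates =
        flow.inputs.exists(in => definedInGraph.contains(in) && !seen.contains(in))
      seen += flow.destination
      violates
    }
  }

  // Throws if the flows are not listed in dependency order.
  def validateSorted(flows: Seq[ToyFlow]): Unit =
    firstOutOfOrder(flows).foreach { flow =>
      throw new IllegalStateException(
        s"Flow writing to ${flow.destination} appears before one of its inputs")
    }
}
```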
- def validateMultiQueryTables(): Map[TableIdentifier, Seq[Flow]]
Validate multi query table correctness.
- Attributes
- protected[pipelines]
- Definition Classes
- GraphValidations
- def validatePersistedViewSources(): Unit
Validates that persisted views don't read from invalid sources
- Attributes
- protected[graph]
- Definition Classes
- GraphValidations
- def validateSuccessfulFlowAnalysis(): Unit
Validates that all flows are resolved. If there are unresolved flows, detects a possible cyclic dependency and throws the appropriate exception.
- Attributes
- protected
- Definition Classes
- GraphValidations
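Detecting a cyclic dependency among unresolved flows can be sketched as a depth-first search that tracks the current path. The dependency map of plain strings and the findCycle helper are hypothetical; how the real validation locates and reports cycles is not described on this page.

```scala
object CycleDetectionSketch {
  // deps maps each dataset to the datasets it reads from (hypothetical string identifiers).
  // Returns one cycle as a node list that starts and ends with the same node, if any exists.
  def findCycle(deps: Map[String, Set[String]]): Option[List[String]] = {
    val done = scala.collection.mutable.Set.empty[String]

    // path holds the nodes currently being explored, most recent first.
    def visit(node: String, path: List[String]): Option[List[String]] =
      if (path.contains(node)) {
        // Reconstruct the loop: node, then everything visited since node, then node again.
        Some(node :: (path.takeWhile(_ != node).reverse :+ node))
      } else if (done.contains(node)) {
        None
      } else {
        val found = deps.getOrElse(node, Set.empty[String]).iterator
          .map(next => visit(next, node :: path))
          .collectFirst { case Some(cycle) => cycle }
        done += node // fully explored; no need to revisit from other starting points
        found
      }

    deps.keys.iterator.map(key => visit(key, Nil)).collectFirst { case Some(cycle) => cycle }
  }
}
```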
- def validateTablesAreResettable(tables: Seq[Table]): Unit
Validate that all specified tables are resettable.
- Attributes
- protected
- Definition Classes
- GraphValidations
- def validateTablesAreResettable(): Unit
Validate that all tables are resettable. This is a best-effort check that will only catch upstream tables that are resettable but have a non-resettable downstream dependency.
- Attributes
- protected
- Definition Classes
- GraphValidations
- def validateUserSpecifiedSchemas(): Unit
- Attributes
- protected
- Definition Classes
- GraphValidations
- lazy val view: Map[TableIdentifier, View]
Map of Views by their identifiers
- val views: Seq[View]
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- def withLogContext(context: Map[String, String])(body: => Unit): Unit
- Attributes
- protected
- Definition Classes
- Logging
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)