Class DataflowGraph
Object
org.apache.spark.sql.pipelines.graph.DataflowGraph
- All Implemented Interfaces:
Serializable
,org.apache.spark.internal.Logging
,GraphOperations
,GraphValidations
,scala.Equals
,scala.Product
public class DataflowGraph
extends Object
implements GraphOperations, GraphValidations, scala.Product, Serializable
DataflowGraph represents the core graph structure for Spark declarative pipelines.
It manages the relationships between logical flows, tables, and views, providing
operations for graph traversal, validation, and transformation.
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
ConstructorsConstructorDescriptionDataflowGraph
(scala.collection.immutable.Seq<Flow> flows, scala.collection.immutable.Seq<Table> tables, scala.collection.immutable.Seq<View> views) -
Method Summary
Modifier and TypeMethodDescriptionscala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
Flow> flow()
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
FlowNode> A map from flow identifier to `FlowNode`, which contains the input/output nodes.scala.collection.immutable.Seq<Flow>
flows()
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
scala.collection.immutable.Seq<Flow>> flowsTo()
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
StructType> scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier>
The identifiers ofmaterializedFlows()
.scala.collection.immutable.Seq<ResolvedFlow>
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
Output> output()
scala.collection.immutable.Seq<PersistedView>
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
ResolutionFailedFlow> scala.collection.immutable.Seq<ResolutionFailedFlow>
resolve()
boolean
resolved()
Returns true iff allFlow
s are successfully analyzed.scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
ResolvedFlow> scala.collection.immutable.Seq<ResolvedFlow>
scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
Table> table()
scala.collection.immutable.Seq<Table>
tables()
validate()
Ensure that theDataflowGraph
is valid and throws errors if not.scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,
View> view()
scala.collection.immutable.Seq<View>
views()
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface scala.Equals
canEqual, equals
Methods inherited from interface org.apache.spark.sql.pipelines.graph.GraphOperations
dfsInternal, downstreamFlows, reachabilitySet, reachabilitySet, upstreamDatasets, upstreamDatasets, upstreamFlows
Methods inherited from interface org.apache.spark.sql.pipelines.graph.GraphValidations
detectCycle, validateFlowStreamingness, validateGraphIsTopologicallySorted, validateMultiQueryTables, validatePersistedViewSources, validateSuccessfulFlowAnalysis, validateTablesAreResettable, validateTablesAreResettable, validateUserSpecifiedSchemas
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface scala.Product
productArity, productElement, productElementName, productElementNames, productIterator, productPrefix
-
Constructor Details
-
DataflowGraph
-
-
Method Details
-
flowNodes
public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,FlowNode> flowNodes()Description copied from interface:GraphOperations
A map from flow identifier to `FlowNode`, which contains the input/output nodes.- Specified by:
flowNodes
in interfaceGraphOperations
-
flows
-
tables
-
views
-
output
public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,Output> output() -
materializedFlows
-
materializedFlowIdentifiers
public scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> materializedFlowIdentifiers()The identifiers ofmaterializedFlows()
. -
table
-
flow
-
view
-
persistedViews
-
inputIdentifiers
public scala.collection.immutable.Set<org.apache.spark.sql.catalyst.TableIdentifier> inputIdentifiers() -
flowsTo
public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,scala.collection.immutable.Seq<Flow>> flowsTo() -
resolvedFlows
-
resolvedFlow
public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,ResolvedFlow> resolvedFlow() -
resolutionFailedFlows
-
resolutionFailedFlow
public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,ResolutionFailedFlow> resolutionFailedFlow() -
inferredSchema
public scala.collection.immutable.Map<org.apache.spark.sql.catalyst.TableIdentifier,StructType> inferredSchema() -
validate
Ensure that theDataflowGraph
is valid and throws errors if not. -
resolved
public boolean resolved()Returns true iff allFlow
s are successfully analyzed. -
resolve
-