Packages

package graph

Type Members

  1. class AppendOnceFlow extends ResolvedFlow

    A Flow that reads source[s] completely and appends data to the target, just once.

  2. class BatchTableWrite extends FlowExecution

    A FlowExecution that writes a batch DataFrame to a Table.

  3. case class CircularDependencyException(downstreamTable: TableIdentifier, upstreamDataset: TableIdentifier) extends AnalysisException with Product with Serializable

    Raised when there's a circular dependency in the current pipeline. That is, a downstream table is referenced while creating an upstream table.
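
    A circular dependency of this kind can be detected with a depth-first search over the dataset dependencies. The following is a minimal sketch, not the library's implementation: the `CycleCheckSketch` object, the `findCycle` helper, and the string-keyed dependency map are all hypothetical stand-ins for the real graph types.

```scala
object CycleCheckSketch {
  // Hypothetical shape: each dataset name maps to the datasets it reads from.
  type Deps = Map[String, Set[String]]

  /** Returns one (dataset, dependency) edge that closes a cycle, if any exists. */
  def findCycle(deps: Deps): Option[(String, String)] = {
    // 0 = unvisited, 1 = on the current DFS path, 2 = fully explored
    val state = scala.collection.mutable.Map.empty[String, Int].withDefaultValue(0)

    def visit(node: String): Option[(String, String)] = {
      state(node) = 1
      val found = deps.getOrElse(node, Set.empty[String]).iterator.map { dependency =>
        state(dependency) match {
          case 1 => Some((node, dependency)) // back edge: dependency is still on the path
          case 0 => visit(dependency)
          case _ => None
        }
      }.collectFirst { case Some(edge) => edge }
      state(node) = 2
      found
    }

    deps.keys.toSeq.sorted.iterator
      .filter(name => state(name) == 0)
      .map(visit)
      .collectFirst { case Some(edge) => edge }
  }
}
```

    A caller could turn the returned edge into an error carrying both identifiers, similar in spirit to the two fields of the exception above.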

  4. class CompleteFlow extends ResolvedFlow

    A Flow that declares exactly what data should be in the target table.

  5. class CoreDataflowNodeProcessor extends AnyRef

    Processor responsible for analyzing each flow and sorting the nodes in topological order.
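
    To illustrate what a topological order of graph nodes means, here is a minimal sketch that repeatedly emits the nodes whose inputs have all been emitted already. It is an assumption-laden illustration only: `TopoSortSketch`, `topoSort`, and the string-keyed input map are hypothetical and do not reflect how CoreDataflowNodeProcessor is actually written.

```scala
import scala.annotation.tailrec

object TopoSortSketch {
  /** Orders nodes so that every node appears after all of its inputs.
    * `inputs` maps a node to the nodes it reads from (hypothetical shape).
    * Returns Left with the nodes that could not be ordered when a cycle exists. */
  def topoSort(inputs: Map[String, Set[String]]): Either[Set[String], Seq[String]] = {
    val nodes = inputs.keySet ++ inputs.values.flatten

    @tailrec
    def loop(remaining: Set[String], done: Vector[String]): Either[Set[String], Seq[String]] = {
      if (remaining.isEmpty) Right(done)
      else {
        val doneSet = done.toSet
        // A node is ready once every one of its inputs has been emitted.
        val ready = remaining.filter(n => inputs.getOrElse(n, Set.empty[String]).subsetOf(doneSet))
        if (ready.isEmpty) Left(remaining) // no progress possible: a cycle remains
        else loop(remaining -- ready, done ++ ready.toSeq.sorted)
      }
    }

    loop(nodes, Vector.empty)
  }
}
```

    Sorting each ready batch by name only makes the output deterministic; any order within a batch is a valid topological order.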

  6. case class DataflowGraph(flows: Seq[Flow], tables: Seq[Table], views: Seq[View]) extends GraphOperations with GraphValidations with Product with Serializable

    DataflowGraph represents the core graph structure for Spark declarative pipelines. It manages the relationships between logical flows, tables, and views, providing operations for graph traversal, validation, and transformation.

  7. class DataflowGraphTransformer extends AutoCloseable

    Resolves the DataflowGraph by processing each node in the graph. This class exposes visitor functionality to resolve/analyze graph nodes. We only expose simple visitor abilities to transform different entities of the graph. For advanced transformations we also expose a mechanism to walk the graph entity by entity.

    Assumptions:
    1. Each output has at least one flow writing to it.
    2. Each flow may or may not have a destination table. If a flow does not have a destination table, the destination is a temporary view.

    The graph is structured so that flows, tables, and sinks are all graph elements (nodes). While we expose transformation functions for each of these entities, we also expose a way to walk over the graph.

    Constructor is private as all usages should be via DataflowGraphTransformer.withDataflowGraphTransformer.
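
    A private constructor combined with a `with...` entry point is commonly called the loan pattern: the helper creates the resource, lends it to the caller's function, and closes it afterwards. The sketch below shows the general shape under stated assumptions; `LoanPatternSketch`, `Transformer`, `withTransformer`, and the logging buffer are hypothetical and are not the signatures of DataflowGraphTransformer.

```scala
import scala.collection.mutable

object LoanPatternSketch {
  // Hypothetical resource: the private constructor forces callers through the companion.
  final class Transformer private (log: mutable.Buffer[String]) extends AutoCloseable {
    def transform(node: String): String = {
      log += s"visit $node"
      node.toUpperCase
    }
    override def close(): Unit = {
      log += "closed"
      ()
    }
  }

  object Transformer {
    /** Creates a Transformer, lends it to `body`, and always closes it afterwards. */
    def withTransformer[T](log: mutable.Buffer[String])(body: Transformer => T): T = {
      val transformer = new Transformer(log)
      try body(transformer)
      finally transformer.close()
    }
  }
}
```

    Because the resource never escapes the helper, callers cannot forget to close it.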

  8. sealed trait ExecutionResult extends AnyRef

    A flow's execution may complete for two reasons: 1. it may finish performing all of its necessary work, or 2. it may be interrupted by a request from a user to stop it.

    We use this result to disambiguate these two cases, using 'ExecutionResult.FINISHED' for the former and 'ExecutionResult.STOPPED' for the latter.
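
    A sealed trait with two case objects is one way to model such a two-valued result so that pattern matches are checked for exhaustiveness. The sketch below is a local stand-in that only reuses the names FINISHED and STOPPED mentioned above; the `describe` helper is hypothetical.

```scala
object ExecutionResultSketch {
  // Local stand-in mirroring the names in the description; not the library's source.
  sealed trait ExecutionResult
  object ExecutionResult {
    case object FINISHED extends ExecutionResult
    case object STOPPED extends ExecutionResult
  }

  /** Hypothetical helper showing how a caller might branch on the result. */
  def describe(result: ExecutionResult): String = result match {
    case ExecutionResult.FINISHED => "flow completed all of its work"
    case ExecutionResult.STOPPED  => "flow was stopped by a user request"
  }
}
```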

  9. case class FailureStoppingFlow(flowIdentifiers: Seq[TableIdentifier]) extends FailureStoppingOperation with Product with Serializable

    Indicates that there was a failure while stopping the flow.

  10. abstract class FailureStoppingOperation extends RunFailure

    Abstract class used to identify failures that occur while stopping an operation, including timeouts.

  11. trait Flow extends GraphElement with Logging

    A Flow is a node of data transformation in a dataflow graph. It describes the movement of data into a particular dataset.

  12. trait FlowExecution extends AnyRef

    A FlowExecution specifies how to execute a flow and manages its execution.

  13. sealed trait FlowFilter extends GraphFilter[ResolvedFlow]

    Specifies how we should filter Flows.

  14. trait FlowFunction extends Logging

    A wrapper for the lambda function that defines a Flow.

  15. case class FlowFunctionResult(requestedInputs: Set[TableIdentifier], batchInputs: Set[ResolvedInput], streamingInputs: Set[ResolvedInput], usedExternalInputs: Set[TableIdentifier], dataFrame: Try[classic.DataFrame], sqlConf: Map[String, String], analysisWarnings: Seq[AnalysisWarning] = Nil) extends Product with Serializable

    Holds the DataFrame returned by a FlowFunction along with the inputs used to construct it.

    batchInputs

    the complete inputs read by the flow

    streamingInputs

    the incremental inputs read by the flow

    usedExternalInputs

    the identifiers of the external inputs read by the flow

    dataFrame

    the DataFrame expression executed by the flow if the flow can be resolved

  16. case class FlowNode(identifier: TableIdentifier, inputs: Set[TableIdentifier], output: TableIdentifier) extends Product with Serializable

    identifier

    The identifier of the flow.

    inputs

    The identifiers of nodes used as inputs to this flow.

    output

    The identifier of the output that this flow writes to.

  17. class FlowPlanner extends AnyRef

    Plans execution of Flows in a DataflowGraph by converting Flows into 'FlowExecution's.

  18. case class FlowsForTables(selectedTables: Set[TableIdentifier]) extends FlowFilter with Product with Serializable

    Used in partial graph updates to select flows that flow to "selectedTables".

  19. trait GraphElement extends AnyRef

    An element in a DataflowGraph.

  20. abstract class GraphExecution extends Logging
  21. sealed trait GraphFilter[E] extends AnyRef

    Specifies how we should filter Graph elements.

  22. trait GraphOperations extends AnyRef
  23. class GraphRegistrationContext extends AnyRef

    A mutable context for registering tables, views, and flows in a dataflow graph.

  24. trait GraphValidations extends Logging

    Validations performed on a DataflowGraph.

  25. trait Input extends GraphElement

    Specifies an input that can be referenced by another Dataset's query.

  26. case class LoadTableException(name: String, cause: Option[Throwable]) extends SparkException with Product with Serializable

    Exception raised when a flow fails to read from a table defined within the pipeline

    name

    The name of the table

    cause

    The cause of the failure

  27. sealed trait Output extends AnyRef

    Represents a node in a DataflowGraph that can be written to by a Flow. Must be backed by a file source.

  28. case class PersistedView(identifier: TableIdentifier, properties: Map[String, String], comment: Option[String], origin: QueryOrigin) extends View with Product with Serializable

    Represents a persisted view in a DataflowGraph.

    identifier

    The identifier of this view within the graph.

    properties

    Properties of the view

    comment

    Comment supplied when defining the view.

  29. class PipelineExecution extends AnyRef

    Executes a DataflowGraph by resolving the graph, materializing datasets, and running the flows.

  30. case class PipelineTableProperty[T](key: String, default: String, fromString: (String) => T) extends Product with Serializable
  31. trait PipelineUpdateContext extends AnyRef
  32. class PipelineUpdateContextImpl extends PipelineUpdateContext

    An implementation of the PipelineUpdateContext trait used in production.

  33. case class QueryContext(currentCatalog: Option[String], currentDatabase: Option[String]) extends Product with Serializable

    Contains the catalog and database context information for query execution.

  34. case class QueryExecutionFailure(flowName: String, maxRetries: Int, cause: Option[Throwable]) extends RunFailure with Product with Serializable

    Indicates that a run has failed due to a query execution failure.

  35. case class QueryOrigin(language: Option[Language] = None, filePath: Option[String] = None, sqlText: Option[String] = None, line: Option[Int] = None, startPosition: Option[Int] = None, objectType: Option[String] = None, objectName: Option[String] = None) extends Product with Serializable

    Records information used to track the provenance of a given query to user code.

    language

    The language used by the user to define the query.

    filePath

    Path to the file of the user code that defines the query.

    sqlText

    The SQL text of the query.

    line

    The line number of the query in the user code. Line numbers are 1-indexed.

    startPosition

    The start position of the query in the user code.

    objectType

    The type of the object that the query is associated with. (Table, View, etc)

    objectName

    The name of the object that the query is associated with.

  36. trait ResolutionCompletedFlow extends Flow

    A Flow whose flow function has been invoked, meaning either:

    • Its output schema and dependencies are known.
    • It failed to resolve.
  37. class ResolutionFailedFlow extends ResolutionCompletedFlow

    A Flow whose flow function has failed to resolve.

  38. trait ResolvedFlow extends ResolutionCompletedFlow with Input

    A Flow whose flow function has successfully resolved.

  39. case class ResolvedInput(input: Input, aliasIdentifier: AliasIdentifier) extends Product with Serializable

    A wrapper for a resolved internal input that includes the alias provided by the user.

  40. case class RunCompletion() extends RunTerminationReason with Product with Serializable

    Indicates that a triggered run has successfully completed execution.

  41. sealed abstract class RunFailure extends RunTerminationReason

    Indicates that a run entered the failed state.

  42. case class RunTerminationException(reason: RunTerminationReason) extends Exception with Product with Serializable

    Helper exception class that indicates that a run has to be terminated and tracks the associated termination reason.

  43. sealed trait RunTerminationReason extends AnyRef
  44. case class SomeTables(selectedTables: Set[TableIdentifier]) extends TableFilter with Product with Serializable

    Used in partial graph updates to select "selectedTables".

  45. case class SqlGraphElementRegistrationException(msg: String, queryOrigin: QueryOrigin) extends AnalysisException with Product with Serializable
  46. class SqlGraphRegistrationContext extends AnyRef

    SQL statement processor context. At any instant, an instance of this class holds the "active" catalog/schema in use within this SQL statement processing context, and tables/views/flows that have been registered from SQL statements within this context.

  47. class SqlGraphRegistrationContextState extends AnyRef

    Data class for all state that is accumulated while processing a particular SqlGraphRegistrationContext.

  48. class StreamingFlow extends ResolvedFlow

    A Flow that represents stateful movement of data to some target.

  49. trait StreamingFlowExecution extends FlowExecution with Logging

    A 'FlowExecution' that processes data statefully using Structured Streaming.

  50. class StreamingTableWrite extends StreamingFlowExecution

    A StreamingFlowExecution that writes a streaming DataFrame to a Table.

  51. case class Table(identifier: TableIdentifier, specifiedSchema: Option[StructType], partitionCols: Option[Seq[String]], normalizedPath: Option[String], properties: Map[String, String] = Map.empty, comment: Option[String], baseOrigin: QueryOrigin, isStreamingTable: Boolean, format: Option[String]) extends TableInput with Output with Product with Serializable

    A table representing a materialized dataset in a DataflowGraph.

    identifier

    The identifier of this table within the graph.

    specifiedSchema

    The user-specified schema for this table.

    partitionCols

    What columns the table should be partitioned by when materialized.

    normalizedPath

    Normalized storage location for the table based on the user-specified table path (if not defined, we will normalize a managed storage path for it).

    properties

    Table Properties to set in table metadata.

    comment

    User-specified comment that can be placed on the table.

    isStreamingTable

    Whether the table is a streaming table, as defined by the source code.

  52. sealed trait TableFilter extends GraphFilter[Table]

    Specifies how we should filter Tables.

  53. sealed trait TableInput extends Input

    A type of Input where data is loaded from a table.

  54. case class TemporaryView(identifier: TableIdentifier, properties: Map[String, String], comment: Option[String], origin: QueryOrigin) extends View with Product with Serializable

    Represents a temporary view in a DataflowGraph.

    identifier

    The identifier of this view within the graph.

    properties

    Properties of the view

    comment

    Comment supplied when defining the view.

  55. case class TriggeredFailureInfo(lastFailTimestamp: Long, numFailures: Int, lastException: Throwable, lastExceptionAction: FlowExecutionAction) extends Product with Serializable
  56. class TriggeredGraphExecution extends GraphExecution

    Executes all of the flows in the given graph in topological order. Each flow processes all available data before downstream flows are triggered.

  57. class UncaughtExceptionHandler extends java.lang.Thread.UncaughtExceptionHandler

    Uncaught exception handler which first calls the delegate and then calls the OnFailure function with the uncaught exception.
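
    The delegate-then-callback behavior described here can be sketched on top of the JDK's `Thread.UncaughtExceptionHandler` interface. This is a hypothetical re-creation, not the class's source: `HandlerSketch`, `ChainedHandler`, the optional delegate, and the use of `try`/`finally` (so the callback runs even if the delegate throws) are all assumptions.

```scala
object HandlerSketch {
  /** Calls the optional delegate first, then the failure callback (assumed design). */
  final class ChainedHandler(
      delegate: Option[Thread.UncaughtExceptionHandler],
      onFailure: Throwable => Unit)
    extends Thread.UncaughtExceptionHandler {

    override def uncaughtException(thread: Thread, error: Throwable): Unit = {
      try delegate.foreach(_.uncaughtException(thread, error))
      finally onFailure(error) // assumption: the callback runs even if the delegate throws
    }
  }
}
```

    Such a handler could be installed on a worker thread with `Thread.setUncaughtExceptionHandler`.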

  58. case class UnexpectedRunFailure() extends RunFailure with Product with Serializable

    Run could not be associated with a proper root cause. This is not expected and likely indicates a bug.

  59. case class UnionFlowFilter(oneFilter: FlowFilter, otherFilter: FlowFilter) extends FlowFilter with Product with Serializable

    Returns a flow filter that is a union of two flow filters
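
    The idea of combining two filters into their union can be sketched with a small algebraic data type. The version below is simplified and hypothetical: it filters plain flow names instead of ResolvedFlow instances, and the names `NameFilter`, `Everything`, `Nothing_`, `Only`, and `Union` are invented for illustration.

```scala
object FlowFilterSketch {
  // Hypothetical, simplified filter over flow names.
  sealed trait NameFilter {
    def keep(flows: Seq[String]): Seq[String]
  }
  case object Everything extends NameFilter {
    def keep(flows: Seq[String]): Seq[String] = flows
  }
  case object Nothing_ extends NameFilter {
    def keep(flows: Seq[String]): Seq[String] = Seq.empty
  }
  final case class Only(selected: Set[String]) extends NameFilter {
    def keep(flows: Seq[String]): Seq[String] = flows.filter(selected)
  }
  final case class Union(one: NameFilter, other: NameFilter) extends NameFilter {
    // Keeps a flow if either filter keeps it, preserving the original order.
    def keep(flows: Seq[String]): Seq[String] = {
      val kept = (one.keep(flows) ++ other.keep(flows)).toSet
      flows.filter(kept)
    }
  }
}
```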

  60. case class UnresolvedDatasetException(identifier: TableIdentifier) extends AnalysisException with Product with Serializable

    Exception raised when a flow tries to read from a dataset that exists but is unresolved

    identifier

    The identifier of the dataset

  61. case class UnresolvedFlow(identifier: TableIdentifier, destinationIdentifier: TableIdentifier, func: FlowFunction, queryContext: QueryContext, sqlConf: Map[String, String], comment: Option[String] = None, once: Boolean, origin: QueryOrigin) extends Flow with Product with Serializable

    A Flow whose output schema and dependencies aren't known.

  62. case class UnresolvedPipelineException(graph: DataflowGraph, directFailures: Map[TableIdentifier, Throwable], downstreamFailures: Map[TableIdentifier, Throwable], additionalHint: Option[String] = None) extends AnalysisException with Product with Serializable

    Exception raised when a pipeline has one or more flows that cannot be resolved

    directFailures

    Mapping between the names of flows that failed to resolve (due to an error in that flow) and the error that occurred when attempting to resolve them.

    downstreamFailures

    Mapping between the names of flows that failed to resolve (because they failed to read from other unresolved flows) and the error that occurred when attempting to resolve them.
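
    The distinction between direct and downstream failures can be illustrated by propagating failure along the dependency edges. The sketch below is an assumption-based illustration and not how the exception is populated in practice: `FailureSplitSketch`, `downstreamOf`, and the string-keyed inputs map are hypothetical.

```scala
import scala.annotation.tailrec

object FailureSplitSketch {
  /** Given flows that failed on their own and each flow's inputs, returns the flows
    * that fail only because something they read from (directly or transitively) failed. */
  def downstreamOf(direct: Set[String], inputs: Map[String, Set[String]]): Set[String] = {
    @tailrec
    def grow(failed: Set[String]): Set[String] = {
      // Any flow that reads from a failed flow is itself considered failed.
      val next = failed ++ inputs.collect { case (flow, ins) if ins.exists(failed) => flow }
      if (next == failed) failed else grow(next)
    }
    grow(direct) -- direct
  }
}
```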

  63. trait View extends GraphElement

    Represents a view in the DataflowGraph.

  64. case class VirtualTableInput(identifier: TableIdentifier, specifiedSchema: Option[StructType], incomingFlowIdentifiers: Set[TableIdentifier], availableFlows: Seq[ResolvedFlow] = Nil) extends TableInput with Logging with Product with Serializable

    A type of TableInput that returns data from a specified schema or from the inferred Flows that write to the table.

Value Members

  1. case object AllFlows extends FlowFilter with Product with Serializable

    Used in full graph updates to select all flows.

  2. case object AllTables extends TableFilter with Product with Serializable

    Used in full graph updates to select all tables.

  3. object DataflowGraph extends Serializable
  4. object DataflowGraphTransformer
  5. object DatasetManager extends Logging

    DatasetManager is responsible for materializing tables in the catalog based on the given graph. For each table in the graph, it will create a table if none exists (or if this is a full refresh), or merge the schema of an existing table to match the new flows writing to it.
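
    The create-or-merge decision described above can be sketched as a small pure function. Everything in this example is a simplification made for illustration: schemas are modeled as a map from column name to type name, `MaterializeSketch`, `Action`, `Create`, `Merge`, and `plan` are hypothetical, and the merge simply lets incoming columns win, whereas real schema merging may reject incompatible changes.

```scala
object MaterializeSketch {
  // Simplified, hypothetical schema model: column name -> type name.
  type Schema = Map[String, String]

  sealed trait Action
  final case class Create(schema: Schema) extends Action
  final case class Merge(schema: Schema) extends Action

  /** Creates the table when it is missing or on full refresh; otherwise merges schemas. */
  def plan(existing: Option[Schema], incoming: Schema, fullRefresh: Boolean): Action =
    existing match {
      case Some(current) if !fullRefresh => Merge(current ++ incoming) // assumption: incoming wins
      case _                             => Create(incoming)
    }
}
```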

  6. object ExecutionResult
  7. object FlowAnalysis
  8. object FlowExecution
  9. object GraphElementTypeUtils
  10. object GraphErrors

    Collection of errors that can be thrown during graph resolution / analysis.

  11. object GraphExecution extends Logging
  12. object GraphIdentifierManager

    Responsible for properly qualifying the identifiers for datasets inside or referenced by the dataflow graph.

  13. object GraphRegistrationContext
  14. object IdentifierHelper
  15. case object NoFlows extends FlowFilter with Product with Serializable

    Used to specify that no flows should be refreshed.

  16. case object NoTables extends TableFilter with Product with Serializable

    Used to select no tables.

  17. object PartitionHelper
  18. object PipelinesErrors extends Logging
  19. object PipelinesTableProperties

    Interface for validating and accessing Pipeline-specific table properties.
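
    A typed table property can be modeled as a key, a default value in string form, and a parser, which is the field shape shown in the PipelineTableProperty signature under Type Members. The sketch below uses a local stand-in with that shape; the `resolve` helper and the property key `example.reset.allowed` are hypothetical and are not claimed to exist in the library.

```scala
object TablePropertySketch {
  // Local stand-in with the same field shape as the PipelineTableProperty signature.
  final case class TableProperty[T](key: String, default: String, fromString: String => T) {
    /** Hypothetical helper: parses the configured value, falling back to the default. */
    def resolve(props: Map[String, String]): T = fromString(props.getOrElse(key, default))
  }

  // Hypothetical property used only for this example.
  val resetAllowed: TableProperty[Boolean] =
    TableProperty[Boolean]("example.reset.allowed", "true", _.toBoolean)
}
```

    Parsing through `fromString` also acts as validation: an unparsable value such as "maybe" makes `toBoolean` throw an `IllegalArgumentException`.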

  20. object QueryOrigin extends Logging with Serializable
  21. object SqlGraphElementRegistrationException extends Serializable
  22. object SqlGraphRegistrationContext
  23. object TriggeredGraphExecution
  24. object UncaughtExceptionHandler
  25. object ViewHelpers
