Package org.apache.spark.sql
Class SparkSessionExtensions
Object
org.apache.spark.sql.SparkSessionExtensions
:: Experimental ::
Holder for injection points to the SparkSession. We make NO guarantee about the stability regarding binary compatibility and source compatibility of methods here.
This currently provides the following extension points:
- Analyzer Rules.
- Check Analysis Rules.
- Cache Plan Normalization Rules.
- Optimizer Rules.
- Pre CBO Rules.
- Planning Strategies.
- Customized Parser.
- (External) Catalog listeners.
- Columnar Rules.
- Adaptive Query Post Planner Strategy Rules.
- Adaptive Query Stage Preparation Rules.
- Adaptive Query Execution Runtime Optimizer Rules.
- Adaptive Query Stage Optimizer Rules.
The extensions can be used by calling withExtensions on the SparkSession.Builder, for example:
    SparkSession.builder()
      .master("...")
      .config("...", true)
      .withExtensions { extensions =>
        extensions.injectResolutionRule { session =>
          ...
        }
        extensions.injectParser { (session, parser) =>
          ...
        }
      }
      .getOrCreate()
The extensions can also be used by setting the Spark SQL configuration property spark.sql.extensions. Multiple extensions can be set using a comma-separated list. For example:
    SparkSession.builder()
      .master("...")
      .config("spark.sql.extensions", "org.example.MyExtensions,org.example.YourExtensions")
      .getOrCreate()

    class MyExtensions extends Function1[SparkSessionExtensions, Unit] {
      override def apply(extensions: SparkSessionExtensions): Unit = {
        extensions.injectResolutionRule { session =>
          ...
        }
        extensions.injectParser { (session, parser) =>
          ...
        }
      }
    }

    class YourExtensions extends SparkSessionExtensionsProvider {
      override def apply(extensions: SparkSessionExtensions): Unit = {
        extensions.injectResolutionRule { session =>
          ...
        }
        extensions.injectFunction(...)
      }
    }
Note that none of the injected builders should assume that the SparkSession is fully initialized and should not touch the session's internals (e.g. the SessionState).
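As a minimal sketch of the note above, the following extensions class registers a rule whose builder ignores the session it receives, so nothing is read from a session that may still be initializing. The names NoOpResolutionRule, NoOpExtensions and NoOpExtensionsDemo are hypothetical and exist only for illustration; the local master and app name are likewise assumptions.

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule for illustration: returns every plan unchanged.
object NoOpResolutionRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

// Hypothetical extensions class. The builder passed to injectResolutionRule
// discards the session argument, so it never touches session internals.
class NoOpExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectResolutionRule(_ => NoOpResolutionRule)
  }
}

object NoOpExtensionsDemo {
  def main(args: Array[String]): Unit = {
    // Assumed local settings, chosen only to keep the sketch runnable.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("no-op-extensions-demo")
      .withExtensions(new NoOpExtensions)
      .getOrCreate()
    try {
      // Any query now passes through the injected rule during analysis.
      spark.range(3).show()
    } finally {
      spark.stop()
    }
  }
}
```

A rule that does need the session can keep a reference to it and read from it inside apply, which runs only once a query is being analyzed.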
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> | buildPlanNormalizationRules(SparkSession session) |
void | injectCheckRule(scala.Function1<SparkSession, scala.Function1<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan, scala.runtime.BoxedUnit>> builder) | Inject a check analysis Rule builder into the SparkSession.
void | injectColumnar(scala.Function1<SparkSession, org.apache.spark.sql.execution.ColumnarRule> builder) | Inject a rule that can override the columnar execution of an executor.
void | injectFunction(scala.Tuple3<org.apache.spark.sql.catalyst.FunctionIdentifier, org.apache.spark.sql.catalyst.expressions.ExpressionInfo, scala.Function1<scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.Expression>, org.apache.spark.sql.catalyst.expressions.Expression>> functionDescription) | Injects a custom function into the FunctionRegistry at runtime for all sessions.
void | injectOptimizerRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder) | Inject an optimizer Rule builder into the SparkSession.
void | injectParser(scala.Function2<SparkSession, org.apache.spark.sql.catalyst.parser.ParserInterface, org.apache.spark.sql.catalyst.parser.ParserInterface> builder) | Inject a custom parser into the SparkSession.
void | injectPlannerStrategy(scala.Function1<SparkSession, org.apache.spark.sql.execution.SparkStrategy> builder) | Inject a planner Strategy builder into the SparkSession.
void | injectPlanNormalizationRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder) | Inject a plan normalization Rule builder into the SparkSession.
void | injectPostHocResolutionRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder) | Inject an analyzer Rule builder into the SparkSession.
void | injectPreCBORule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder) | Inject an optimizer Rule builder that rewrites logical plans into the SparkSession.
void | injectQueryPostPlannerStrategyRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder) | Inject a rule that is applied between plannerStrategy and queryStagePrepRules, so it can get the whole plan before injecting exchanges.
void | injectQueryStageOptimizerRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder) | Inject a rule that can override the query stage optimizer phase of adaptive query execution.
void | injectQueryStagePrepRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder) | Inject a rule that can override the query stage preparation phase of adaptive query execution.
void | injectResolutionRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder) | Inject an analyzer resolution Rule builder into the SparkSession.
void | injectRuntimeOptimizerRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder) | Inject a runtime Rule builder into the SparkSession.
void | injectTableFunction(scala.Tuple3<org.apache.spark.sql.catalyst.FunctionIdentifier, org.apache.spark.sql.catalyst.expressions.ExpressionInfo, scala.Function1<scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.Expression>, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> functionDescription) | Injects a custom function into the TableFunctionRegistry at runtime for all sessions.
-
Constructor Details
-
SparkSessionExtensions
public SparkSessionExtensions()
-
-
Method Details
-
buildPlanNormalizationRules
public scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> buildPlanNormalizationRules(SparkSession session)
-
injectCheckRule
public void injectCheckRule(scala.Function1<SparkSession, scala.Function1<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan, scala.runtime.BoxedUnit>> builder)
Inject a check analysis Rule builder into the SparkSession. The injected rules will be executed after the analysis phase. A check analysis rule is used to detect problems with a LogicalPlan and should throw an exception when a problem is found.
Parameters:
builder - (undocumented)
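A check rule is a plain function from LogicalPlan to Unit that throws when it finds a problem. The sketch below rejects plans containing a global sort; the policy, the names NoGlobalSortCheck and NoGlobalSortExtensions, and the choice of UnsupportedOperationException are all assumptions made for illustration, not something this API prescribes.

```scala
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Sort}

// Hypothetical check: fail analysis if the plan contains a global sort.
object NoGlobalSortCheck extends (LogicalPlan => Unit) {
  override def apply(plan: LogicalPlan): Unit = {
    // foreach visits every node of the plan tree.
    plan.foreach {
      case s: Sort if s.global =>
        // Exception type is an assumption; any exception signals the problem.
        throw new UnsupportedOperationException(
          "Global sorts are not allowed by this (hypothetical) policy")
      case _ => ()
    }
  }
}

// Hypothetical extensions class that registers the check.
class NoGlobalSortExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // The builder ignores the session and returns the check function.
    extensions.injectCheckRule(_ => NoGlobalSortCheck)
  }
}
```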
-
injectColumnar
public void injectColumnar(scala.Function1<SparkSession, org.apache.spark.sql.execution.ColumnarRule> builder)
Inject a rule that can override the columnar execution of an executor.
Parameters:
builder - (undocumented)
-
injectFunction
public void injectFunction(scala.Tuple3<org.apache.spark.sql.catalyst.FunctionIdentifier, org.apache.spark.sql.catalyst.expressions.ExpressionInfo, scala.Function1<scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.Expression>, org.apache.spark.sql.catalyst.expressions.Expression>> functionDescription)
Injects a custom function into the FunctionRegistry at runtime for all sessions.
Parameters:
functionDescription - (undocumented)
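The functionDescription argument is a three-element tuple: the function's identifier, its ExpressionInfo, and a builder that turns the argument expressions into a single expression. A minimal sketch, assuming a hypothetical SQL function named shout that simply delegates to the built-in Upper expression (ShoutFunction and ShoutExtensions are illustrative names):

```scala
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionInfo, Upper}

object ShoutFunction {
  // Hypothetical function name, used only for this sketch.
  val name = "shout"

  val description: (FunctionIdentifier, ExpressionInfo, Seq[Expression] => Expression) = (
    FunctionIdentifier(name),
    // Two-argument constructor: implementing class name and function name.
    new ExpressionInfo(classOf[Upper].getCanonicalName, name),
    (children: Seq[Expression]) => {
      require(children.size == 1, s"$name expects exactly one argument")
      Upper(children.head)
    }
  )
}

// Hypothetical extensions class that registers the function.
class ShoutExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectFunction(ShoutFunction.description)
  }
}
```

With these extensions active, a query such as SELECT shout('abc') would be resolved through the injected builder.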
-
injectOptimizerRule
public void injectOptimizerRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject an optimizer Rule builder into the SparkSession. The injected rules will be executed during the operator optimization batch. An optimizer rule is used to improve the quality of an analyzed logical plan; these rules should never modify the result of the LogicalPlan.
Parameters:
builder - (undocumented)
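To illustrate a result-preserving rewrite, the sketch below drops filters whose condition is the literal true. Spark's built-in optimizer already handles this case, so the rule is redundant in practice; RemoveTrueFilter and RemoveTrueFilterExtensions are hypothetical names used only to show the shape of an injected optimizer rule.

```scala
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.BooleanType

// Hypothetical rule: a filter on the constant true keeps every row,
// so replacing it with its child cannot change the query result.
object RemoveTrueFilter extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan.transform {
    case Filter(Literal(true, BooleanType), child) => child
  }
}

// Hypothetical extensions class that registers the rule.
class RemoveTrueFilterExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectOptimizerRule(_ => RemoveTrueFilter)
  }
}
```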
-
injectParser
public void injectParser(scala.Function2<SparkSession, org.apache.spark.sql.catalyst.parser.ParserInterface, org.apache.spark.sql.catalyst.parser.ParserInterface> builder)
Inject a custom parser into the SparkSession. Note that the builder is passed a session and an initial parser. The latter allows a user to create a partial parser and to delegate to the underlying parser for completeness. If a user injects more parsers, then the parsers are stacked on top of each other.
Parameters:
builder - (undocumented)
-
injectPlanNormalizationRule
public void injectPlanNormalizationRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject a plan normalization Rule builder into the SparkSession. The injected rules will be executed just before query caching decisions are made. Such rules can be used to improve the cache hit rate by normalizing different plans to the same form. These rules should never modify the result of the LogicalPlan.
Parameters:
builder - (undocumented)
-
injectPlannerStrategy
public void injectPlannerStrategy(scala.Function1<SparkSession, org.apache.spark.sql.execution.SparkStrategy> builder)
Inject a planner Strategy builder into the SparkSession. The injected strategy will be used to convert a LogicalPlan into an executable SparkPlan.
Parameters:
builder - (undocumented)
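A strategy maps a LogicalPlan to zero or more candidate SparkPlans. The sketch below shows only the skeleton: it returns no candidates for every plan, which leaves planning to the remaining strategies. FallThroughStrategy and FallThroughStrategyExtensions are hypothetical names; a real strategy would pattern-match on the logical operators it knows how to execute.

```scala
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.{SparkPlan, SparkStrategy}

// Hypothetical strategy skeleton: produces no physical plan for any input,
// so the planner relies on the other registered strategies.
object FallThroughStrategy extends SparkStrategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    // A real strategy would add cases here for the operators it handles.
    case _ => Nil
  }
}

// Hypothetical extensions class that registers the strategy.
class FallThroughStrategyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectPlannerStrategy(_ => FallThroughStrategy)
  }
}
```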
-
injectPostHocResolutionRule
public void injectPostHocResolutionRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject an analyzer Rule builder into the SparkSession. These analyzer rules will be executed after resolution.
Parameters:
builder - (undocumented)
-
injectPreCBORule
public void injectPreCBORule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject an optimizer Rule builder that rewrites logical plans into the SparkSession. The injected rules will be executed once after the operator optimization batch and before any cost-based optimization rules that depend on stats.
Parameters:
builder - (undocumented)
-
injectQueryPostPlannerStrategyRule
public void injectQueryPostPlannerStrategyRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder)
Inject a rule that is applied between plannerStrategy and queryStagePrepRules, so it can get the whole plan before injecting exchanges. Note that these rules can only be applied within AQE.
Parameters:
builder - (undocumented)
-
injectQueryStageOptimizerRule
public void injectQueryStageOptimizerRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder)
Inject a rule that can override the query stage optimizer phase of adaptive query execution.
Parameters:
builder - (undocumented)
-
injectQueryStagePrepRule
public void injectQueryStagePrepRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder)
Inject a rule that can override the query stage preparation phase of adaptive query execution.
Parameters:
builder - (undocumented)
-
injectResolutionRule
public void injectResolutionRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject an analyzer resolution Rule builder into the SparkSession. These analyzer rules will be executed as part of the resolution phase of analysis.
Parameters:
builder - (undocumented)
-
injectRuntimeOptimizerRule
public void injectRuntimeOptimizerRule(scala.Function1<SparkSession, org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject a runtime Rule builder into the SparkSession. The injected rules will be executed after built-in AQEOptimizer rules are applied. A runtime optimizer rule is used to improve the quality of a logical plan during execution and can leverage accurate statistics from shuffle. Note that it does not work if adaptive query execution is disabled.
Parameters:
builder - (undocumented)
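Because runtime optimizer rules only run under adaptive query execution, a session that relies on one should have AQE switched on. The sketch below sets spark.sql.adaptive.enabled explicitly and registers a rule that only logs the plan it sees; LogRuntimePlan and RuntimeRuleDemo are hypothetical names, and the local master is an assumption to keep the example small.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule: logs the runtime logical plan and returns it unchanged.
object LogRuntimePlan extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    logDebug(s"Runtime plan:\n${plan.treeString}")
    plan
  }
}

object RuntimeRuleDemo {
  // Builds a session with AQE enabled so the injected rule can take effect.
  def build(): SparkSession = SparkSession.builder()
    .master("local[1]")
    .config("spark.sql.adaptive.enabled", true)
    .withExtensions { extensions =>
      extensions.injectRuntimeOptimizerRule(_ => LogRuntimePlan)
    }
    .getOrCreate()
}
```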
-
injectTableFunction
public void injectTableFunction(scala.Tuple3<org.apache.spark.sql.catalyst.FunctionIdentifier, org.apache.spark.sql.catalyst.expressions.ExpressionInfo, scala.Function1<scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.Expression>, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> functionDescription)
Injects a custom function into the TableFunctionRegistry at runtime for all sessions.
Parameters:
functionDescription - (undocumented)
-