public class SparkSessionExtensions
extends Object
SparkSession
. We make NO guarantee about the stability
regarding binary compatibility and source compatibility of methods here.
This current provides the following extension points:
The extensions can be used by calling withExtensions
on the SparkSession.Builder
, for
example:
SparkSession.builder()
.master("...")
.config("...", true)
.withExtensions { extensions =>
extensions.injectResolutionRule { session =>
...
}
extensions.injectParser { (session, parser) =>
...
}
}
.getOrCreate()
The extensions can also be used by setting the Spark SQL configuration property
spark.sql.extensions
. Multiple extensions can be set using a comma-separated list. For example:
SparkSession.builder()
.master("...")
.config("spark.sql.extensions", "org.example.MyExtensions,org.example.YourExtensions")
.getOrCreate()
class MyExtensions extends Function1[SparkSessionExtensions, Unit] {
override def apply(extensions: SparkSessionExtensions): Unit = {
extensions.injectResolutionRule { session =>
...
}
extensions.injectParser { (session, parser) =>
...
}
}
}
class YourExtensions extends SparkSessionExtensionsProvider {
override def apply(extensions: SparkSessionExtensions): Unit = {
extensions.injectResolutionRule { session =>
...
}
extensions.injectFunction(...)
}
}
Note that none of the injected builders should assume that the SparkSession
is fully
initialized and should not touch the session's internals (e.g. the SessionState).
Constructor and Description |
---|
SparkSessionExtensions() |
Modifier and Type | Method and Description |
---|---|
scala.collection.Seq<org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> |
buildPlanNormalizationRules(SparkSession session) |
void |
injectCheckRule(scala.Function1<SparkSession,scala.Function1<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan,scala.runtime.BoxedUnit>> builder)
Inject an check analysis
Rule builder into the SparkSession . |
void |
injectColumnar(scala.Function1<SparkSession,org.apache.spark.sql.execution.ColumnarRule> builder)
Inject a rule that can override the columnar execution of an executor.
|
void |
injectFunction(scala.Tuple3<org.apache.spark.sql.catalyst.FunctionIdentifier,org.apache.spark.sql.catalyst.expressions.ExpressionInfo,scala.Function1<scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression>,org.apache.spark.sql.catalyst.expressions.Expression>> functionDescription)
Injects a custom function into the
FunctionRegistry
at runtime for all sessions. |
void |
injectOptimizerRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject an optimizer
Rule builder into the SparkSession . |
void |
injectParser(scala.Function2<SparkSession,org.apache.spark.sql.catalyst.parser.ParserInterface,org.apache.spark.sql.catalyst.parser.ParserInterface> builder)
Inject a custom parser into the
SparkSession . |
void |
injectPlannerStrategy(scala.Function1<SparkSession,org.apache.spark.sql.execution.SparkStrategy> builder)
Inject a planner
Strategy builder into the SparkSession . |
void |
injectPlanNormalizationRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject a plan normalization
Rule builder into the SparkSession . |
void |
injectPostHocResolutionRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject an analyzer
Rule builder into the SparkSession . |
void |
injectPreCBORule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject an optimizer
Rule builder that rewrites logical plans into the SparkSession . |
void |
injectQueryPostPlannerStrategyRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder)
Inject a rule that applied between
plannerStrategy and queryStagePrepRules , so
it can get the whole plan before injecting exchanges. |
void |
injectQueryStageOptimizerRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder)
Inject a rule that can override the query stage optimizer phase of adaptive query
execution.
|
void |
injectQueryStagePrepRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder)
Inject a rule that can override the query stage preparation phase of adaptive query
execution.
|
void |
injectResolutionRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject an analyzer resolution
Rule builder into the SparkSession . |
void |
injectRuntimeOptimizerRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Inject a runtime
Rule builder into the SparkSession . |
void |
injectTableFunction(scala.Tuple3<org.apache.spark.sql.catalyst.FunctionIdentifier,org.apache.spark.sql.catalyst.expressions.ExpressionInfo,scala.Function1<scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression>,org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> functionDescription)
Injects a custom function into the
TableFunctionRegistry at runtime for all sessions. |
public scala.collection.Seq<org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> buildPlanNormalizationRules(SparkSession session)
public void injectCheckRule(scala.Function1<SparkSession,scala.Function1<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan,scala.runtime.BoxedUnit>> builder)
Rule
builder into the SparkSession
. The injected rules will
be executed after the analysis phase. A check analysis rule is used to detect problems with a
LogicalPlan and should throw an exception when a problem is found.builder
- (undocumented)public void injectColumnar(scala.Function1<SparkSession,org.apache.spark.sql.execution.ColumnarRule> builder)
builder
- (undocumented)public void injectFunction(scala.Tuple3<org.apache.spark.sql.catalyst.FunctionIdentifier,org.apache.spark.sql.catalyst.expressions.ExpressionInfo,scala.Function1<scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression>,org.apache.spark.sql.catalyst.expressions.Expression>> functionDescription)
FunctionRegistry
at runtime for all sessions.functionDescription
- (undocumented)public void injectOptimizerRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Rule
builder into the SparkSession
. The injected rules will be
executed during the operator optimization batch. An optimizer rule is used to improve the
quality of an analyzed logical plan; these rules should never modify the result of the
LogicalPlan.builder
- (undocumented)public void injectParser(scala.Function2<SparkSession,org.apache.spark.sql.catalyst.parser.ParserInterface,org.apache.spark.sql.catalyst.parser.ParserInterface> builder)
SparkSession
. Note that the builder is passed a session
and an initial parser. The latter allows for a user to create a partial parser and to delegate
to the underlying parser for completeness. If a user injects more parsers, then the parsers
are stacked on top of each other.builder
- (undocumented)public void injectPlanNormalizationRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Rule
builder into the SparkSession
. The injected rules will
be executed just before query caching decisions are made. Such rules can be used to improve the
cache hit rate by normalizing different plans to the same form. These rules should never modify
the result of the LogicalPlan.builder
- (undocumented)public void injectPlannerStrategy(scala.Function1<SparkSession,org.apache.spark.sql.execution.SparkStrategy> builder)
Strategy
builder into the SparkSession
. The injected strategy will
be used to convert a LogicalPlan
into a executable
SparkPlan
.builder
- (undocumented)public void injectPostHocResolutionRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Rule
builder into the SparkSession
. These analyzer
rules will be executed after resolution.builder
- (undocumented)public void injectPreCBORule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Rule
builder that rewrites logical plans into the SparkSession
.
The injected rules will be executed once after the operator optimization batch and
before any cost-based optimization rules that depend on stats.builder
- (undocumented)public void injectQueryPostPlannerStrategyRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder)
plannerStrategy
and queryStagePrepRules
, so
it can get the whole plan before injecting exchanges.
Note, these rules can only be applied within AQE.builder
- (undocumented)public void injectQueryStageOptimizerRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder)
builder
- (undocumented)public void injectQueryStagePrepRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.execution.SparkPlan>> builder)
builder
- (undocumented)public void injectResolutionRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Rule
builder into the SparkSession
. These analyzer
rules will be executed as part of the resolution phase of analysis.builder
- (undocumented)public void injectRuntimeOptimizerRule(scala.Function1<SparkSession,org.apache.spark.sql.catalyst.rules.Rule<org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> builder)
Rule
builder into the SparkSession
.
The injected rules will be executed after built-in
org.apache.spark.sql.execution.adaptive.AQEOptimizer
rules are applied.
A runtime optimizer rule is used to improve the quality of a logical plan during execution
which can leverage accurate statistics from shuffle.
Note that, it does not work if adaptive query execution is disabled.
builder
- (undocumented)public void injectTableFunction(scala.Tuple3<org.apache.spark.sql.catalyst.FunctionIdentifier,org.apache.spark.sql.catalyst.expressions.ExpressionInfo,scala.Function1<scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression>,org.apache.spark.sql.catalyst.plans.logical.LogicalPlan>> functionDescription)
TableFunctionRegistry
at runtime for all sessions.functionDescription
- (undocumented)