public class SparkStrategies.HashJoin extends org.apache.spark.sql.catalyst.planning.GenericStrategy<SparkPlan> implements org.apache.spark.sql.catalyst.expressions.PredicateHelper
Constructor and Description |
---|
SparkStrategies.HashJoin() Uses the ExtractEquiJoinKeys pattern to find joins where at least some of the predicates can be evaluated by matching hash keys. |
Modifier and Type | Method and Description |
---|---|
scala.collection.Seq<SparkPlan> | apply(org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan) |
isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$Logging$$log__$eq, org$apache$spark$Logging$$log_
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
canEvaluate, splitConjunctivePredicates
initializeIfNecessary, initializeLogging, log_
public SparkStrategies.HashJoin()
This strategy applies a simple optimization based on estimates of the physical sizes of
the two join sides. When planning a BroadcastHashJoin, if one side has an estimated physical
size smaller than the user-settable threshold
org.apache.spark.sql.SQLConf.AUTO_BROADCASTJOIN_THRESHOLD, the planner marks it as the
"build" relation and marks the other relation as the "stream" side. The build table is
broadcast to all of the executors involved in the join as a Broadcast object. If both
estimates exceed the threshold, they are instead used to decide the build side in a
ShuffledHashJoin.
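The size-based decision described above can be illustrated with a minimal standalone sketch. It is not Spark's planner code: the class `BuildSideSketch`, the method `choose`, and both enums are hypothetical names introduced here. It also makes two assumptions that the description does not state: the smaller side is always the one chosen as the build side, and a size exactly equal to the threshold is treated as exceeding it.

```java
// Illustrative sketch only; all names here are hypothetical and not part of Spark.
public class BuildSideSketch {
    enum JoinKind { BROADCAST_HASH_JOIN, SHUFFLED_HASH_JOIN }
    enum BuildSide { LEFT, RIGHT }

    /** The planned join operator together with the side used to build the hash table. */
    static final class Choice {
        final JoinKind kind;
        final BuildSide build;

        Choice(JoinKind kind, BuildSide build) {
            this.kind = kind;
            this.build = build;
        }

        @Override
        public String toString() {
            return kind + "/" + build;
        }
    }

    /**
     * Picks a join kind and build side from the estimated sizes of the two inputs.
     * Assumption: the smaller side is always the build side, and ties go to the right side.
     */
    static Choice choose(long leftBytes, long rightBytes, long thresholdBytes) {
        BuildSide smaller = rightBytes <= leftBytes ? BuildSide.RIGHT : BuildSide.LEFT;
        long smallerBytes = Math.min(leftBytes, rightBytes);
        // Broadcast only when the build side is estimated to be below the threshold;
        // otherwise both sides are shuffled and the estimates only pick the build side.
        JoinKind kind = smallerBytes < thresholdBytes
                ? JoinKind.BROADCAST_HASH_JOIN
                : JoinKind.SHUFFLED_HASH_JOIN;
        return new Choice(kind, smaller);
    }

    public static void main(String[] args) {
        long threshold = 1000L; // stands in for the configured broadcast threshold
        System.out.println(choose(500L, 2000L, threshold));  // BROADCAST_HASH_JOIN/LEFT
        System.out.println(choose(2000L, 500L, threshold));  // BROADCAST_HASH_JOIN/RIGHT
        System.out.println(choose(5000L, 3000L, threshold)); // SHUFFLED_HASH_JOIN/RIGHT
    }
}
```

In this sketch the threshold only decides whether the build side is broadcast. The relative sizes of the two inputs decide which side is built in either case.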