trait HasPartitionKey extends InputPartition
A mix-in for input partitions whose records are clustered on the same set of partition keys (provided via SupportsReportPartitioning, see below). Data sources can opt in to implementing this interface for the partitions they report to Spark, which uses the information to avoid shuffling data in certain scenarios, such as joins and aggregations. Note that Spark requires ALL input partitions to implement this interface; otherwise it cannot take advantage of it.
This interface should be used in combination with SupportsReportPartitioning, which allows data sources to report distribution and ordering specs to Spark. In particular, Spark expects data sources to report org.apache.spark.sql.connector.distributions.ClusteredDistribution whenever their input partitions implement this interface. Spark derives the partition key spec (e.g., column names, transforms) from the distribution, and the partition values from the input partitions.
It is the implementor's responsibility to ensure that when an input partition implements this interface, all of its records have the same value for the partition keys. Spark does not check this property.
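The contract described above can be sketched in a few lines. The example is self-contained so that it runs without Spark on the classpath: the nested `InputPartition` and `HasPartitionKey` types are simplified stand-ins, not Spark's own (the real interfaces live in `org.apache.spark.sql.connector.read`, and the real `partitionKey()` returns an `InternalRow` rather than a `List<Object>`). `DayPartition`, `row`, `recordsMatchKey`, and `allReportKeys` are hypothetical names introduced only for illustration.

```java
import java.io.Serializable;
import java.util.List;

class HasPartitionKeySketch {
    // Simplified stand-ins for Spark's interfaces (assumption: the real
    // HasPartitionKey.partitionKey() returns an InternalRow).
    interface InputPartition extends Serializable {}

    interface HasPartitionKey extends InputPartition {
        List<Object> partitionKey();
    }

    // Hypothetical helper: builds one record as a list of column values.
    static List<Object> row(Object... values) {
        return List.of(values);
    }

    // Hypothetical partition whose records are all expected to share one
    // value in column 0 (the partition key column).
    static final class DayPartition implements HasPartitionKey {
        private final String day;
        private final List<List<Object>> records;

        DayPartition(String day, List<List<Object>> records) {
            this.day = day;
            this.records = records;
        }

        @Override
        public List<Object> partitionKey() {
            return List.of(day);
        }

        List<List<Object>> records() {
            return records;
        }
    }

    // Connector-side sanity check: Spark itself does not verify that every
    // record in the partition carries the reported key.
    static boolean recordsMatchKey(DayPartition p) {
        Object expected = p.partitionKey().get(0);
        for (List<Object> r : p.records()) {
            if (!expected.equals(r.get(0))) {
                return false;
            }
        }
        return true;
    }

    // The keys are only useful when every reported partition exposes one.
    static boolean allReportKeys(List<? extends InputPartition> partitions) {
        for (InputPartition p : partitions) {
            if (!(p instanceof HasPartitionKey)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DayPartition p = new DayPartition(
            "2024-01-01",
            List.of(row("2024-01-01", 10L), row("2024-01-01", 7L)));
        System.out.println("records match key: " + recordsMatchKey(p));
        System.out.println("all report keys: " + allReportKeys(List.of(p)));
    }
}
```

A check like `recordsMatchKey` is something a connector might run in its own tests; it is not part of Spark's API.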
- Source
- HasPartitionKey.java
- Since
3.3.0
- See also
org.apache.spark.sql.connector.read.SupportsReportPartitioning
org.apache.spark.sql.connector.read.partitioning.Partitioning
Linear Supertypes
InputPartition, Serializable, AnyRef, Any
Abstract Value Members
- abstract def partitionKey(): InternalRow
Returns the value of the partition key(s) associated with this partition. An input partition implementing this interface needs to ensure that all its records have the same value for the partition keys. Note that the value is the one obtained after the partition transform, if any, has been applied.
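To illustrate the point about transforms, the following sketch assumes a hypothetical `days(ts)` transform that maps a timestamp in microseconds since the epoch to whole days since the epoch. The partition key then holds the transformed day number, not any raw timestamp. As before, `InputPartition` and `HasPartitionKey` are simplified stand-ins declared inside the example (the real `partitionKey()` returns an `InternalRow`), and `DaysPartition`, `days`, and `toMicros` are illustrative names only.

```java
import java.io.Serializable;
import java.time.Instant;
import java.util.List;

class TransformedKeySketch {
    static final long MICROS_PER_DAY = 86_400_000_000L;

    // Simplified stand-ins for Spark's interfaces (assumption).
    interface InputPartition extends Serializable {}

    interface HasPartitionKey extends InputPartition {
        List<Object> partitionKey();
    }

    // Hypothetical days(ts) transform: microseconds since the epoch to days
    // since the epoch. floorDiv keeps pre-1970 timestamps on the right day.
    static long days(long epochMicros) {
        return Math.floorDiv(epochMicros, MICROS_PER_DAY);
    }

    // Converts an Instant to microseconds since the epoch.
    static long toMicros(Instant t) {
        return t.getEpochSecond() * 1_000_000L + t.getNano() / 1_000;
    }

    // The key is the transformed value (a day number), shared by all records.
    static final class DaysPartition implements HasPartitionKey {
        private final long day;

        DaysPartition(long day) {
            this.day = day;
        }

        @Override
        public List<Object> partitionKey() {
            return List.of(day);
        }

        // True when a record with this timestamp belongs in the partition.
        boolean contains(long tsMicros) {
            return days(tsMicros) == day;
        }
    }

    public static void main(String[] args) {
        long ts = toMicros(Instant.parse("2024-01-01T10:15:30Z"));
        DaysPartition p = new DaysPartition(days(ts));
        System.out.println(p.partitionKey()); // prints [19723]
    }
}
```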
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- def preferredLocations(): Array[String]
The preferred locations where the input partition reader returned by this partition can run faster. Spark does not guarantee that the reader will run on these locations, so implementations should make sure that it can run on any location. Each location is a string representing a host name.
Note that if a host name cannot be recognized by Spark, it is ignored as if it were not in the returned locations. The default return value is an empty string array, which means this input partition's reader has no location preference.
If this method fails (by throwing an exception), the action will fail and no Spark job will be submitted.
- Definition Classes
- InputPartition
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
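The behaviour documented for `preferredLocations()` in the list above can be sketched as follows. `InputPartition` here is a simplified stand-in that mirrors the documented default (an empty array). `HostedPartition` and `recognizedHosts` are hypothetical names; the filter only imitates the documented rule that unrecognized host names are ignored and is not Spark's scheduler code.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PreferredLocationsSketch {
    // Simplified stand-in for Spark's InputPartition (assumption); the
    // default mirrors the documented "no location preference" behaviour.
    interface InputPartition extends Serializable {
        default String[] preferredLocations() {
            return new String[0];
        }
    }

    // Hypothetical partition that prefers the hosts holding its data.
    static final class HostedPartition implements InputPartition {
        private final String[] hosts;

        HostedPartition(String... hosts) {
            this.hosts = hosts.clone();
        }

        @Override
        public String[] preferredLocations() {
            return hosts.clone();
        }
    }

    // Imitates the documented rule: host names that are not recognized are
    // treated as if they had not been returned at all.
    static List<String> recognizedHosts(InputPartition p, Set<String> clusterHosts) {
        List<String> kept = new ArrayList<>();
        for (String host : p.preferredLocations()) {
            if (clusterHosts.contains(host)) {
                kept.add(host);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        InputPartition p = new HostedPartition("node-1", "unknown-host");
        System.out.println(recognizedHosts(p, Set.of("node-1", "node-2"))); // prints [node-1]
    }
}
```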
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)