Packages

trait HasPartitionKey extends InputPartition

A mix-in for input partitions whose records are clustered on the same set of partition keys (provided via SupportsReportPartitioning, see below). Data sources can opt-in to implement this interface for the partitions they report to Spark, which will use the information to avoid data shuffling in certain scenarios, such as join, aggregate, etc. Note that Spark requires ALL input partitions to implement this interface, otherwise it can't take advantage of it.

This interface should be used in combination with SupportsReportPartitioning, which allows data sources to report distribution and ordering spec to Spark. In particular, Spark expects data sources to report org.apache.spark.sql.connector.distributions.ClusteredDistribution whenever its input partitions implement this interface. Spark derives partition keys spec (e.g., column names, transforms) from the distribution, and partition values from the input partitions.

It is implementor's responsibility to ensure that when an input partition implements this interface, its records all have the same value for the partition keys. Spark doesn't check this property.

Source
HasPartitionKey.java
Since

3.3.0

See also

org.apache.spark.sql.connector.read.SupportsReportPartitioning

org.apache.spark.sql.connector.read.partitioning.Partitioning

Linear Supertypes
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. HasPartitionKey
  2. InputPartition
  3. Serializable
  4. AnyRef
  5. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Abstract Value Members

  1. abstract def partitionKey(): InternalRow

    Returns the value of the partition key(s) associated to this partition.

    Returns the value of the partition key(s) associated to this partition. An input partition implementing this interface needs to ensure that all its records have the same value for the partition keys. Note that the value is after partition transform has been applied, if there is any.

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  8. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  9. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  10. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  11. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  13. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  14. def preferredLocations(): Array[String]

    The preferred locations where the input partition reader returned by this partition can run faster, but Spark does not guarantee to run the input partition reader on these locations.

    The preferred locations where the input partition reader returned by this partition can run faster, but Spark does not guarantee to run the input partition reader on these locations. The implementations should make sure that it can be run on any location. The location is a string representing the host name.

    Note that if a host name cannot be recognized by Spark, it will be ignored as it was not in the returned locations. The default return value is empty string array, which means this input partition's reader has no location preference.

    If this method fails (by throwing an exception), the action will fail and no Spark job will be submitted.

    Definition Classes
    InputPartition
  15. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  16. def toString(): String
    Definition Classes
    AnyRef → Any
  17. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  18. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  19. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)

Inherited from InputPartition

Inherited from Serializable

Inherited from AnyRef

Inherited from Any

Ungrouped