InputPartition (Spark 4.0.0-preview1 JavaDoc)

All Superinterfaces: Serializable

All Known Subinterfaces: HasPartitionKey, HasPartitionStatistics

@Evolving public interface InputPartition extends Serializable

A serializable representation of an input partition returned by Batch.planInputPartitions() and by the corresponding methods for streaming reads.

Note that an InputPartition is serialized and sent to executors. On each executor, a PartitionReader is then created by PartitionReaderFactory.createReader(InputPartition) or PartitionReaderFactory.createColumnarReader(InputPartition) to do the actual reading. So InputPartition must be serializable, while PartitionReader does not need to be.
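The split described above (a small serializable description of the work, planned on the driver, and a reader built from it on the executor) can be sketched as follows. This is a hypothetical illustration, not Spark code: `SketchInputPartition` stands in for `org.apache.spark.sql.connector.read.InputPartition` so the example compiles without Spark on the classpath, `RangePartition` and `RangeReader` are invented names (the reader is only loosely modeled on the `next()`/`get()` style of PartitionReader), and a plain Java serialization round trip approximates shipping the partition to an executor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class InputPartitionSerializationSketch {
    // Approximates shipping a partition from the driver to an executor:
    // serialize it to bytes, then deserialize a fresh copy.
    static SketchInputPartition roundTrip(SketchInputPartition partition) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(partition);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchInputPartition) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("failed to serialize or deserialize partition", e);
        }
    }

    public static void main(String[] args) {
        // "Driver" side: plan a partition covering rows [10, 13).
        RangePartition planned = new RangePartition(10, 13);
        // "Executor" side: receive a deserialized copy and build a reader from it.
        RangePartition received = (RangePartition) roundTrip(planned);
        RangeReader reader = new RangeReader(received);
        long sum = 0;
        while (reader.next()) {
            sum += reader.get();
        }
        System.out.println(sum); // prints 33 (10 + 11 + 12)
    }
}

// Hypothetical stand-in for Spark's InputPartition (which also extends Serializable).
interface SketchInputPartition extends Serializable {}

// Hypothetical partition: describes *what* to read (a row range), not *how*.
// It holds only small serializable fields because it is shipped to executors.
final class RangePartition implements SketchInputPartition {
    private static final long serialVersionUID = 1L;
    final long start;
    final long end; // exclusive

    RangePartition(long start, long end) {
        this.start = start;
        this.end = end;
    }
}

// Hypothetical reader: created on the executor from a partition, so it may hold
// non-serializable state (open handles, buffers) and does not implement Serializable.
final class RangeReader {
    private final RangePartition partition;
    private long current;

    RangeReader(RangePartition partition) {
        this.partition = partition;
        this.current = partition.start - 1;
    }

    // Advances to the next row; returns false once the range is exhausted.
    boolean next() {
        return ++current < partition.end;
    }

    long get() {
        return current;
    }
}
```

Keeping connection handles, buffers, and similar state in the reader rather than in the partition is the design point here: the partition stays cheap to serialize, and the reader's resources are created where they are actually used.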

Since: 3.0.0

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| default String[] | preferredLocations() | The preferred locations where the reader for this input partition can run faster. Spark does not guarantee that the reader runs on these locations. |

Method Details
- preferredLocations
  
  default String[] preferredLocations()
  
  The preferred locations where the reader for this input partition can run faster. Spark does not guarantee that the reader runs on these locations, so implementations should make sure the reader can run on any location. Each location is a string representing a host name.
  Note that if Spark does not recognize a host name, that host name is ignored, as if it were not in the returned locations. The default return value is an empty string array, which means this input partition's reader has no location preference.
  If this method fails (by throwing an exception), the action will fail and no Spark job will be submitted.
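A minimal sketch of overriding `preferredLocations()`, assuming a data source that already knows which hosts hold each partition's data. `LocatableInputPartition` is a hypothetical stand-in that mirrors only the documented default (an empty array) so the example compiles without Spark; `LocalityPartition` is an invented name. Falling back to "no preference" when locality is unknown, instead of throwing, is a design choice that follows from the failure behavior described above.

```java
import java.io.Serializable;
import java.util.Arrays;

final class PreferredLocationsSketch {
    public static void main(String[] args) {
        LocalityPartition noHint = new LocalityPartition(null);
        LocalityPartition withHint = new LocalityPartition(new String[] {"host-a", "host-b"});
        System.out.println(noHint.preferredLocations().length); // prints 0
        System.out.println(Arrays.toString(withHint.preferredLocations())); // prints [host-a, host-b]
    }
}

// Hypothetical stand-in mirroring the documented default of InputPartition.preferredLocations().
interface LocatableInputPartition extends Serializable {
    // Default: an empty array, meaning no location preference.
    default String[] preferredLocations() {
        return new String[0];
    }
}

// Hypothetical partition that knows which hosts store its data (for example, replica hosts).
final class LocalityPartition implements LocatableInputPartition {
    private static final long serialVersionUID = 1L;
    private final String[] hosts; // host names only, e.g. "host-a"; null when unknown

    LocalityPartition(String[] hosts) {
        this.hosts = hosts == null ? null : hosts.clone();
    }

    @Override
    public String[] preferredLocations() {
        // Locations are only hints: the reader must still work on any executor.
        // Throwing here would fail the action before any job is submitted,
        // so return "no preference" when locality is unknown.
        if (hosts == null) {
            return LocatableInputPartition.super.preferredLocations();
        }
        return hosts.clone(); // defensive copy so callers cannot mutate internal state
    }
}
```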