@Evolving
public interface InputPartition
extends java.io.Serializable
A serializable representation of an input partition returned by Batch.planInputPartitions() and the corresponding ones in streaming.
Note that InputPartition will be serialized and sent to executors, then PartitionReader will be created by PartitionReaderFactory.createReader(InputPartition) or PartitionReaderFactory.createColumnarReader(InputPartition) on executors to do the actual reading. So InputPartition must be serializable while PartitionReader doesn't need to be.
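The serialization contract can be sketched without a Spark dependency. In the example below, InputPartition is a local stand-in for the Spark interface, and RangePartition, RangeReader, and roundTrip are hypothetical names introduced only for illustration. Plain Java serialization is assumed here as a stand-in for however the partition is actually shipped to executors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class InputPartitionSketch {

    // Local stand-in for Spark's InputPartition so the sketch compiles on its own.
    interface InputPartition extends Serializable {
        default String[] preferredLocations() {
            return new String[0];
        }
    }

    // Hypothetical partition: carries only plain, serializable state (a row range).
    static final class RangePartition implements InputPartition {
        private static final long serialVersionUID = 1L;
        final long start;
        final long end;

        RangePartition(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical reader: built after the partition arrives, so it is
    // deliberately NOT Serializable and may hold non-serializable resources.
    static final class RangeReader {
        private final long end;
        private long current;

        RangeReader(RangePartition partition) {
            this.current = partition.start - 1;
            this.end = partition.end;
        }

        boolean next() {
            current++;
            return current < end;
        }

        long get() {
            return current;
        }
    }

    // Serializes and deserializes a value, imitating the driver-to-executor hop.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "Driver" side: plan a partition, then ship it.
        RangePartition shipped = roundTrip(new RangePartition(0, 3));

        // "Executor" side: create the reader from the deserialized partition.
        RangeReader reader = new RangeReader(shipped);
        long sum = 0;
        while (reader.next()) {
            sum += reader.get();
        }
        System.out.println("sum of rows read: " + sum);
    }
}
```

Keeping the partition down to a few primitive fields, and opening files or connections only inside the reader, is one way to stay within this contract.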
Modifier and Type | Method and Description |
---|---|
default String[] | preferredLocations(): The preferred locations where the input partition reader returned by this partition can run faster, but Spark does not guarantee to run the input partition reader on these locations. |
default String[] preferredLocations()
Note that if a host name cannot be recognized by Spark, it will be ignored as if it were not in the returned locations. The default return value is an empty string array, which means this input partition's reader has no location preference.
If this method fails (by throwing an exception), the action will fail and no Spark job will be submitted.
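A minimal sketch of overriding preferredLocations() follows. The InputPartition interface here is a local stand-in that mirrors the documented default, not the Spark type itself. AnywherePartition, BlockPartition, and the host names are hypothetical and exist only for illustration.

```java
import java.io.Serializable;
import java.util.Arrays;

public class PreferredLocationsSketch {

    // Local stand-in mirroring the documented default (empty array = no preference).
    interface InputPartition extends Serializable {
        default String[] preferredLocations() {
            return new String[0];
        }
    }

    // Hypothetical partition with no locality information: inherits the default.
    static final class AnywherePartition implements InputPartition {
        private static final long serialVersionUID = 1L;
    }

    // Hypothetical partition backed by a block replicated on known hosts.
    static final class BlockPartition implements InputPartition {
        private static final long serialVersionUID = 1L;
        private final String[] replicaHosts;

        // Hosts are resolved once, when the partition is planned.
        BlockPartition(String... replicaHosts) {
            this.replicaHosts = replicaHosts.clone();
        }

        @Override
        public String[] preferredLocations() {
            // Return a copy so callers cannot mutate the partition's state.
            return replicaHosts.clone();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(new AnywherePartition().preferredLocations()));
        System.out.println(
            Arrays.toString(new BlockPartition("host-a", "host-b").preferredLocations()));
    }
}
```

Because an exception thrown from this method fails the action before any job is submitted, it seems prudent to resolve host names when the partition is constructed and have the method simply return the stored values.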