@Private
public interface ShuffleDataIO
This is the root of a plugin system for storing shuffle bytes to arbitrary storage
backends in the sort-based shuffle algorithm implemented by the
org.apache.spark.shuffle.sort.SortShuffleManager
. If another shuffle algorithm is
needed instead of sort-based shuffle, one should implement
ShuffleManager
instead.
A single instance of this module is loaded per process in the Spark application. The default implementation reads and writes shuffle data from the local disks of the executor, and is the implementation of shuffle file storage that has remained consistent throughout most of Spark's history.
Alternative implementations of shuffle data storage can be loaded via setting
spark.shuffle.sort.io.plugin.class
.
Modifier and Type | Method and Description |
---|---|
ShuffleDriverComponents |
driver()
Called once on driver process to bootstrap the shuffle metadata modules that
are maintained by the driver.
|
ShuffleExecutorComponents |
executor()
Called once on executor processes to bootstrap the shuffle data storage modules that
are only invoked on the executors.
|
ShuffleExecutorComponents executor()
ShuffleDriverComponents driver()