Package org.apache.spark.shuffle.api
Interface ShuffleDataIO
@Private
public interface ShuffleDataIO
:: Private ::
An interface for plugging in modules for storing and reading temporary shuffle data.
This is the root of a plugin system for storing shuffle bytes to arbitrary storage
backends in the sort-based shuffle algorithm implemented by the
SortShuffleManager
. If another shuffle algorithm is
needed instead of sort-based shuffle, one should implement
ShuffleManager
instead.
A single instance of this module is loaded per process in the Spark application. The default implementation reads and writes shuffle data from the local disks of the executor, and is the implementation of shuffle file storage that has remained consistent throughout most of Spark's history.
Alternative implementations of shuffle data storage can be loaded via setting
spark.shuffle.sort.io.plugin.class
.
- Since:
- 3.0.0
-
Method Details
-
executor
ShuffleExecutorComponents executor()Called once on executor processes to bootstrap the shuffle data storage modules that are only invoked on the executors. -
driver
ShuffleDriverComponents driver()Called once on driver process to bootstrap the shuffle metadata modules that are maintained by the driver.
-