Interface ShuffleDataIO


@Private public interface ShuffleDataIO
:: Private :: An interface for plugging in modules for storing and reading temporary shuffle data.

This is the root of a plugin system for storing shuffle bytes to arbitrary storage backends in the sort-based shuffle algorithm implemented by the SortShuffleManager. If another shuffle algorithm is needed instead of sort-based shuffle, one should implement ShuffleManager instead.

A single instance of this module is loaded per process in the Spark application. The default implementation reads and writes shuffle data from the local disks of the executor, and is the implementation of shuffle file storage that has remained consistent throughout most of Spark's history.

Alternative implementations of shuffle data storage can be loaded via setting spark.shuffle.sort.io.plugin.class.

Since:
3.0.0
  • Method Summary

    Modifier and Type
    Method
    Description
    Called once on driver process to bootstrap the shuffle metadata modules that are maintained by the driver.
    Called once on executor processes to bootstrap the shuffle data storage modules that are only invoked on the executors.
  • Method Details

    • executor

      Called once on executor processes to bootstrap the shuffle data storage modules that are only invoked on the executors.
    • driver

      Called once on driver process to bootstrap the shuffle metadata modules that are maintained by the driver.