Package

org.apache.spark.internal

io

package io

Type Members

  1. abstract class FileCommitProtocol extends AnyRef

    An interface to define how a single Spark job commits its outputs. Three notes:

    1. Implementations must be serializable, as the committer instance instantiated on the driver will be used for tasks on executors.
    2. Implementations should have a constructor with either 2 or 3 arguments: (jobId: String, path: String) or (jobId: String, path: String, isAppend: Boolean).
    3. A committer should not be reused across multiple Spark jobs.
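
    The first two notes can be illustrated with a small, dependency-free sketch. It assumes that a committer class is looked up reflectively by its constructor shape and then shipped to executors via Java serialization; the names TwoArgCommitter, ThreeArgCommitter, instantiate and roundTrip are hypothetical and are not part of Spark's API.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical sketch; class and helper names are illustrative, not Spark's API.
object CommitterConstructionSketch {

  // Committer-like class with the two-argument constructor shape (jobId, path).
  class TwoArgCommitter(val jobId: String, val path: String) extends Serializable

  // Committer-like class with the three-argument shape (jobId, path, isAppend).
  class ThreeArgCommitter(val jobId: String, val path: String, val isAppend: Boolean)
    extends Serializable

  // Assumed lookup order: prefer the three-argument constructor,
  // fall back to the two-argument one if it does not exist.
  def instantiate[T](cls: Class[T], jobId: String, path: String, isAppend: Boolean): T =
    try {
      cls.getDeclaredConstructor(classOf[String], classOf[String], classOf[Boolean])
        .newInstance(jobId, path, Boolean.box(isAppend))
    } catch {
      case _: NoSuchMethodException =>
        cls.getDeclaredConstructor(classOf[String], classOf[String])
          .newInstance(jobId, path)
    }

  // Java-serialization round trip, standing in for driver-to-executor shipping.
  def roundTrip[T <: Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

    A class that offers neither constructor shape would fail in instantiate with NoSuchMethodException, and one that is not serializable would fail in roundTrip with NotSerializableException, which is why both requirements matter in this model.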

    The proper call sequence is:

    1. The driver calls setupJob.
    2. As part of each task's execution, the executor calls setupTask and then commitTask (or abortTask if the task failed).
    3. When all necessary tasks have completed successfully, the driver calls commitJob. If the job failed to execute (e.g. too many failed tasks), abortJob should be called instead.
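
    The call sequence above can be sketched with a simplified, single-JVM model. RecordingProtocol and runJob are hypothetical stand-ins: the real methods take Hadoop job and task contexts, tasks run in parallel on executors, and failed tasks may be retried, none of which is modelled here.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

// Hypothetical, dependency-free model of the call sequence; not Spark's real API.
object CommitSequenceSketch {

  // Simplified stand-in for a commit protocol: every call is recorded in `log`.
  class RecordingProtocol {
    val log: ArrayBuffer[String] = ArrayBuffer.empty[String]
    def setupJob(): Unit = log += "setupJob"
    def setupTask(taskId: Int): Unit = log += s"setupTask($taskId)"
    def commitTask(taskId: Int): Unit = log += s"commitTask($taskId)"
    def abortTask(taskId: Int): Unit = log += s"abortTask($taskId)"
    def commitJob(): Unit = log += "commitJob"
    def abortJob(): Unit = log += "abortJob"
  }

  // Runs the task bodies one after another (no retries, no parallelism).
  // Returns true if the job was committed, false if it was aborted.
  def runJob(protocol: RecordingProtocol, tasks: Seq[() => Unit]): Boolean = {
    protocol.setupJob()
    val results = tasks.zipWithIndex.map { case (body, taskId) =>
      protocol.setupTask(taskId)
      val succeeded = Try(body()).isSuccess
      if (succeeded) protocol.commitTask(taskId) else protocol.abortTask(taskId)
      succeeded
    }
    if (results.forall(identity)) {
      protocol.commitJob()
      true
    } else {
      protocol.abortJob()
      false
    }
  }
}
```

    In this model a single failed task aborts the whole job; a real scheduler would typically retry the task several times before giving up.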

  2. class HadoopMapReduceCommitProtocol extends FileCommitProtocol with Serializable with Logging

    A FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the newer mapreduce API, not the old mapred API).

    Unlike Hadoop's OutputCommitter, this implementation is serializable.
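
    One common way for a serializable class to wrap a non-serializable one is to mark the wrapped object transient and rebuild it after deserialization. The sketch below shows that general pattern only; it is not taken from HadoopMapReduceCommitProtocol's source, and NonSerializableCommitter and SerializableProtocol are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical sketch of a general pattern; not Spark's actual implementation.
object SerializableWrapperSketch {

  // Stand-in for a committer that deliberately does not implement Serializable.
  class NonSerializableCommitter(val path: String) {
    def describe: String = s"committer for $path"
  }

  // Only `path` is serialized. The delegate is transient and lazy, so it is
  // rebuilt from `path` on first use after deserialization.
  class SerializableProtocol(val path: String) extends Serializable {
    @transient private lazy val delegate = new NonSerializableCommitter(path)
    def describe: String = delegate.describe
  }

  // Java-serialization round trip, standing in for driver-to-executor shipping.
  def roundTrip[T <: Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

    Without the transient marker, serializing the wrapper after its delegate has been created would fail with NotSerializableException.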

Value Members

  1. object FileCommitProtocol
