Class

org.apache.spark.internal.io

HadoopMapReduceCommitProtocol

Related Doc: package io

Permalink

class HadoopMapReduceCommitProtocol extends FileCommitProtocol with Serializable with Logging

An FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the newer mapreduce API, not the old mapred API).

Unlike Hadoop's OutputCommitter, this implementation is serializable.

Source
HadoopMapReduceCommitProtocol.scala
Linear Supertypes
Logging, Serializable, Serializable, FileCommitProtocol, AnyRef, Any
Known Subclasses
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. HadoopMapReduceCommitProtocol
  2. Logging
  3. Serializable
  4. Serializable
  5. FileCommitProtocol
  6. AnyRef
  7. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Instance Constructors

  1. new HadoopMapReduceCommitProtocol(jobId: String, path: String, dynamicPartitionOverwrite: Boolean = false)

    Permalink

    jobId

    the job's or stage's id

    path

    the job's output path, or null if committer acts as a noop

    dynamicPartitionOverwrite

    If true, Spark will overwrite partition directories at runtime dynamically, i.e., we first write files under a staging directory with partition path, e.g. /path/to/staging/a=1/b=1/xxx.parquet. When committing the job, we first clean up the corresponding partition directories at destination path, e.g. /path/to/destination/a=1/b=1, and move files from staging directory to the corresponding partition directories under destination path.

Value Members

  1. final def !=(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  4. def abortJob(jobContext: JobContext): Unit

    Permalink

    Aborts a job after the writes fail.

    Aborts a job after the writes fail. Must be called on the driver.

    Calling this function is a best-effort attempt, because it is possible that the driver just crashes (or killed) before it can call abort.

    Definition Classes
    HadoopMapReduceCommitProtocolFileCommitProtocol
  5. def abortTask(taskContext: TaskAttemptContext): Unit

    Permalink

    Aborts a task after the writes have failed.

    Aborts a task after the writes have failed. Must be called on the executors when running tasks.

    Calling this function is a best-effort attempt, because it is possible that the executor just crashes (or killed) before it can call abort.

    Definition Classes
    HadoopMapReduceCommitProtocolFileCommitProtocol
  6. final def asInstanceOf[T0]: T0

    Permalink
    Definition Classes
    Any
  7. def clone(): AnyRef

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit

    Permalink

    Commits a job after the writes succeed.

    Commits a job after the writes succeed. Must be called on the driver.

    Definition Classes
    HadoopMapReduceCommitProtocolFileCommitProtocol
  9. def commitTask(taskContext: TaskAttemptContext): TaskCommitMessage

    Permalink

    Commits a task after the writes succeed.

    Commits a task after the writes succeed. Must be called on the executors when running tasks.

    Definition Classes
    HadoopMapReduceCommitProtocolFileCommitProtocol
  10. def deleteWithJob(fs: FileSystem, path: Path, recursive: Boolean): Boolean

    Permalink

    Specifies that a file should be deleted with the commit of this job.

    Specifies that a file should be deleted with the commit of this job. The default implementation deletes the file immediately.

    Definition Classes
    FileCommitProtocol
  11. final def eq(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  13. def finalize(): Unit

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]

    Permalink
    Definition Classes
    AnyRef → Any
  15. def hashCode(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  16. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean = false): Boolean

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  17. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  18. final def isInstanceOf[T0]: Boolean

    Permalink
    Definition Classes
    Any
  19. def isTraceEnabled(): Boolean

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  20. def log: Logger

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  21. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  22. def logDebug(msg: ⇒ String): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  23. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  24. def logError(msg: ⇒ String): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  25. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  26. def logInfo(msg: ⇒ String): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  27. def logName: String

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  28. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  29. def logTrace(msg: ⇒ String): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  30. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  31. def logWarning(msg: ⇒ String): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  32. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  33. def newTaskTempFile(taskContext: TaskAttemptContext, dir: Option[String], ext: String): String

    Permalink

    Notifies the commit protocol to add a new file, and gets back the full path that should be used.

    Notifies the commit protocol to add a new file, and gets back the full path that should be used. Must be called on the executors when running tasks.

    Note that the returned temp file may have an arbitrary path. The commit protocol only promises that the file will be at the location specified by the arguments after job commit.

    A full file path consists of the following parts:

    1. the base path 2. some sub-directory within the base path, used to specify partitioning 3. file prefix, usually some unique job id with the task id 4. bucket id 5. source specific file extension, e.g. ".snappy.parquet"

    The "dir" parameter specifies 2, and "ext" parameter specifies both 4 and 5, and the rest are left to the commit protocol implementation to decide.

    Important: it is the caller's responsibility to add uniquely identifying content to "ext" if a task is going to write out multiple files to the same dir. The file commit protocol only guarantees that files written by different tasks will not conflict.

    Definition Classes
    HadoopMapReduceCommitProtocolFileCommitProtocol
  34. def newTaskTempFileAbsPath(taskContext: TaskAttemptContext, absoluteDir: String, ext: String): String

    Permalink

    Similar to newTaskTempFile(), but allows files to committed to an absolute output location.

    Similar to newTaskTempFile(), but allows files to committed to an absolute output location. Depending on the implementation, there may be weaker guarantees around adding files this way.

    Important: it is the caller's responsibility to add uniquely identifying content to "ext" if a task is going to write out multiple files to the same dir. The file commit protocol only guarantees that files written by different tasks will not conflict.

    Definition Classes
    HadoopMapReduceCommitProtocolFileCommitProtocol
  35. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  36. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  37. def onTaskCommit(taskCommit: TaskCommitMessage): Unit

    Permalink

    Called on the driver after a task commits.

    Called on the driver after a task commits. This can be used to access task commit messages before the job has finished. These same task commit messages will be passed to commitJob() if the entire job succeeds.

    Definition Classes
    FileCommitProtocol
  38. def setupCommitter(context: TaskAttemptContext): OutputCommitter

    Permalink
    Attributes
    protected
  39. def setupJob(jobContext: JobContext): Unit

    Permalink

    Setups up a job.

    Setups up a job. Must be called on the driver before any other methods can be invoked.

    Definition Classes
    HadoopMapReduceCommitProtocolFileCommitProtocol
  40. def setupTask(taskContext: TaskAttemptContext): Unit

    Permalink

    Sets up a task within a job.

    Sets up a task within a job. Must be called before any other task related methods can be invoked.

    Definition Classes
    HadoopMapReduceCommitProtocolFileCommitProtocol
  41. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  42. def toString(): String

    Permalink
    Definition Classes
    AnyRef → Any
  43. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  45. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from Logging

Inherited from Serializable

Inherited from Serializable

Inherited from FileCommitProtocol

Inherited from AnyRef

Inherited from Any

Ungrouped