Packages

  • package root
    Definition Classes
    root
  • package org
    Definition Classes
    root
  • package apache
    Definition Classes
    org
  • package spark

    Core Spark functionality.

    Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

    In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)] through implicit conversions.

    Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.

    Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

    Classes and methods marked with Developer API are intended for advanced users want to extend Spark through lower level interfaces. These are subject to changes or removal in minor releases.

    Definition Classes
    apache
  • package sql

    Allows the execution of relational queries, including those expressed in SQL using Spark.

    Allows the execution of relational queries, including those expressed in SQL using Spark.

    Definition Classes
    spark
  • package connector
    Definition Classes
    sql
  • package write
    Definition Classes
    connector
  • package streaming
    Definition Classes
    write
  • StreamingDataWriterFactory
  • StreamingWrite

trait StreamingWrite extends AnyRef

An interface that defines how to write the data to data source in streaming queries.

The writing procedure is:

  • Create a writer factory by #createStreamingWriterFactory(PhysicalWriteInfo), serialize and send it to all the partitions of the input data(RDD).
  • For each epoch in each partition, create the data writer, and write the data of the epoch in the partition with this writer. If all the data are written successfully, call DataWriter#commit(). If exception happens during the writing, call DataWriter#abort().
  • If writers in all partitions of one epoch are successfully committed, call WriterCommitMessage[]). If some writers are aborted, or the job failed with an unknown reason, call WriterCommitMessage[]).

While Spark will retry failed writing tasks, Spark won't retry failed writing jobs. Users should do it manually in their Spark applications if they want to retry.

Please refer to the documentation of commit/abort methods for detailed specifications.

Annotations
@Evolving()
Source
StreamingWrite.java
Since

3.0.0

Linear Supertypes
AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. StreamingWrite
  2. AnyRef
  3. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Abstract Value Members

  1. abstract def abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit

    Aborts this writing job because some data writers are failed and keep failing when retried, or the Spark job fails with some unknown reasons, or WriterCommitMessage[]) fails.

    Aborts this writing job because some data writers are failed and keep failing when retried, or the Spark job fails with some unknown reasons, or WriterCommitMessage[]) fails.

    If this method fails (by throwing an exception), the underlying data source may require manual cleanup.

    Unless the abort is triggered by the failure of commit, the given messages will have some null slots, as there may be only a few data writers that were committed before the abort happens, or some data writers were committed but their commit messages haven't reached the driver when the abort is triggered. So this is just a "best effort" for data sources to clean up the data left by data writers.

  2. abstract def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit

    Commits this writing job for the specified epoch with a list of commit messages.

    Commits this writing job for the specified epoch with a list of commit messages. The commit messages are collected from successful data writers and are produced by DataWriter#commit().

    If this method fails (by throwing an exception), this writing job is considered to have been failed, and the execution engine will attempt to call WriterCommitMessage[]).

    The execution engine may call commit multiple times for the same epoch in some circumstances. To support exactly-once data semantics, implementations must ensure that multiple commits for the same epoch are idempotent.

  3. abstract def createStreamingWriterFactory(info: PhysicalWriteInfo): StreamingDataWriterFactory

    Creates a writer factory which will be serialized and sent to executors.

    Creates a writer factory which will be serialized and sent to executors.

    If this method fails (by throwing an exception), the action will fail and no Spark job will be submitted.

    info

    Information about the RDD that will be written to this data writer

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  8. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  9. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  10. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  11. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  13. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  14. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  15. def toString(): String
    Definition Classes
    AnyRef → Any
  16. def useCommitCoordinator(): Boolean

    Returns whether Spark should use the commit coordinator to ensure that at most one task for each partition commits.

    Returns whether Spark should use the commit coordinator to ensure that at most one task for each partition commits.

    returns

    true if commit coordinator should be used, false otherwise.

  17. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  18. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  19. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)

Inherited from AnyRef

Inherited from Any

Ungrouped