Packages

  • package root
    Definition Classes
    root
  • package org
    Definition Classes
    root
  • package apache
    Definition Classes
    org
  • package spark

    Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

    In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions; a short sketch illustrating this follows the package list below.

    Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.

    Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

    Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower-level interfaces. These are subject to change or removal in minor releases.

    Definition Classes
    apache
  • package sql

    Allows the execution of relational queries, including those expressed in SQL using Spark.

    Definition Classes
    spark
  • package connector
    Definition Classes
    sql
  • package write
    Definition Classes
    connector
  • package streaming
    Definition Classes
    write
  • StreamingDataWriterFactory
  • StreamingWrite
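
As a quick illustration of the implicit conversions described for the spark package above, the sketch below builds an RDD[(Int, Int)] and calls groupByKey, which comes from PairRDDFunctions. This is a minimal, hypothetical example (the object name PairRddExample and the local[*] master are illustrative only), not taken from the Spark sources.

  import org.apache.spark.{SparkConf, SparkContext}

  object PairRddExample {
    def main(args: Array[String]): Unit = {
      // SparkContext is the main entry point to Spark mentioned above.
      val conf = new SparkConf().setAppName("pair-rdd-example").setMaster("local[*]")
      val sc = new SparkContext(conf)

      // RDD[(Int, Int)]: groupByKey and join come from PairRDDFunctions
      // and are added through implicit conversions in org.apache.spark.rdd.
      val pairs = sc.parallelize(Seq((1, 10), (1, 20), (2, 30)))
      pairs.groupByKey().collect().foreach { case (k, vs) =>
        println(s"$k -> ${vs.mkString(", ")}")
      }

      sc.stop()
    }
  }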

package streaming

Type Members

  1. trait StreamingDataWriterFactory extends Serializable

    A factory of DataWriter returned by StreamingWrite#createStreamingWriterFactory(PhysicalWriteInfo), which is responsible for creating and initializing the actual data writer on the executor side. See the combined sketch after this listing.

    Note that the writer factory will be serialized and sent to the executors, where the data writers are then created to do the actual writing. So this interface must be serializable, while DataWriter doesn't need to be.

    Annotations
    @Evolving()
    Since

    3.0.0

  2. trait StreamingWrite extends AnyRef

    An interface that defines how to write data to a data source in streaming queries.

    The writing procedure is:

    • Create a writer factory by createStreamingWriterFactory(PhysicalWriteInfo), then serialize and send it to all the partitions of the input data (RDD).
    • For each epoch in each partition, create the data writer, and write the data of the epoch in the partition with this writer. If all the data is written successfully, call DataWriter#commit(). If an exception happens during the writing, call DataWriter#abort().
    • If the writers in all partitions of one epoch are successfully committed, call commit(long, WriterCommitMessage[]). If some writers are aborted, or the job fails for an unknown reason, call abort(long, WriterCommitMessage[]).

    While Spark will retry failed writing tasks, it won't retry failed writing jobs; users should retry failed jobs manually in their Spark applications if needed.

    Please refer to the documentation of the commit/abort methods for detailed specifications; a combined implementation sketch of StreamingWrite and StreamingDataWriterFactory follows this listing.

    Annotations
    @Evolving()
    Since

    3.0.0
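
The two type members above are typically implemented together: the driver-side StreamingWrite creates a serializable StreamingDataWriterFactory, and each executor uses that factory to create one DataWriter per partition and epoch. The following is a minimal, illustrative Scala sketch of that wiring, not an official Spark example; the names ConsoleLikeStreamingWrite, LogWriterFactory, LogDataWriter and LogCommitMessage, and the row counting they perform, are hypothetical, and a real data source would expose the StreamingWrite through its WriteBuilder.

  import org.apache.spark.sql.catalyst.InternalRow
  import org.apache.spark.sql.connector.write.{DataWriter, PhysicalWriteInfo, WriterCommitMessage}
  import org.apache.spark.sql.connector.write.streaming.{StreamingDataWriterFactory, StreamingWrite}

  // Commit message sent from each data writer back to the driver.
  // The payload is up to the data source; a row count is used here for illustration.
  case class LogCommitMessage(partitionId: Int, epochId: Long, rowCount: Long)
    extends WriterCommitMessage

  // Driver side: creates the serializable writer factory and receives
  // one commit or abort call per epoch.
  class ConsoleLikeStreamingWrite extends StreamingWrite {

    override def createStreamingWriterFactory(
        info: PhysicalWriteInfo): StreamingDataWriterFactory =
      // info.numPartitions() tells how many writer tasks will be launched.
      new LogWriterFactory(info.numPartitions())

    override def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit = {
      // All partitions of this epoch committed successfully.
      val total = messages.collect { case m: LogCommitMessage => m.rowCount }.sum
      println(s"epoch $epochId committed with $total rows")
    }

    override def abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit = {
      // Some writers were aborted, or the job failed for an unknown reason;
      // clean up any partially written data here.
      println(s"epoch $epochId aborted")
    }
  }

  // Serialized on the driver and sent to the executors
  // (StreamingDataWriterFactory already extends Serializable).
  class LogWriterFactory(numPartitions: Int) extends StreamingDataWriterFactory {
    override def createWriter(
        partitionId: Int, taskId: Long, epochId: Long): DataWriter[InternalRow] =
      new LogDataWriter(partitionId, epochId)
  }

  // Executor side: one writer per partition per epoch.
  class LogDataWriter(partitionId: Int, epochId: Long) extends DataWriter[InternalRow] {
    private var count = 0L

    override def write(record: InternalRow): Unit = {
      // A real writer would buffer or send the row using the schema it was built with;
      // here the rows are only counted.
      count += 1
    }

    override def commit(): WriterCommitMessage =
      LogCommitMessage(partitionId, epochId, count)

    override def abort(): Unit = {
      // Discard anything written so far for this epoch.
    }

    override def close(): Unit = {
      // Release any resources held by the writer.
    }
  }

Spark ships the factory returned by createStreamingWriterFactory to every partition, and once an epoch finishes it calls commit or abort on the StreamingWrite with the messages collected from each DataWriter, matching the writing procedure listed above.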
