org.apache.spark.sql.sources

SupportsStreamSourceMetadataColumns

trait SupportsStreamSourceMetadataColumns extends StreamSourceProvider

Implemented by StreamSourceProvider objects that can generate file metadata columns. This trait extends the basic StreamSourceProvider by allowing the addition of metadata columns to the schema of the Stream Data Source.

Source
interfaces.scala
Linear Supertypes
StreamSourceProvider, AnyRef, Any

Abstract Value Members

  1. abstract def createSource(sqlContext: SQLContext, metadataPath: String, schema: Option[StructType], providerName: String, parameters: Map[String, String]): Source

    Definition Classes
    StreamSourceProvider
    Since

    2.0.0

  2. abstract def getMetadataOutput(spark: SparkSession, options: Map[String, String], userSpecifiedSchema: Option[StructType]): Seq[AttributeReference]

    Returns the metadata columns that should be added to the schema of the Stream Source. These metadata columns supplement the columns defined in the sourceSchema() of the StreamSourceProvider.

    The final schema of the Stream Source therefore consists of the source schema defined by StreamSourceProvider.sourceSchema(), followed by the metadata columns. The caller is responsible for resolving any naming conflicts between the metadata columns and the source schema.
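
    The composition described above can be sketched as follows. This is a minimal illustration, not Spark's internal logic: the object FinalSchemaSketch and the helper appendMetadata are hypothetical names, and failing fast on a case-insensitive name clash is only one assumed policy a caller might choose.

    ```scala
    import java.util.Locale

    import org.apache.spark.sql.catalyst.expressions.AttributeReference
    import org.apache.spark.sql.types._

    object FinalSchemaSketch {

      // Hypothetical helper: appends the metadata attributes after the source
      // columns. Rejecting clashes (case-insensitively) is an assumed policy.
      def appendMetadata(
          sourceSchema: StructType,
          metadataOutput: Seq[AttributeReference]): StructType = {
        val sourceNames = sourceSchema.fieldNames.map(_.toLowerCase(Locale.ROOT)).toSet
        val clashes = metadataOutput
          .map(_.name)
          .filter(name => sourceNames.contains(name.toLowerCase(Locale.ROOT)))
        require(
          clashes.isEmpty,
          s"Metadata columns clash with source columns: ${clashes.mkString(", ")}")

        // Convert each attribute to a field and place it at the end of the schema.
        val metadataFields =
          metadataOutput.map(a => StructField(a.name, a.dataType, a.nullable, a.metadata))
        StructType(sourceSchema.fields ++ metadataFields)
      }

      def main(args: Array[String]): Unit = {
        // Illustrative schemas; the column names are assumptions.
        val source = StructType(Seq(StructField("value", StringType)))
        val metadataType = StructType(Seq(StructField("file_path", StringType)))
        val metadata = Seq(AttributeReference("_metadata", metadataType, nullable = false)())

        val finalSchema = appendMetadata(source, metadata)
        println(finalSchema.fieldNames.mkString(", ")) // prints "value, _metadata"
      }
    }
    ```

    A caller that prefers renaming or dropping a conflicting column instead of failing could replace the require check with its own resolution step.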

    One use case for this interface is a custom file-based streaming source that needs to expose file metadata columns by leveraging the hidden file metadata columns of its underlying storage format.

    spark

    The SparkSession used for the operation.

    options

    A map of options of the Stream Data Source.

    userSpecifiedSchema

    An optional user-provided schema of the Stream Data Source.

    returns

    A Seq of AttributeReference representing the metadata output attributes.
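
    As a rough sketch of how a provider might implement this member, the example below returns a single struct-typed metadata attribute. The class name ExampleFileStreamProvider, the source name, the _metadata column, and its file_path and file_size fields are all illustrative assumptions. The import path for Source is also an assumption and may differ between Spark versions. A real implementation would likely also need to mark the attribute as a hidden metadata column in the way the built-in file sources do, which is omitted here.

    ```scala
    import org.apache.spark.sql.{SQLContext, SparkSession}
    import org.apache.spark.sql.catalyst.expressions.AttributeReference
    import org.apache.spark.sql.execution.streaming.Source // assumed location
    import org.apache.spark.sql.sources.SupportsStreamSourceMetadataColumns
    import org.apache.spark.sql.types._

    // Hypothetical provider; all names and schemas below are illustrative.
    class ExampleFileStreamProvider extends SupportsStreamSourceMetadataColumns {

      // Default data schema used when the user does not supply one (assumption).
      private val defaultSchema = StructType(Seq(StructField("value", StringType)))

      override def sourceSchema(
          sqlContext: SQLContext,
          schema: Option[StructType],
          providerName: String,
          parameters: Map[String, String]): (String, StructType) =
        ("example-file-stream", schema.getOrElse(defaultSchema))

      // Metadata columns to be appended after the columns of sourceSchema().
      override def getMetadataOutput(
          spark: SparkSession,
          options: Map[String, String],
          userSpecifiedSchema: Option[StructType]): Seq[AttributeReference] = {
        val metadataType = StructType(Seq(
          StructField("file_path", StringType, nullable = false),
          StructField("file_size", LongType, nullable = false)))
        Seq(AttributeReference("_metadata", metadataType, nullable = false)())
      }

      // Building the actual streaming Source is outside the scope of this sketch.
      override def createSource(
          sqlContext: SQLContext,
          metadataPath: String,
          schema: Option[StructType],
          providerName: String,
          parameters: Map[String, String]): Source =
        throw new UnsupportedOperationException("Source construction is omitted from this sketch")
    }
    ```

    In this sketch neither spark nor options is consulted; a provider whose metadata fields depend on configuration could inspect options before building the attribute list.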

  3. abstract def sourceSchema(sqlContext: SQLContext, schema: Option[StructType], providerName: String, parameters: Map[String, String]): (String, StructType)

    Returns the name and schema of the source that can be used to continually read data.

    Definition Classes
    StreamSourceProvider
    Since

    2.0.0