Interface SupportsStreamSourceMetadataColumns

All Superinterfaces:
StreamSourceProvider

public interface SupportsStreamSourceMetadataColumns extends StreamSourceProvider
Implemented by StreamSourceProvider objects that can generate file metadata columns. This trait extends the basic StreamSourceProvider so that an implementation can add metadata columns to the schema of the Stream Data Source.
  • Method Summary

    Modifier and Type
    Method
    Description
    scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.AttributeReference>
    getMetadataOutput(SparkSession spark, scala.collection.immutable.Map<String,String> options, scala.Option<StructType> userSpecifiedSchema)
    Returns the metadata columns that should be added to the schema of the Stream Source.

    Methods inherited from interface org.apache.spark.sql.sources.StreamSourceProvider

    createSource, sourceSchema
  • Method Details

    • getMetadataOutput

      scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.AttributeReference> getMetadataOutput(SparkSession spark, scala.collection.immutable.Map<String,String> options, scala.Option<StructType> userSpecifiedSchema)
      Returns the metadata columns that should be added to the schema of the Stream Source. These metadata columns supplement the columns defined in the sourceSchema() of the StreamSourceProvider.

      The final schema for the Stream Source therefore consists of the source schema as defined by StreamSourceProvider.sourceSchema(), with the metadata columns added at the end. The caller of this method is responsible for resolving any naming conflicts between the metadata columns and the source schema.
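
      The schema composition described above can be illustrated with a simplified, Spark-free sketch. It is only a model: columns are represented as plain strings rather than AttributeReference objects, the class MetadataSchemaSketch and the method finalSchema are hypothetical names, and the renaming policy (prefixing a conflicting metadata column with underscores until it is unique, compared case-sensitively) is an assumption chosen for illustration, not necessarily what Spark does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the schema composition. Real Spark code
// works with StructType and AttributeReference, not plain column names.
final class MetadataSchemaSketch {
    private MetadataSchemaSketch() {}

    // Returns the source columns followed by the metadata columns.
    // Assumed conflict policy (illustrative only): a metadata column whose
    // name is already taken is prefixed with "_" until the name is unique.
    static List<String> finalSchema(List<String> sourceColumns, List<String> metadataColumns) {
        List<String> result = new ArrayList<>(sourceColumns);
        Set<String> taken = new HashSet<>(sourceColumns);
        for (String column : metadataColumns) {
            String name = column;
            while (taken.contains(name)) {
                name = "_" + name;
            }
            taken.add(name);
            result.add(name); // metadata columns always go at the end
        }
        return result;
    }

    public static void main(String[] args) {
        // The source schema already contains "_metadata", so the metadata
        // column is renamed before being appended.
        System.out.println(finalSchema(List.of("value", "_metadata"), List.of("_metadata")));
        // prints [value, _metadata, __metadata]
    }
}
```

      In this model the source columns keep their names and positions, and only the appended metadata columns are adjusted when a name collides. That is one way to read the contract that conflict resolution is left to the caller.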

      For example, a custom file-based streaming source can implement this interface to expose file metadata columns by leveraging the hidden file metadata columns of its underlying storage format.

      Parameters:
      spark - The SparkSession used for the operation.
      options - A map of options of the Stream Data Source.
      userSpecifiedSchema - An optional user-provided schema of the Stream Data Source.
      Returns:
      A Seq of AttributeReference representing the metadata output attributes.