
package sources

A set of APIs for adding data sources to Spark SQL.

Source: package.scala

Linear Supertypes

AnyRef, Any

Type Members

case class AlwaysFalse() extends Filter with Product with Serializable
A filter that always evaluates to false.
Annotations
@Evolving()
Since
3.0.0
case class AlwaysTrue() extends Filter with Product with Serializable
A filter that always evaluates to true.
Annotations
@Evolving()
Since
3.0.0
case class And(left: Filter, right: Filter) extends Filter with Product with Serializable
A filter that evaluates to true iff both left and right evaluate to true.
Annotations
@Stable()
Since
1.3.0
abstract class BaseRelation extends AnyRef
Represents a collection of tuples with a known schema. Classes that extend BaseRelation must be able to produce the schema of their data in the form of a StructType. Concrete implementations should inherit from one of the descendant Scan classes, which define various abstract methods for execution.
BaseRelations must also define an equality function that returns true only when the two instances will return the same data. This equality function is used to determine when it is safe to substitute cached results for a given relation.
Annotations
@Stable()
Since
1.3.0
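A minimal sketch of the equality requirement described above, assuming the Spark 3.x API with `spark-sql` on the classpath. `RangeRelation` is a hypothetical name. Declaring the relation as a case class gives structural equality: two instances with the same `count` return the same data, so they compare equal. `sqlContext` sits in a second parameter list so it is excluded from the generated `equals`/`hashCode`, and the relation mixes in `TableScan` because a bare `BaseRelation` cannot be scanned.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical relation producing the numbers 0 until count.
// Case-class equality covers only `count`, so equal instances return equal data.
case class RangeRelation(count: Long)(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  // The schema is reported as a StructType, as BaseRelation requires.
  override def schema: StructType =
    StructType(Seq(StructField("id", LongType, nullable = false)))

  // Produce every tuple as a Row with a single Long column.
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.range(0L, count).map(i => Row(i))
}
```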
trait CatalystScan extends AnyRef
::Experimental:: An interface for experimenting with a more direct connection to the query planner. Compared to PrunedFilteredScan, this operator receives the raw expressions from the org.apache.spark.sql.catalyst.plans.logical.LogicalPlan. Unlike the other APIs, this interface is NOT designed to be binary compatible across releases and should therefore be used only for experimentation.
Annotations
@Unstable()
Since
1.3.0
case class CollatedEqualNullSafe(attribute: String, value: Any, dataType: DataType) extends CollatedFilter with Product with Serializable
Collation-aware equivalent of EqualNullSafe.
Annotations
@Evolving()
case class CollatedEqualTo(attribute: String, value: Any, dataType: DataType) extends CollatedFilter with Product with Serializable
Collation-aware equivalent of EqualTo.
Annotations
@Evolving()
abstract class CollatedFilter extends Filter
Base class for collation-aware string filters.
Annotations
@Evolving()
case class CollatedGreaterThan(attribute: String, value: Any, dataType: DataType) extends CollatedFilter with Product with Serializable
Collation-aware equivalent of GreaterThan.
Annotations
@Evolving()
case class CollatedGreaterThanOrEqual(attribute: String, value: Any, dataType: DataType) extends CollatedFilter with Product with Serializable
Collation-aware equivalent of GreaterThanOrEqual.
Annotations
@Evolving()
case class CollatedIn(attribute: String, values: Array[Any], dataType: DataType) extends CollatedFilter with Product with Serializable
Collation-aware equivalent of In.
Annotations
@Evolving()
case class CollatedLessThan(attribute: String, value: Any, dataType: DataType) extends CollatedFilter with Product with Serializable
Collation-aware equivalent of LessThan.
Annotations
@Evolving()
case class CollatedLessThanOrEqual(attribute: String, value: Any, dataType: DataType) extends CollatedFilter with Product with Serializable
Collation-aware equivalent of LessThanOrEqual.
Annotations
@Evolving()
case class CollatedStringContains(attribute: String, value: String, dataType: DataType) extends CollatedFilter with Product with Serializable
Collation-aware equivalent of StringContains.
Annotations
@Evolving()
case class CollatedStringEndsWith(attribute: String, value: String, dataType: DataType) extends CollatedFilter with Product with Serializable
Collation-aware equivalent of StringEndsWith.
Annotations
@Evolving()
case class CollatedStringStartsWith(attribute: String, value: String, dataType: DataType) extends CollatedFilter with Product with Serializable
Collation-aware equivalent of StringStartsWith.
Annotations
@Evolving()
trait CreatableRelationProvider extends AnyRef
Annotations
@Stable()
Since
1.3.0
trait DataSourceRegister extends AnyRef
Data sources should implement this trait so that they can register an alias for their data source. This allows users to pass the alias as the format type instead of the fully qualified class name.
A new instance of this class will be instantiated each time a DDL call is made.
Annotations
@Stable()
Since
1.5.0
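A minimal sketch of registering an alias, assuming the Spark 3.x API with `spark-sql` on the classpath. `ExampleProvider` and the alias `"example"` are hypothetical, and the relation it returns is an empty placeholder. Spark discovers `DataSourceRegister` implementations through Java's `ServiceLoader`, so the class name would also need to be listed in a `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` resource file before `spark.read.format("example")` can resolve it.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical provider that registers the short alias "example".
class ExampleProvider extends RelationProvider with DataSourceRegister {
  // Alias users can give to format(...) instead of the fully qualified class name.
  override def shortName(): String = "example"

  // Placeholder relation: one string column and no rows.
  override def createRelation(
      ctx: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new BaseRelation with TableScan {
      override def sqlContext: SQLContext = ctx
      override def schema: StructType =
        StructType(Seq(StructField("value", StringType)))
      override def buildScan(): RDD[Row] = ctx.sparkContext.emptyRDD[Row]
    }
}
```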
case class EqualNullSafe(attribute: String, value: Any) extends Filter with Product with Serializable
Performs equality comparison, similar to EqualTo. However, this differs from EqualTo in that it returns true (rather than NULL) if both inputs are NULL, and false (rather than NULL) if one of the inputs is NULL and the other is not.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.5.0
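The difference between EqualTo and EqualNullSafe can be illustrated with a small evaluator. This is a hypothetical model, not Spark's own evaluation code: a row is a `Map` from column name to value, a Scala `null` stands for SQL NULL, and the result is `Option[Boolean]` with `None` standing for a NULL (unknown) outcome. It assumes the Spark filter case classes are on the classpath.

```scala
import org.apache.spark.sql.sources.{EqualNullSafe, EqualTo, Filter}

// Hypothetical model of the two equality filters under SQL NULL semantics.
object NullSafeDemo {
  def eval(f: Filter, row: Map[String, Any]): Option[Boolean] = f match {
    case EqualTo(attr, v) =>
      val x = row.getOrElse(attr, null)
      // Any NULL operand makes plain equality NULL.
      if (x == null || v == null) None else Some(x == v)
    case EqualNullSafe(attr, v) =>
      val x = row.getOrElse(attr, null)
      // Never NULL: NULL vs NULL is true, NULL vs non-NULL is false.
      Some(x == v)
    case other =>
      throw new IllegalArgumentException(s"unsupported filter: $other")
  }
}
```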
case class EqualTo(attribute: String, value: Any) extends Filter with Product with Serializable
A filter that evaluates to true iff the attribute evaluates to a value equal to value.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.3.0
sealed abstract class Filter extends AnyRef
A filter predicate for data sources. The mapping between Spark SQL types and filter value types follows the convention for the return type of org.apache.spark.sql.Row#get(int).
Annotations
@Stable()
Since
1.3.0
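Because Filter is sealed, a data source typically consumes filters by pattern matching on the concrete case classes. The sketch below, assuming `spark-sql` on the classpath, compiles a subset of filters into a SQL WHERE fragment for a hypothetical external database. `FilterCompiler` is a hypothetical name, column names are not quoted, and any filter it cannot translate yields `None` so the caller can leave it for Spark to evaluate.

```scala
import org.apache.spark.sql.sources._

// Hypothetical translator from data source filters to SQL text.
object FilterCompiler {
  // Render a literal; strings are single-quoted with embedded quotes doubled.
  private def quote(v: Any): String = v match {
    case null      => "NULL"
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  def compile(f: Filter): Option[String] = f match {
    case EqualTo(a, v)            => Some(s"$a = ${quote(v)}")
    case GreaterThan(a, v)        => Some(s"$a > ${quote(v)}")
    case LessThan(a, v)           => Some(s"$a < ${quote(v)}")
    case IsNull(a)                => Some(s"$a IS NULL")
    case IsNotNull(a)             => Some(s"$a IS NOT NULL")
    case In(a, vs) if vs.nonEmpty => Some(vs.map(quote).mkString(s"$a IN (", ", ", ")"))
    // A compound filter is translated only if every child is translatable.
    case And(l, r) => for (x <- compile(l); y <- compile(r)) yield s"($x AND $y)"
    case Or(l, r)  => for (x <- compile(l); y <- compile(r)) yield s"($x OR $y)"
    case Not(c)    => compile(c).map(x => s"(NOT $x)")
    case _         => None
  }
}
```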
case class GreaterThan(attribute: String, value: Any) extends Filter with Product with Serializable
A filter that evaluates to true iff the attribute evaluates to a value greater than value.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.3.0
case class GreaterThanOrEqual(attribute: String, value: Any) extends Filter with Product with Serializable
A filter that evaluates to true iff the attribute evaluates to a value greater than or equal to value.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.3.0
case class In(attribute: String, values: Array[Any]) extends Filter with Product with Serializable
A filter that evaluates to true iff the attribute evaluates to one of the values in the array.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.3.0
trait InsertableRelation extends AnyRef
A BaseRelation that can be used to insert data through the insert method. If overwrite in the insert method is true, the old data in the relation should be overwritten with the new data. If overwrite is false, the new data should be appended.
InsertableRelation makes the following three assumptions:
1. It assumes that the data (Rows in the DataFrame) provided to the insert method exactly matches the field order of the BaseRelation's schema.
2. It assumes that the schema of this relation will not be changed. Even if the insert method updates the schema (e.g. a relation of JSON or Parquet data may have a schema update after an insert operation), the new schema will not be used.
3. It assumes that fields of the data provided in the insert method are nullable. If a data source needs to check the actual nullability of a field, it must do so in the insert method.
Annotations
@Stable()
Since
1.3.0
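A minimal sketch of the insert contract above, assuming the Spark 3.x API with `spark-sql` on the classpath. `BufferRelation` is a hypothetical relation intended only for local experiments: it collects inserted rows to the driver and keeps them in a buffer. It clears the buffer when `overwrite` is true, appends otherwise, and relies on assumption 1 (columns arrive in schema order) rather than reordering them.

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, InsertableRelation, TableScan}
import org.apache.spark.sql.types.StructType

// Hypothetical driver-side relation; not suitable for large data.
class BufferRelation(
    override val schema: StructType,
    override val sqlContext: SQLContext)
  extends BaseRelation with TableScan with InsertableRelation {

  private val rows = ArrayBuffer.empty[Row]

  override def insert(data: DataFrame, overwrite: Boolean): Unit = synchronized {
    if (overwrite) rows.clear() // overwrite = true: drop the old data first
    rows ++= data.collect()     // assumption 1: field order already matches the schema
  }

  // Snapshot the buffer so later inserts do not affect an existing RDD.
  override def buildScan(): RDD[Row] = synchronized {
    sqlContext.sparkContext.parallelize(rows.toList)
  }
}
```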
case class IsNotNull(attribute: String) extends Filter with Product with Serializable
A filter that evaluates to true iff the attribute evaluates to a non-null value.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.3.0
case class IsNull(attribute: String) extends Filter with Product with Serializable
A filter that evaluates to true iff the attribute evaluates to null.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.3.0
case class LessThan(attribute: String, value: Any) extends Filter with Product with Serializable
A filter that evaluates to true iff the attribute evaluates to a value less than value.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.3.0
case class LessThanOrEqual(attribute: String, value: Any) extends Filter with Product with Serializable
A filter that evaluates to true iff the attribute evaluates to a value less than or equal to value.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.3.0
case class Not(child: Filter) extends Filter with Product with Serializable
A filter that evaluates to true iff child evaluates to false.
Annotations
@Stable()
Since
1.3.0
case class Or(left: Filter, right: Filter) extends Filter with Product with Serializable
A filter that evaluates to true iff at least one of left or right evaluates to true.
Annotations
@Stable()
Since
1.3.0
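The "true iff" wording of And, Or, and Not matters when a child evaluates to NULL: Not is then NULL rather than true, an Or with one true child is true, and an And with one false child is false. The sketch below models this three-valued (Kleene) logic, assuming `spark-sql` on the classpath. `KleeneDemo` is a hypothetical name, `None` stands for SQL NULL, a Scala `null` in the row stands for a NULL column value, and only a few leaf filters are handled.

```scala
import org.apache.spark.sql.sources.{And, EqualTo, Filter, IsNull, Not, Or}

// Hypothetical three-valued evaluator for a few filter shapes.
object KleeneDemo {
  def eval(f: Filter, row: Map[String, Any]): Option[Boolean] = f match {
    case IsNull(a) => Some(row.getOrElse(a, null) == null)
    case EqualTo(a, v) =>
      val x = row.getOrElse(a, null)
      if (x == null || v == null) None else Some(x == v)
    case And(l, r) => (eval(l, row), eval(r, row)) match {
      case (Some(false), _) | (_, Some(false)) => Some(false) // one false decides
      case (Some(true), Some(true))            => Some(true)
      case _                                   => None        // otherwise unknown
    }
    case Or(l, r) => (eval(l, row), eval(r, row)) match {
      case (Some(true), _) | (_, Some(true)) => Some(true)    // one true decides
      case (Some(false), Some(false))        => Some(false)
      case _                                 => None
    }
    case Not(c) => eval(c, row).map(b => !b) // NULL stays NULL
    case other  => throw new IllegalArgumentException(s"unsupported filter: $other")
  }
}
```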
trait PrunedFilteredScan extends AnyRef
A BaseRelation that can eliminate unneeded columns and filter using selected predicates before producing an RDD containing all matching tuples as Row objects.
The actual filter should be the conjunction of all filters, i.e. they should be ANDed together.
The pushed-down filters are currently purely an optimization, as they will all be evaluated again. This means it is safe to use them with methods that produce false positives, such as filtering partitions based on a bloom filter.
Annotations
@Stable()
Since
1.3.0
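A minimal sketch of a PrunedFilteredScan, assuming the Spark 3.x API with `spark-sql` on the classpath. `PeopleRelation` and its in-memory data are hypothetical. It ANDs together the filters it understands, treats any other filter as true (which can only add false positives, and the text above notes Spark evaluates the filters again), and emits exactly the requested columns in the requested order.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types._

// Hypothetical relation over a fixed table of (id, name) pairs.
class PeopleRelation(override val sqlContext: SQLContext)
  extends BaseRelation with PrunedFilteredScan {

  private val data = Seq(1 -> "ann", 2 -> "bob", 3 -> "cy")

  override def schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = false)))

  override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
    // Conjunction of all filters; unrecognized ones are skipped (treated as true).
    def keep(id: Int): Boolean = filters.forall {
      case EqualTo("id", v: Int)     => id == v
      case GreaterThan("id", v: Int) => id > v
      case _                         => true
    }
    val rows = data.collect {
      case (id, name) if keep(id) =>
        // Emit only the requested columns, in the requested order.
        Row.fromSeq(requiredColumns.toSeq.map {
          case "id"   => id
          case "name" => name
          case other  => throw new IllegalArgumentException(s"unknown column: $other")
        })
    }
    sqlContext.sparkContext.parallelize(rows)
  }
}
```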
trait PrunedScan extends AnyRef
A BaseRelation that can eliminate unneeded columns before producing an RDD containing all of its tuples as Row objects.
Annotations
@Stable()
Since
1.3.0
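A minimal sketch of column pruning, assuming the Spark 3.x API with `spark-sql` on the classpath. `MapRowsRelation` is hypothetical and stores each record as a column-name-to-value map. The point it illustrates is that each output Row contains exactly the columns in `requiredColumns`, in that order, rather than the full schema.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, PrunedScan}
import org.apache.spark.sql.types._

// Hypothetical relation whose records are column-name -> value maps.
class MapRowsRelation(
    override val sqlContext: SQLContext,
    override val schema: StructType,
    records: Seq[Map[String, Any]])
  extends BaseRelation with PrunedScan {

  // Build each Row from the requested columns only; a missing value becomes NULL.
  override def buildScan(requiredColumns: Array[String]): RDD[Row] = {
    val cols = requiredColumns.toSeq
    val projected = records.map(r => Row.fromSeq(cols.map(c => r.getOrElse(c, null))))
    sqlContext.sparkContext.parallelize(projected)
  }
}
```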
trait RelationProvider extends AnyRef
Implemented by objects that produce relations for a specific kind of data source. When Spark SQL is given a DDL operation with a USING clause specified (to specify the implemented RelationProvider), this interface is used to pass in the parameters specified by a user.
Users may specify the fully qualified class name of a given data source. When that class is not found, Spark SQL will append the class name DefaultSource to the path, allowing for less verbose invocation. For example, 'org.apache.spark.sql.json' would resolve to the data source 'org.apache.spark.sql.json.DefaultSource'.
A new instance of this class will be instantiated each time a DDL call is made.
Annotations
@Stable()
Since
1.3.0
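A minimal sketch of a RelationProvider that reads a user-supplied parameter, assuming the Spark 3.x API with `spark-sql` on the classpath. The `"values"` option and its comma-separated format are hypothetical. The class is named `DefaultSource` to match the lookup convention above: if it were compiled into a hypothetical package `com.example.values`, then `spark.read.format("com.example.values").option("values", "a,b").load()` would resolve to `com.example.values.DefaultSource`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical provider exposing a comma-separated option as a string column.
class DefaultSource extends RelationProvider {
  override def createRelation(
      ctx: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    // User parameters arrive as a string-to-string map.
    val values = parameters.getOrElse("values", "").split(",").filter(_.nonEmpty).toSeq
    new BaseRelation with TableScan {
      override def sqlContext: SQLContext = ctx
      override def schema: StructType = StructType(Seq(StructField("value", StringType)))
      override def buildScan(): RDD[Row] = ctx.sparkContext.parallelize(values.map(v => Row(v)))
    }
  }
}
```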
trait SchemaRelationProvider extends AnyRef
Implemented by objects that produce relations for a specific kind of data source with a given schema. When Spark SQL is given a DDL operation with a USING clause specified (to specify the implemented SchemaRelationProvider) and a user-defined schema, this interface is used to pass in the parameters specified by a user.
Users may specify the fully qualified class name of a given data source. When that class is not found, Spark SQL will append the class name DefaultSource to the path, allowing for less verbose invocation. For example, 'org.apache.spark.sql.json' would resolve to the data source 'org.apache.spark.sql.json.DefaultSource'.
A new instance of this class will be instantiated each time a DDL call is made.
The difference between a RelationProvider and a SchemaRelationProvider is that users need to provide a schema when using a SchemaRelationProvider. A relation provider can inherit both RelationProvider and SchemaRelationProvider if it can support both schema inference and user-specified schemas.
Annotations
@Stable()
Since
1.3.0
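A minimal sketch of a provider that inherits both traits, assuming the Spark 3.x API with `spark-sql` on the classpath. `FlexibleProvider` is hypothetical, a fixed default schema stands in for real schema inference, and the returned relation has no rows because only schema handling is shown. The two-argument `createRelation` serves the no-schema case; the three-argument one receives the user-specified schema.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, SchemaRelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical provider supporting both a default and a user-supplied schema.
class FlexibleProvider extends RelationProvider with SchemaRelationProvider {
  // Stand-in for inference: a fixed single-column schema.
  private val defaultSchema = StructType(Seq(StructField("value", StringType)))

  // RelationProvider: no schema given by the user.
  override def createRelation(ctx: SQLContext, parameters: Map[String, String]): BaseRelation =
    createRelation(ctx, parameters, defaultSchema)

  // SchemaRelationProvider: the user-specified schema is used as-is.
  override def createRelation(
      ctx: SQLContext,
      parameters: Map[String, String],
      userSchema: StructType): BaseRelation =
    new BaseRelation with TableScan {
      override def sqlContext: SQLContext = ctx
      override def schema: StructType = userSchema
      override def buildScan(): RDD[Row] = ctx.sparkContext.emptyRDD[Row]
    }
}
```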
trait StreamSinkProvider extends AnyRef
::Experimental:: Implemented by objects that can produce a streaming Sink for a specific format or system.
Annotations
@Unstable()
Since
2.0.0
trait StreamSourceProvider extends AnyRef
::Experimental:: Implemented by objects that can produce a streaming Source for a specific format or system.
Annotations
@Unstable()
Since
2.0.0
case class StringContains(attribute: String, value: String) extends Filter with Product with Serializable
A filter that evaluates to true iff the attribute evaluates to a string that contains the string value.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.3.1
case class StringEndsWith(attribute: String, value: String) extends Filter with Product with Serializable
A filter that evaluates to true iff the attribute evaluates to a string that ends with value.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.3.1
case class StringStartsWith(attribute: String, value: String) extends Filter with Product with Serializable
A filter that evaluates to true iff the attribute evaluates to a string that starts with value.
attribute: name of the column to be evaluated; dots are used as separators for nested columns. If any part of the name contains dots, it is quoted to avoid confusion.
Annotations
@Stable()
Since
1.3.1
trait SupportsStreamSourceMetadataColumns extends StreamSourceProvider
Implemented by StreamSourceProvider objects that can generate file metadata columns. This trait extends the basic StreamSourceProvider by allowing metadata columns to be added to the schema of the stream data source.
trait TableScan extends AnyRef
A BaseRelation that can produce all of its tuples as an RDD of Row objects.
Annotations
@Stable()
Since
1.3.0
