org.apache.spark.sql.sources
Interface HadoopFsRelationProvider


public interface HadoopFsRelationProvider

::Experimental:: Implemented by objects that produce relations for a specific kind of data source with a given schema and partition columns. When Spark SQL is given a DDL operation with a USING clause (naming the HadoopFsRelationProvider implementation), a user-defined schema, and an optional list of partition columns, this interface is used to pass in the parameters specified by the user.
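
Such a DDL call might look like the following sketch, issued through SQLContext (the data source name com.example.myformat, the table name, and the path option value are hypothetical):

sqlContext.sql("""
  CREATE TEMPORARY TABLE events (id INT, name STRING)
  USING com.example.myformat
  OPTIONS (path '/data/events')
""")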

Users may specify the fully qualified class name of a given data source. When that class is not found, Spark SQL will append the class name DefaultSource to the path, allowing for less verbose invocation. For example, 'org.apache.spark.sql.json' would resolve to the data source 'org.apache.spark.sql.json.DefaultSource'.
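
A provider laid out as in the following sketch (the package com.example.myformat is hypothetical) can therefore be referenced in a USING clause simply as com.example.myformat:

package com.example.myformat

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{HadoopFsRelation, HadoopFsRelationProvider}
import org.apache.spark.sql.types.StructType

// USING com.example.myformat resolves here because the class is named DefaultSource.
class DefaultSource extends HadoopFsRelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      paths: Array[String],
      dataSchema: Option[StructType],
      partitionColumns: Option[StructType],
      parameters: Map[String, String]): HadoopFsRelation =
    ??? // construct and return the concrete HadoopFsRelation here
}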

A new instance of this class will be instantiated each time a DDL call is made.

The difference between a RelationProvider and a HadoopFsRelationProvider is that users need to provide a schema and a (possibly empty) list of partition columns when using a HadoopFsRelationProvider. A relation provider can inherit both RelationProvider and HadoopFsRelationProvider if it can support schema inference, user-specified schemas, and accessing partitioned relations.
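
A sketch of such a combined provider (the class name is hypothetical; method bodies are elided with ???):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, HadoopFsRelation, HadoopFsRelationProvider, RelationProvider}
import org.apache.spark.sql.types.StructType

class FlexibleSource extends RelationProvider with HadoopFsRelationProvider {
  // RelationProvider path: no user-supplied schema, so it must be inferred.
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = ???

  // HadoopFsRelationProvider path: optional user-supplied data schema
  // and partition columns.
  def createRelation(
      sqlContext: SQLContext,
      paths: Array[String],
      dataSchema: Option[StructType],
      partitionColumns: Option[StructType],
      parameters: Map[String, String]): HadoopFsRelation = ???
}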

Since:
1.4.0

Method Summary
 HadoopFsRelation createRelation(SQLContext sqlContext, String[] paths, scala.Option<StructType> dataSchema, scala.Option<StructType> partitionColumns, scala.collection.immutable.Map<String,String> parameters)
          Returns a new base relation with the given parameters, a user-defined schema, and a list of partition columns.
 

Method Detail

createRelation

HadoopFsRelation createRelation(SQLContext sqlContext,
                                String[] paths,
                                scala.Option<StructType> dataSchema,
                                scala.Option<StructType> partitionColumns,
                                scala.collection.immutable.Map<String,String> parameters)
Returns a new base relation with the given parameters, a user-defined schema, and a list of partition columns. Note: the parameter keys are case-insensitive, and this insensitivity is enforced by the Map that is passed to the function. A sketch of a conforming implementation follows the parameter list below.

Parameters:
sqlContext - The SQLContext in which the relation is created.
paths - Paths to the data backing the relation.
dataSchema - Schema of data columns (i.e., columns that are not partition columns).
partitionColumns - Schema of partition columns, if any (empty for non-partitioned relations).
parameters - Options specified by the user; keys are case-insensitive.
Returns:
A HadoopFsRelation constructed from the given arguments.
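
For illustration, a minimal conforming implementation might look like this sketch (the class name and the compression option key are hypothetical; the relation construction itself is elided with ???):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{HadoopFsRelation, HadoopFsRelationProvider}
import org.apache.spark.sql.types.StructType

class ExampleSource extends HadoopFsRelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      paths: Array[String],
      dataSchema: Option[StructType],
      partitionColumns: Option[StructType],
      parameters: Map[String, String]): HadoopFsRelation = {
    // Keys are case-insensitive in the map Spark passes in, so
    // parameters.get("compression") and parameters.get("COMPRESSION")
    // return the same value.
    val codec: Option[String] = parameters.get("compression")
    // Use the user-supplied schema when present; otherwise the concrete
    // relation is expected to infer one from the files under `paths`.
    val userSchema: Option[StructType] = dataSchema
    ??? // construct and return the concrete HadoopFsRelation here
  }
}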