OrcFileFormat (Spark 2.3.1 JavaDoc)

Object
- org.apache.spark.sql.hive.orc.OrcFileFormat

All Implemented Interfaces:

java.io.Serializable, org.apache.spark.sql.execution.datasources.FileFormat, DataSourceRegister
```
public class OrcFileFormat
extends Object
implements org.apache.spark.sql.execution.datasources.FileFormat, DataSourceRegister, scala.Serializable
```
FileFormat for reading ORC files. If this is moved or renamed, please update DataSource's backwardCompatibilityMap.
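The note about `backwardCompatibilityMap` can be illustrated with a minimal sketch of the alias-map idea: provider names that users may still reference are mapped to the current fully qualified class name, so a move or rename only needs a map update. This is not Spark's `DataSource` code. The class `BackwardCompatibilitySketch`, the method `resolve`, and the legacy keys are hypothetical and chosen for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: models the alias-map idea only, not Spark's DataSource internals.
class BackwardCompatibilitySketch {
    // Current fully qualified name of the class documented on this page.
    static final String ORC = "org.apache.spark.sql.hive.orc.OrcFileFormat";

    // Older provider names mapped to the current class (keys are illustrative assumptions).
    static final Map<String, String> LEGACY_NAMES = new HashMap<>();
    static {
        LEGACY_NAMES.put("org.apache.spark.sql.hive.orc.DefaultSource", ORC);
        LEGACY_NAMES.put("org.apache.spark.sql.hive.orc", ORC);
    }

    // Returns the current class name for a known legacy name, or the input unchanged.
    static String resolve(String provider) {
        return LEGACY_NAMES.getOrDefault(provider, provider);
    }

    public static void main(String[] args) {
        System.out.println(resolve("org.apache.spark.sql.hive.orc.DefaultSource"));
        System.out.println(resolve("com.example.CustomFormat"));
    }
}
```

If the class were renamed, only the value of `ORC` in this sketch would change, while the legacy keys would keep resolving.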

See Also:

Serialized Form

Constructor Summary

Constructors
Constructor and Description
`OrcFileFormat()`

Method Summary

Modifier and Type	Method and Description
`scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>>`	`buildReader(SparkSession sparkSession, StructType dataSchema, StructType partitionSchema, StructType requiredSchema, scala.collection.Seq<Filter> filters, scala.collection.immutable.Map<String,String> options, org.apache.hadoop.conf.Configuration hadoopConf)`
`static scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>>`	`buildReaderWithPartitionValues(SparkSession sparkSession, StructType dataSchema, StructType partitionSchema, StructType requiredSchema, scala.collection.Seq<Filter> filters, scala.collection.immutable.Map<String,String> options, org.apache.hadoop.conf.Configuration hadoopConf)`
`static scala.collection.immutable.Map<String,String>`	`extensionsForCompressionCodecNames()`
`scala.Option<StructType>`	`inferSchema(SparkSession sparkSession, scala.collection.immutable.Map<String,String> options, scala.collection.Seq<org.apache.hadoop.fs.FileStatus> files)`
`boolean`	`isSplitable(SparkSession sparkSession, scala.collection.immutable.Map<String,String> options, org.apache.hadoop.fs.Path path)`
`org.apache.spark.sql.execution.datasources.OutputWriterFactory`	`prepareWrite(SparkSession sparkSession, org.apache.hadoop.mapreduce.Job job, scala.collection.immutable.Map<String,String> options, StructType dataSchema)`
`static void`	`setRequiredColumns(org.apache.hadoop.conf.Configuration conf, StructType dataSchema, StructType requestedSchema)`
`String`	`shortName()` The string that represents the format that this data source provider uses.
`static boolean`	`supportBatch(SparkSession sparkSession, StructType dataSchema)`
`String`	`toString()`
`static scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>`	`unwrapOrcStructs(org.apache.hadoop.conf.Configuration conf, StructType dataSchema, StructType requiredSchema, scala.Option<org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector> maybeStructOI, scala.collection.Iterator<org.apache.hadoop.io.Writable> iterator)`
`static scala.Option<scala.collection.Seq<String>>`	`vectorTypes(StructType requiredSchema, StructType partitionSchema, org.apache.spark.sql.internal.SQLConf sqlConf)`

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.spark.sql.execution.datasources.FileFormat
buildReaderWithPartitionValues, supportBatch, vectorTypes

Constructor Detail
OrcFileFormat
```
public OrcFileFormat()
```

Method Detail

extensionsForCompressionCodecNames

public static scala.collection.immutable.Map<String,String> extensionsForCompressionCodecNames()

unwrapOrcStructs

public static scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow> unwrapOrcStructs(
    org.apache.hadoop.conf.Configuration conf,
    StructType dataSchema,
    StructType requiredSchema,
    scala.Option<org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector> maybeStructOI,
    scala.collection.Iterator<org.apache.hadoop.io.Writable> iterator)

setRequiredColumns

public static void setRequiredColumns(org.apache.hadoop.conf.Configuration conf,
                                      StructType dataSchema,
                                      StructType requestedSchema)

supportBatch

public static boolean supportBatch(SparkSession sparkSession,
                                   StructType dataSchema)

vectorTypes

public static scala.Option<scala.collection.Seq<String>> vectorTypes(
    StructType requiredSchema,
    StructType partitionSchema,
    org.apache.spark.sql.internal.SQLConf sqlConf)

buildReaderWithPartitionValues

public static scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> buildReaderWithPartitionValues(
    SparkSession sparkSession,
    StructType dataSchema,
    StructType partitionSchema,
    StructType requiredSchema,
    scala.collection.Seq<Filter> filters,
    scala.collection.immutable.Map<String,String> options,
    org.apache.hadoop.conf.Configuration hadoopConf)

shortName
```
public String shortName()
```
Description copied from interface: DataSourceRegister
The string that represents the format that this data source provider uses. Implementations override this to provide a short alias for the data source. For example:
```
override def shortName(): String = "parquet"
```
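To show how such an alias might be used, here is a minimal sketch of resolving a provider by its short name. It does not use Spark's classes: `Register` is a hypothetical stand-in for `DataSourceRegister`, `OrcProvider`, `ParquetProvider`, and `lookup` are invented for illustration, and the case-insensitive match is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: models alias lookup only, not Spark's provider resolution.
class ShortNameLookupSketch {
    // Stand-in for DataSourceRegister; only shortName() is modeled.
    interface Register {
        String shortName();
    }

    static final class OrcProvider implements Register {
        @Override public String shortName() { return "orc"; }
    }

    static final class ParquetProvider implements Register {
        @Override public String shortName() { return "parquet"; }
    }

    // Providers known to this sketch.
    static final List<Register> PROVIDERS =
        Arrays.asList(new OrcProvider(), new ParquetProvider());

    // Returns the first provider whose alias matches, ignoring case (an assumption).
    static Optional<Register> lookup(String alias) {
        return PROVIDERS.stream()
            .filter(p -> p.shortName().equalsIgnoreCase(alias))
            .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(lookup("orc").isPresent());
        System.out.println(lookup("avro").isPresent());
    }
}
```

In Spark itself, the alias is the string passed to the reader or writer, as in `spark.read().format("orc")`.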
Specified by:

shortName in interface DataSourceRegister

Returns:

(undocumented)

toString
```
public String toString()
```
Overrides:

toString in class Object

inferSchema

public scala.Option<StructType> inferSchema(SparkSession sparkSession,
                                            scala.collection.immutable.Map<String,String> options,
                                            scala.collection.Seq<org.apache.hadoop.fs.FileStatus> files)

Specified by:

inferSchema in interface org.apache.spark.sql.execution.datasources.FileFormat

prepareWrite

public org.apache.spark.sql.execution.datasources.OutputWriterFactory prepareWrite(
    SparkSession sparkSession,
    org.apache.hadoop.mapreduce.Job job,
    scala.collection.immutable.Map<String,String> options,
    StructType dataSchema)

Specified by:

prepareWrite in interface org.apache.spark.sql.execution.datasources.FileFormat

isSplitable

public boolean isSplitable(SparkSession sparkSession,
                           scala.collection.immutable.Map<String,String> options,
                           org.apache.hadoop.fs.Path path)

Specified by:

isSplitable in interface org.apache.spark.sql.execution.datasources.FileFormat

buildReader

public scala.Function1<org.apache.spark.sql.execution.datasources.PartitionedFile,scala.collection.Iterator<org.apache.spark.sql.catalyst.InternalRow>> buildReader(
    SparkSession sparkSession,
    StructType dataSchema,
    StructType partitionSchema,
    StructType requiredSchema,
    scala.collection.Seq<Filter> filters,
    scala.collection.immutable.Map<String,String> options,
    org.apache.hadoop.conf.Configuration hadoopConf)

Specified by:

buildReader in interface org.apache.spark.sql.execution.datasources.FileFormat
