public class OrcFileOperator
extends Object
Constructor and Description |
---|
OrcFileOperator() |
Modifier and Type | Method and Description |
---|---|
static scala.Option<org.apache.hadoop.hive.ql.io.orc.Reader> | getFileReader(String basePath, scala.Option<org.apache.hadoop.conf.Configuration> config) - Retrieves an ORC file reader from a given path. |
static scala.Option<org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector> | getObjectInspector(String path, scala.Option<org.apache.hadoop.conf.Configuration> conf) |
static scala.collection.Seq<org.apache.hadoop.fs.Path> | listOrcFiles(String pathStr, org.apache.hadoop.conf.Configuration conf) |
static scala.Option<StructType> | readSchema(scala.collection.Seq<String> paths, scala.Option<org.apache.hadoop.conf.Configuration> conf) |
public static scala.Option<org.apache.hadoop.hive.ql.io.orc.Reader> getFileReader(String basePath, scala.Option<org.apache.hadoop.conf.Configuration> config)
The reader returned by this method is mainly used for two purposes:

1. Retrieving file metadata (schema, compression codec, etc.).
2. Reading the actual file content (in this case, the given path should point to the target file).
Parameters:
basePath - (undocumented)
config - (undocumented)

An empty schema (struct<>) is written to an ORC file if the file contains zero rows. This is acceptable for Hive, because the table schema is managed by the metastore. It becomes a problem, however, when Spark SQL reads ORC files directly from HDFS, because the schema then has to be discovered from the raw ORC files. This method therefore always tries to find an ORC file whose schema is non-empty and creates the result reader from that file. If no such file is found, it returns None.

public static scala.Option<StructType> readSchema(scala.collection.Seq<String> paths, scala.Option<org.apache.hadoop.conf.Configuration> conf)
public static scala.Option<org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector> getObjectInspector(String path, scala.Option<org.apache.hadoop.conf.Configuration> conf)
public static scala.collection.Seq<org.apache.hadoop.fs.Path> listOrcFiles(String pathStr, org.apache.hadoop.conf.Configuration conf)
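The schema-discovery rule described for getFileReader (skip files whose schema is the empty struct<>, and return None when no file qualifies) can be illustrated with a minimal, self-contained Java sketch. This is not Spark's implementation and does not touch the Hive or ORC APIs: the hypothetical OrcFileInfo class stands in for an opened ORC file, an empty field list models the struct<> schema, and java.util.Optional plays the role of scala.Option.

```java
import java.util.List;
import java.util.Optional;

public class NonEmptySchemaPicker {

    // Hypothetical stand-in for an opened ORC file (not a Hive/ORC class):
    // its path plus the top-level field names of its schema.
    // An empty list models the empty schema struct<>.
    static final class OrcFileInfo {
        final String path;
        final List<String> fieldNames;

        OrcFileInfo(String path, List<String> fieldNames) {
            this.path = path;
            this.fieldNames = fieldNames;
        }
    }

    // Returns the first candidate whose schema has at least one field,
    // or Optional.empty() (the analogue of None) when every candidate
    // has an empty struct<> schema. "First match" is an assumption of this sketch.
    static Optional<OrcFileInfo> pickFileWithNonEmptySchema(List<OrcFileInfo> candidates) {
        return candidates.stream()
                .filter(file -> !file.fieldNames.isEmpty())
                .findFirst();
    }

    public static void main(String[] args) {
        List<OrcFileInfo> files = List.of(
                new OrcFileInfo("part-00000.orc", List.of()),             // zero rows, struct<>
                new OrcFileInfo("part-00001.orc", List.of("id", "name"))  // usable schema
        );

        // Skips the empty-schema file and picks the second one.
        System.out.println(pickFileWithNonEmptySchema(files).map(f -> f.path).orElse("none"));

        // Only empty-schema files: nothing qualifies.
        System.out.println(pickFileWithNonEmptySchema(List.of(files.get(0))).isPresent());
    }
}
```

Running main prints part-00001.orc and then false. Taking the first qualifying file is a choice made for this sketch; the description above only says that some file with a non-empty schema is used.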