public class OrcFilters
extends Object

Helper object for building ORC SearchArguments, which are used for ORC predicate push-down.
Due to limitations of the ORC SearchArgument builder, we ended up with a pretty weird double-checking pattern when converting And/Or/Not filters.
An ORC SearchArgument must be built in one pass using a single builder. For example, you can't build a = 1 and b = 2 first, and then combine them into a = 1 AND b = 2. This is quite different from the cases in Spark SQL or Parquet, where complex filters can be easily built using existing simpler ones.
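The one-pass constraint can be illustrated with a toy builder. This is a hypothetical stand-in, not the real org.apache.hadoop.hive.ql.io.sarg.SearchArgument.Builder API: every predicate must be threaded through the same builder instance while its scope is still open, and once the argument is finished there is no combinator for merging it with another finished argument.

```java
import java.util.StringJoiner;

public class OnePass {
    // Hypothetical stand-in for SearchArgument.Builder: all predicates must
    // be supplied to one builder instance in a single pass.
    static final class Builder {
        private final StringJoiner and = new StringJoiner(" AND ", "(", ")");
        private boolean finished = false;

        Builder startAnd() { return this; }

        Builder equals(String col, Object v) {
            if (finished) throw new IllegalStateException("builder already used");
            and.add(col + " = " + v);
            return this;
        }

        String end() { finished = true; return and.toString(); }
    }

    public static void main(String[] args) {
        // Correct: a = 1 AND b = 2 built in one pass on a single builder.
        String sarg = new Builder().startAnd()
                .equals("a", 1)
                .equals("b", 2)
                .end();
        System.out.println(sarg);  // (a = 1 AND b = 2)

        // There is no way to build "a = 1" and "b = 2" as two separate
        // finished arguments and merge them afterwards; the real
        // SearchArgument offers no such combinator either.
    }
}
```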
The annoying part is that the SearchArgument builder methods startAnd(), startOr(), and startNot() mutate the internal state of the builder instance. This forces us to translate all convertible filters with a single builder instance. However, before actually converting a filter, we have no idea whether ORC can recognize it or not. Thus, when an inconvertible filter is found, we may already be left with a builder whose internal state is inconsistent.
For example, to convert an And filter with builder b, we call b.startAnd() first, and then try to convert its children. Say the left child converts successfully, but the right child turns out to be inconvertible. Alas, the b.startAnd() call can't be rolled back, and b is now in an inconsistent state.
The workaround employed here is that, for And/Or/Not, we first try to convert their children with brand-new builders, and only perform the actual conversion with the right builder instance once the children are proven to be convertible.
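The double-checking pattern can be sketched with a self-contained toy model. The Filter hierarchy, the Builder, and the createFilter/convert helpers below are all hypothetical stand-ins for the real Spark and Hive classes; only the control flow (trial conversion with throwaway builders before committing to the real one) mirrors the workaround described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

public class DoubleCheck {
    // Toy filter tree (stands in for Spark's Filter classes).
    interface Filter {}
    record Eq(String col, Object value) implements Filter {}
    record And(Filter left, Filter right) implements Filter {}
    record Unsupported() implements Filter {}  // something ORC cannot express

    // Toy one-pass builder (stands in for SearchArgument.Builder):
    // startAnd() mutates internal state and cannot be rolled back.
    static final class Builder {
        private final Deque<List<String>> stack = new ArrayDeque<>();
        Builder() { stack.push(new ArrayList<>()); }

        Builder startAnd() { stack.push(new ArrayList<>()); return this; }

        Builder equals(String col, Object v) {
            stack.peek().add(col + " = " + v);
            return this;
        }

        Builder end() {
            List<String> kids = stack.pop();
            stack.peek().add("(and " + String.join(" ", kids) + ")");
            return this;
        }

        String build() { return stack.peek().get(0); }
    }

    static Optional<String> createFilter(Filter f) {
        return convert(f, new Builder()).map(Builder::build);
    }

    static Optional<Builder> convert(Filter f, Builder b) {
        if (f instanceof Eq eq) {
            return Optional.of(b.equals(eq.col(), eq.value()));
        }
        if (f instanceof And and) {
            // Double-checking: trial-convert both children with brand-new
            // throwaway builders first, leaving b untouched on failure.
            if (convert(and.left(), new Builder()).isEmpty()
                    || convert(and.right(), new Builder()).isEmpty()) {
                return Optional.empty();
            }
            // Both children are proven convertible, so mutating b is now safe.
            b.startAnd();
            convert(and.left(), b);
            convert(and.right(), b);
            return Optional.of(b.end());
        }
        return Optional.empty();  // e.g. Unsupported
    }

    public static void main(String[] args) {
        System.out.println(createFilter(new And(new Eq("a", 1), new Eq("b", 2))));
        // Optional[(and a = 1 b = 2)]
        System.out.println(createFilter(new And(new Eq("a", 1), new Unsupported())));
        // Optional.empty — and the outer builder was never half-mutated
    }
}
```

The trial conversions do duplicate work (each subtree may be converted twice), which is the price of keeping the committed builder consistent.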
P.S.: Hive seems to use SearchArgument only together with ExprNodeGenericFuncDesc. Usage of the builder methods mentioned above can only be found in test code, where all tested filters are known to be convertible.
| Constructor and Description |
|---|
| OrcFilters() |
| Modifier and Type | Method and Description |
|---|---|
| static scala.collection.Seq<Filter> | convertibleFilters(StructType schema, scala.collection.immutable.Map<String,DataType> dataTypeMap, scala.collection.Seq<Filter> filters) |
| static scala.Option<org.apache.hadoop.hive.ql.io.sarg.SearchArgument> | createFilter(StructType schema, Filter[] filters) |
| static void | org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) |
| static org.slf4j.Logger | org$apache$spark$internal$Logging$$log_() |
public static scala.Option<org.apache.hadoop.hive.ql.io.sarg.SearchArgument> createFilter(StructType schema, Filter[] filters)
public static scala.collection.Seq<Filter> convertibleFilters(StructType schema, scala.collection.immutable.Map<String,DataType> dataTypeMap, scala.collection.Seq<Filter> filters)
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)