public class OrcFilters extends Object
Helper object for building ORC SearchArguments, which are used for ORC predicate push-down.
Due to limitations of the ORC SearchArgument builder, we ended up with a pretty weird double-checking pattern when converting convertible filters.
An ORC SearchArgument must be built in one pass using a single builder. For example, you can't build a = 1 and b = 2 first, and then combine them into a = 1 AND b = 2. This is quite different from Spark SQL or Parquet, where complex filters can easily be built from existing simpler ones.
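As a sketch (assuming the Hive 2.x SearchArgument builder API; this is illustrative, not Spark source), the predicate a = 1 AND b = 2 has to come out of one uninterrupted builder chain:

```scala
import org.apache.hadoop.hive.ql.io.sarg.{PredicateLeaf, SearchArgumentFactory}

// The whole predicate is expressed in a single pass over one builder;
// there is no API for combining two already-built SearchArguments.
val sarg = SearchArgumentFactory.newBuilder()
  .startAnd()
  .equals("a", PredicateLeaf.Type.LONG, 1L)
  .equals("b", PredicateLeaf.Type.LONG, 2L)
  .end()
  .build()
```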
The annoying part is that SearchArgument builder methods like startAnd(), startOr(), and startNot() mutate the internal state of the builder instance. This forces us to translate all convertible filters with a single builder instance. However, before actually converting a filter, we have no idea whether it can be recognized by ORC or not. Thus, when an inconvertible filter is found, we may already end up with a builder whose internal state is inconsistent.
For example, to convert an And filter with builder b, we call b.startAnd() first, and then try to convert its children. Say we convert the left child successfully, but find that the right child is inconvertible. Alas, the b.startAnd() call can't be rolled back, and b is now inconsistent.
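A minimal sketch of this failure mode; convert is a hypothetical helper standing in for the real per-filter translation, not part of OrcFilters:

```scala
import org.apache.hadoop.hive.ql.io.sarg.SearchArgument
import org.apache.spark.sql.sources.{And, Filter}

// Hypothetical helper: returns Some(builder) on success, None for filters
// that ORC can't recognize. The real translation also mutates the builder.
def convert(f: Filter, b: SearchArgument.Builder): Option[SearchArgument.Builder] = ???

// Naive one-pass translation of And(left, right): if the right child turns
// out to be inconvertible, b has already absorbed startAnd() and the left
// child, and there is no way to roll those mutations back.
def naiveBuildAnd(f: And, b: SearchArgument.Builder): Option[SearchArgument.Builder] =
  for {
    _ <- convert(f.left, b.startAnd()) // mutates b; suppose this succeeds
    r <- convert(f.right, b)           // suppose this fails: None is returned,
                                       // yet b is left mid-AND, corrupted
  } yield r.end()
```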
The workaround employed here is that, for And/Or/Not, we first try to convert their children with brand new builders, and only do the actual conversion with the right builder instance when the children are proven to be convertible.
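The sketch below (reusing the hypothetical convert helper from the previous sketch; again illustrative, not actual OrcFilters source) shows the double-checking shape for And:

```scala
import org.apache.hadoop.hive.ql.io.sarg.{SearchArgument, SearchArgumentFactory}
import org.apache.spark.sql.sources.And

// Double-checking pattern: trial-convert both children against throwaway
// builders first; mutate the real builder only once both are proven good.
def buildAnd(f: And, b: SearchArgument.Builder): Option[SearchArgument.Builder] =
  for {
    _   <- convert(f.left, SearchArgumentFactory.newBuilder())  // trial run
    _   <- convert(f.right, SearchArgumentFactory.newBuilder()) // trial run
    lhs <- convert(f.left, b.startAnd()) // safe: both children convertible
    rhs <- convert(f.right, lhs)
  } yield rhs.end()
```

The cost is that every child is converted twice, but the real builder's state is only ever mutated along a path that is already known to succeed.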
P.S.: Hive seems to use SearchArgument together with ExprNodeGenericFuncDesc only. Usage of the builder methods mentioned above can only be found in test code, where all tested filters are known to be convertible.
|Constructor and Description|
|OrcFilters()|
|Modifier and Type|Method and Description|
|public static scala.Option<org.apache.hadoop.hive.ql.io.sarg.SearchArgument>|createFilter(StructType schema, Filter[] filters)|
|public static scala.collection.Seq<Filter>|convertibleFilters(StructType schema, scala.collection.immutable.Map<String,DataType> dataTypeMap, scala.collection.Seq<Filter> filters)|
|public static org.slf4j.Logger|org$apache$spark$internal$Logging$$log_()|
|public static void|org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)|
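A usage sketch of createFilter, assuming the org.apache.spark.sql.hive.orc variant of OrcFilters (it is Spark-internal, so ordinary user code would not normally call it directly):

```scala
import org.apache.spark.sql.hive.orc.OrcFilters
import org.apache.spark.sql.sources.{EqualTo, Filter, GreaterThan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Try to push a = 1 AND b > 2 down to ORC.
val schema = StructType(Seq(
  StructField("a", IntegerType),
  StructField("b", IntegerType)))
val filters: Array[Filter] = Array(EqualTo("a", 1), GreaterThan("b", 2))

// Some(SearchArgument) when the filters can be expressed, otherwise None.
val sarg = OrcFilters.createFilter(schema, filters)
sarg.foreach(println)
```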