public interface RFormulaBase extends HasFeaturesCol, HasLabelCol, HasHandleInvalid
Base trait for RFormula and RFormulaModel.

| Modifier and Type | Method and Description |
|---|---|
| BooleanParam | forceIndexLabel(): Force the label to be indexed, whether it is numeric or string type. |
| Param<String> | formula(): R formula parameter. |
| boolean | getForceIndexLabel() |
| String | getFormula() |
| String | getStringIndexerOrderType() |
| Param<String> | handleInvalid(): Param for how to handle invalid data (unseen or NULL values) in the features and label columns of string type. |
| boolean | hasLabelCol(StructType schema) |
| Param<String> | stringIndexerOrderType(): Param for how to order categories of a string FEATURE column used by StringIndexer. |
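Methods inherited from interface HasFeaturesCol: featuresCol, getFeaturesCol
Methods inherited from interface HasLabelCol: getLabelCol, labelCol
Methods inherited from interface HasHandleInvalid: getHandleInvalid
Methods inherited from interface Params: clear, copy, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface Identifiable: toString, uid

BooleanParam forceIndexLabel()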
Param<String> formula()
boolean getForceIndexLabel()
String getFormula()
String getStringIndexerOrderType()
Param<String> handleInvalid()
Specified by: handleInvalid in interface HasHandleInvalid
boolean hasLabelCol(StructType schema)
Param<String> stringIndexerOrderType()
Param for how to order categories of a string FEATURE column used by StringIndexer.
The last category after ordering is dropped when encoding strings.
Supported options: 'frequencyDesc', 'frequencyAsc', 'alphabetDesc', 'alphabetAsc'.
The default value is 'frequencyDesc'. When the ordering is set to 'alphabetDesc', RFormula
drops the same category as R when encoding strings.
The options are illustrated below for a column containing the values 'b', 'a', 'b', 'a', 'c', 'b':
| Option | Category mapped to 0 by StringIndexer | Category dropped by RFormula |
|---|---|---|
| 'frequencyDesc' | most frequent category ('b') | least frequent category ('c') |
| 'frequencyAsc' | least frequent category ('c') | most frequent category ('b') |
| 'alphabetDesc' | last alphabetical category ('c') | first alphabetical category ('a') |
| 'alphabetAsc' | first alphabetical category ('a') | last alphabetical category ('c') |
Note that this ordering option is NOT used for the label column. When the label column is
indexed, it uses the default descending frequency ordering in StringIndexer.
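
The mapping in the table above can be reproduced with a small plain-Java sketch. It is only an illustration of the ordering rules as described here, not Spark's implementation: the class `OrderTypeSketch` and its methods `order`, `mappedToZero`, `dropped`, and `encode` are hypothetical names, and breaking frequency ties alphabetically is an assumption of the sketch (the example data contains no ties).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch in plain Java; this is NOT Spark's implementation.
final class OrderTypeSketch {

    // Returns the distinct categories in index order: position i receives index i.
    static List<String> order(List<String> values, String orderType) {
        // TreeMap iterates keys alphabetically, so frequency ties fall back to
        // alphabetical order here (an assumption of this sketch).
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        List<String> labels = new ArrayList<>(counts.keySet());
        Comparator<String> byCount = Comparator.comparingInt(label -> counts.get(label));
        switch (orderType) {
            case "frequencyDesc": labels.sort(byCount.reversed()); break;
            case "frequencyAsc":  labels.sort(byCount); break;
            case "alphabetDesc":  labels.sort(Comparator.reverseOrder()); break;
            case "alphabetAsc":   break; // keys are already in alphabetical order
            default: throw new IllegalArgumentException("Unsupported option: " + orderType);
        }
        return labels;
    }

    // The first category after ordering is the one mapped to index 0.
    static String mappedToZero(List<String> ordered) {
        return ordered.get(0);
    }

    // The last category after ordering is the one dropped when encoding.
    static String dropped(List<String> ordered) {
        return ordered.get(ordered.size() - 1);
    }

    // Dummy-encodes a value with the last category dropped: the vector has one
    // slot per remaining category, and the dropped category is all zeros.
    static double[] encode(List<String> ordered, String value) {
        int index = ordered.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unseen category: " + value);
        }
        double[] vector = new double[ordered.size() - 1];
        if (index < vector.length) {
            vector[index] = 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("b", "a", "b", "a", "c", "b");
        for (String option : Arrays.asList(
                "frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc")) {
            List<String> ordered = order(column, option);
            System.out.println(option + ": index 0 = " + mappedToZero(ordered)
                    + ", dropped = " + dropped(ordered)
                    + ", encode('a') = " + Arrays.toString(encode(ordered, "a")));
        }
    }
}
```

With 'alphabetDesc', the sketch encodes 'a' as the all-zero vector, which is the sense in which the first alphabetical category is dropped, as R does.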