Package org.apache.spark.ml.feature
Interface RFormulaBase
- All Superinterfaces:
- HasFeaturesCol, HasHandleInvalid, HasLabelCol, Identifiable, Params, Serializable
- All Known Implementing Classes:
- RFormula, RFormulaModel
Base trait for RFormula and RFormulaModel.
Method Summary
- BooleanParam forceIndexLabel(): Force to index label whether it is numeric or string type.
- Param<String> formula(): R formula parameter.
- boolean getForceIndexLabel()
- String getFormula()
- String getStringIndexerOrderType()
- Param<String> handleInvalid(): Param for how to handle invalid data (unseen or NULL values) in features and label column of string type.
- boolean hasLabelCol(StructType schema)
- Param<String> stringIndexerOrderType(): Param for how to order categories of a string FEATURE column used by StringIndexer.

Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
- featuresCol, getFeaturesCol

Methods inherited from interface org.apache.spark.ml.param.shared.HasHandleInvalid
- getHandleInvalid

Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol
- getLabelCol, labelCol

Methods inherited from interface org.apache.spark.ml.util.Identifiable
- toString, uid

Methods inherited from interface org.apache.spark.ml.param.Params
- clear, copy, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
- 
Method Details
forceIndexLabel
BooleanParam forceIndexLabel()
Forces the label to be indexed whether it is numeric or string type. Usually the label is indexed only when it is string type. If the formula is used by a classification algorithm, setting this param to true forces the label to be indexed even if it is numeric. Default: false.
- Returns:
- (undocumented)
 
- 
formula
Param<String> formula()
R formula parameter. The formula is provided in string form.
- Returns:
- (undocumented)
 
- 
getForceIndexLabel
boolean getForceIndexLabel()
- 
getFormula
String getFormula()
- 
getStringIndexerOrderType
String getStringIndexerOrderType()
- 
handleInvalid
Param<String> handleInvalid()
Param for how to handle invalid data (unseen or NULL values) in features and label column of string type. Options are 'skip' (filter out rows with invalid data), 'error' (throw an error), or 'keep' (put invalid data in a special additional bucket, at index numLabels). Default: "error".
- Specified by:
- handleInvalid in interface HasHandleInvalid
- Returns:
- (undocumented)
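
The three handleInvalid options can be illustrated with a minimal plain-Java sketch. It does not call Spark and is not Spark's implementation; the class HandleInvalidSketch and its index method are hypothetical names introduced only to mirror the documented behavior, under the assumption that a value counts as invalid when it is null or absent from the known labels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: mimics the documented 'skip' / 'error' / 'keep' semantics
// with plain collections. Hypothetical helper, not part of the Spark API.
final class HandleInvalidSketch {

    // Maps each value to its position in `labels`. A value is treated as
    // invalid when it is null or was not seen among `labels`.
    static List<Integer> index(List<String> labels, List<String> values, String handleInvalid) {
        List<Integer> indexed = new ArrayList<>();
        for (String value : values) {
            int position = (value == null) ? -1 : labels.indexOf(value);
            if (position >= 0) {
                indexed.add(position);
                continue;
            }
            switch (handleInvalid) {
                case "skip":
                    // Filter out the row with invalid data.
                    break;
                case "keep":
                    // Put invalid data in an additional bucket at index numLabels.
                    indexed.add(labels.size());
                    break;
                case "error":
                    throw new IllegalArgumentException("Invalid value: " + value);
                default:
                    throw new IllegalArgumentException("Unsupported handleInvalid: " + handleInvalid);
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("a", "b");
        List<String> values = Arrays.asList("a", "z", null, "b");
        System.out.println("skip: " + index(labels, values, "skip"));
        System.out.println("keep: " + index(labels, values, "keep"));
    }
}
```

With two known labels, 'keep' assigns every invalid value the extra index 2, while 'skip' returns fewer entries than it was given.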
 
- 
hasLabelCol
boolean hasLabelCol(StructType schema)
- 
stringIndexerOrderType
Param<String> stringIndexerOrderType()
Param for how to order categories of a string FEATURE column used by StringIndexer. The last category after ordering is dropped when encoding strings. Supported options: 'frequencyDesc', 'frequencyAsc', 'alphabetDesc', 'alphabetAsc'. The default value is 'frequencyDesc'. When the ordering is set to 'alphabetDesc', RFormula drops the same category as R when encoding strings.

The options are explained using an example 'b', 'a', 'b', 'a', 'c', 'b':

+-----------------+---------------------------------------+----------------------------------+
| Option          | Category mapped to 0 by StringIndexer | Category dropped by RFormula     |
+-----------------+---------------------------------------+----------------------------------+
| 'frequencyDesc' | most frequent category ('b')          | least frequent category ('c')    |
| 'frequencyAsc'  | least frequent category ('c')         | most frequent category ('b')     |
| 'alphabetDesc'  | last alphabetical category ('c')      | first alphabetical category ('a')|
| 'alphabetAsc'   | first alphabetical category ('a')     | last alphabetical category ('c') |
+-----------------+---------------------------------------+----------------------------------+

Note that this ordering option is NOT used for the label column. When the label column is indexed, it uses the default descending frequency ordering in StringIndexer.
- Returns:
- (undocumented)
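
The ordering table for the example 'b', 'a', 'b', 'a', 'c', 'b' can be reproduced with a small plain-Java sketch. It only imitates the documented ordering rules and is not Spark's implementation; OrderTypeSketch, order, and dropped are hypothetical names, and breaking frequency ties alphabetically is an assumption of this sketch (the example itself contains no ties).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: mimics the documented ordering options with plain
// collections. Hypothetical helper, not part of the Spark API.
final class OrderTypeSketch {

    // Returns the distinct categories in index order (index 0 first).
    static List<String> order(List<String> values, String orderType) {
        // TreeMap keeps keys alphabetical, so the stable sort below breaks
        // frequency ties alphabetically (an assumption of this sketch).
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        List<String> categories = new ArrayList<>(counts.keySet());
        Comparator<String> byFrequency = Comparator.comparingInt(c -> counts.get(c));
        switch (orderType) {
            case "frequencyDesc":
                categories.sort(byFrequency.reversed());
                break;
            case "frequencyAsc":
                categories.sort(byFrequency);
                break;
            case "alphabetDesc":
                categories.sort(Comparator.reverseOrder());
                break;
            case "alphabetAsc":
                categories.sort(Comparator.naturalOrder());
                break;
            default:
                throw new IllegalArgumentException("Unsupported order type: " + orderType);
        }
        return categories;
    }

    // The last category after ordering is the one dropped when encoding strings.
    static String dropped(List<String> values, String orderType) {
        List<String> ordered = order(values, orderType);
        return ordered.get(ordered.size() - 1);
    }

    public static void main(String[] args) {
        List<String> example = Arrays.asList("b", "a", "b", "a", "c", "b");
        for (String option : Arrays.asList("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc")) {
            System.out.println(option + ": index 0 = " + order(example, option).get(0)
                    + ", dropped = " + dropped(example, option));
        }
    }
}
```

For each option, the first element of the ordered list corresponds to the "mapped to 0" column and the last element to the "dropped" column of the table.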
 
 
-