Package org.apache.spark.ml.feature
Class OneHotEncoderCommon
Object
org.apache.spark.ml.feature.OneHotEncoderCommon
Provides some helper methods used by `OneHotEncoder`.
-
Constructor Summary
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| `static AttributeGroup` | `createAttrGroupForAttrNames(String outputColName, int numAttrs, boolean dropLast, boolean keepInvalid)` | Creates an `AttributeGroup` with the required number of `BinaryAttribute`. |
| `static scala.collection.immutable.Seq<AttributeGroup>` | `getOutputAttrGroupFromData(Dataset<?> dataset, scala.collection.immutable.Seq<String> inputColNames, scala.collection.immutable.Seq<String> outputColNames, boolean dropLast)` | This method is called when we want to generate `AttributeGroup` from actual data for one-hot encoder. |
| `static StructField` | `transformOutputColumnSchema(StructField inputCol, String outputColName, boolean dropLast, boolean keepInvalid)` | Prepares the `StructField` with proper metadata for `OneHotEncoder`'s output column. |
-
Constructor Details
-
OneHotEncoderCommon
public OneHotEncoderCommon()
-
-
Method Details
-
transformOutputColumnSchema
public static StructField transformOutputColumnSchema(StructField inputCol, String outputColName, boolean dropLast, boolean keepInvalid)
Prepares the `StructField` with proper metadata for `OneHotEncoder`'s output column.
Parameters:
inputCol - (undocumented)
outputColName - (undocumented)
dropLast - (undocumented)
keepInvalid - (undocumented)
Returns:
(undocumented)
-
getOutputAttrGroupFromData
public static scala.collection.immutable.Seq<AttributeGroup> getOutputAttrGroupFromData(Dataset<?> dataset, scala.collection.immutable.Seq<String> inputColNames, scala.collection.immutable.Seq<String> outputColNames, boolean dropLast)
This method is called when we want to generate `AttributeGroup` from actual data for one-hot encoder.
Parameters:
dataset - (undocumented)
inputColNames - (undocumented)
outputColNames - (undocumented)
dropLast - (undocumented)
Returns:
(undocumented)
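The page does not document how the attribute count is derived from the data. The sketch below shows one plausible reading for a single column, under the assumption that input values are non-negative integer category indices and that the number of categories is the largest observed index plus one. `CategoryCountSketch` and `numCategories` are hypothetical names for illustration, not part of Spark, and the code does not touch a `Dataset`.

```java
// Hypothetical stand-in for illustration; not part of Spark.
final class CategoryCountSketch {

    // Infers a category count for one column from its observed index values.
    // Assumption: values are category indices 0, 1, 2, ... stored as doubles.
    static int numCategories(double[] indices) {
        double max = 0.0;
        for (double x : indices) {
            // Reject negatives, fractions, NaN, and values too large for an int count.
            if (x < 0.0 || x != Math.floor(x) || x >= Integer.MAX_VALUE) {
                throw new IllegalArgumentException(
                    "Values must be non-negative integer indices, but got " + x);
            }
            max = Math.max(max, x);
        }
        // Indices start at 0, so the count is the largest index plus one.
        return (int) max + 1;
    }

    public static void main(String[] args) {
        System.out.println(numCategories(new double[] {0.0, 2.0, 1.0})); // prints 3
    }
}
```

Under this reading, a category that never appears in the data and has an index above the observed maximum would not get its own attribute, which is one reason schema metadata, when available, is the more reliable source.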
-
createAttrGroupForAttrNames
public static AttributeGroup createAttrGroupForAttrNames(String outputColName, int numAttrs, boolean dropLast, boolean keepInvalid)
Creates an `AttributeGroup` with the required number of `BinaryAttribute`.
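How `dropLast` and `keepInvalid` affect "the required number" is not spelled out on this page. The standalone sketch below illustrates one assumed rule: attributes are named by index, the last one is dropped when only `dropLast` is set, an extra slot for invalid values is appended when only `keepInvalid` is set, and the count is unchanged when both or neither are set. The class `AttrNameSketch`, the method `outputAttrNames`, and the name `invalidValues` are assumptions for illustration; the code models the naming only and does not build a Spark `AttributeGroup`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not part of Spark.
final class AttrNameSketch {

    // Returns the assumed names of the binary attributes in the encoded vector.
    static List<String> outputAttrNames(int numAttrs, boolean dropLast, boolean keepInvalid) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numAttrs; i++) {
            names.add(Integer.toString(i)); // one binary attribute per category index
        }
        if (dropLast && !keepInvalid) {
            // Drop the last category so the remaining indicators are not redundant.
            if (!names.isEmpty()) {
                names.remove(names.size() - 1);
            }
        } else if (!dropLast && keepInvalid) {
            // Assumed name for the extra slot that collects invalid values.
            names.add("invalidValues");
        }
        // When both flags are set (or neither), the names are left unchanged.
        return names;
    }

    public static void main(String[] args) {
        System.out.println(outputAttrNames(3, true, false));  // prints [0, 1]
        System.out.println(outputAttrNames(3, false, true));  // prints [0, 1, 2, invalidValues]
    }
}
```

With three categories, this rule gives an output vector of size 2, 3, or 4 depending on the flags, which is the kind of size information a downstream stage would read from the column metadata.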
-