Package org.apache.spark.ml.util
Class MetadataUtils
Object
org.apache.spark.ml.util.MetadataUtils
Helper utilities for algorithms using ML metadata
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
static scala.collection.immutable.Map<Object,Object> | getCategoricalFeatures(StructField featuresSchema) | Examine a schema to identify categorical (Binary and Nominal) features.
static int[] | getFeatureIndicesFromNames(StructField col, String[] names) | Takes a Vector column and a list of feature names, and returns the corresponding list of feature indices in the column, in order.
static scala.Option<Object> | getNumClasses(StructField labelSchema) | Examine a schema to identify the number of classes in a label column.
static scala.Option<Object> | getNumFeatures(StructField vectorSchema) | Examine a schema to identify the number of features in a vector column.
-
Constructor Details
-
MetadataUtils
public MetadataUtils()
-
Method Details
-
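getNumClasses
public static scala.Option<Object> getNumClasses(StructField labelSchema)
Examine a schema to identify the number of classes in a label column. Returns None if the number of labels is not specified, or if the label column is continuous.
Parameters:
labelSchema - Schema of the label column
Returns:
Number of classes, or None if the number of labels is not specified or the label column is continuous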
-
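getNumFeatures
public static scala.Option<Object> getNumFeatures(StructField vectorSchema)
Examine a schema to identify the number of features in a vector column. Returns None if the number of features is not specified.
Parameters:
vectorSchema - Schema of the vector column
Returns:
Number of features, or None if the number of features is not specified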
-
getCategoricalFeatures
public static scala.collection.immutable.Map<Object,Object> getCategoricalFeatures(StructField featuresSchema)
Examine a schema to identify categorical (Binary and Nominal) features.
Parameters:
featuresSchema - Schema of the features column. If a feature does not have metadata, it is assumed to be continuous. If a feature is Nominal, then it must have the number of values specified.
Returns:
Map: feature index to number of categories. The map's set of keys will be the set of categorical feature indices.
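The contract described above can be modeled in plain Java without Spark. The sketch below is illustrative only and is not Spark's implementation: the `Kind` enum, the `Feature` class, and the `categoricalFeatures` method are hypothetical stand-ins for the attribute metadata carried by a StructField. Counting a Binary feature as two categories and throwing IllegalArgumentException for a Nominal feature with no value count are assumptions made for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the documented contract; it does not call Spark.
class CategoricalFeaturesSketch {

    // Hypothetical stand-in for the kinds of ML attribute metadata.
    enum Kind { CONTINUOUS, BINARY, NOMINAL }

    // Hypothetical per-feature metadata; numValues is null when unspecified.
    static final class Feature {
        final Kind kind;
        final Integer numValues;

        Feature(Kind kind, Integer numValues) {
            this.kind = kind;
            this.numValues = numValues;
        }
    }

    // Returns feature index -> number of categories, for categorical features only.
    // A null entry models a feature without metadata, which is treated as continuous.
    static Map<Integer, Integer> categoricalFeatures(Feature[] features) {
        Map<Integer, Integer> result = new TreeMap<>();
        for (int i = 0; i < features.length; i++) {
            Feature f = features[i];
            if (f == null || f.kind == Kind.CONTINUOUS) {
                continue; // continuous features never appear in the map
            }
            if (f.kind == Kind.BINARY) {
                result.put(i, 2); // assumption: a binary feature has two categories
            } else {
                if (f.numValues == null) {
                    throw new IllegalArgumentException(
                        "Nominal feature " + i + " must specify its number of values");
                }
                result.put(i, f.numValues);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Feature[] features = {
            new Feature(Kind.CONTINUOUS, null),
            new Feature(Kind.BINARY, null),
            null, // no metadata: assumed continuous
            new Feature(Kind.NOMINAL, 4)
        };
        System.out.println(categoricalFeatures(features)); // prints {1=2, 3=4}
    }
}
```

The key set of the result, here 1 and 3, is exactly the set of categorical feature indices, which matches the return description above.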
-
getFeatureIndicesFromNames
public static int[] getFeatureIndicesFromNames(StructField col, String[] names)
Takes a Vector column and a list of feature names, and returns the corresponding list of feature indices in the column, in order.
Parameters:
col - Vector column which must have feature names specified via attributes
names - List of feature names
Returns:
Feature indices in the column, in the same order as names
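The name-to-index lookup described above can be sketched in plain Java without Spark. This is an illustration of the documented behavior, not Spark's code: `indicesFromNames` and its `attributeNames` parameter are hypothetical, with `attributeNames[i]` standing in for the name attached to feature i in the column's attributes. Rejecting an unknown name with IllegalArgumentException is an assumption of this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the documented lookup; it does not call Spark.
class FeatureIndexSketch {

    // attributeNames[i] is the (hypothetical) name attached to feature i of the vector column.
    // Returns one index per requested name, in the order the names were given.
    static int[] indicesFromNames(String[] attributeNames, String[] names) {
        Map<String, Integer> indexByName = new HashMap<>();
        for (int i = 0; i < attributeNames.length; i++) {
            indexByName.put(attributeNames[i], i);
        }
        int[] indices = new int[names.length];
        for (int j = 0; j < names.length; j++) {
            Integer idx = indexByName.get(names[j]);
            if (idx == null) {
                // assumption: an unknown feature name is treated as an error
                throw new IllegalArgumentException("Unknown feature name: " + names[j]);
            }
            indices[j] = idx;
        }
        return indices;
    }

    public static void main(String[] args) {
        String[] attributeNames = {"age", "height", "weight"};
        String[] requested = {"weight", "age"};
        System.out.println(Arrays.toString(indicesFromNames(attributeNames, requested))); // prints [2, 0]
    }
}
```

The result follows the order of the requested names rather than the order of the features in the column, which is what "in order" means in the description above.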
-