Interface StringIndexerBase

All Superinterfaces:
HasHandleInvalid, HasInputCol, HasInputCols, HasOutputCol, HasOutputCols, Identifiable, Params, Serializable, scala.Serializable
All Known Implementing Classes:
StringIndexer, StringIndexerModel

public interface StringIndexerBase extends Params, HasHandleInvalid, HasInputCol, HasOutputCol, HasInputCols, HasOutputCols
Base trait for StringIndexer and StringIndexerModel.
  • Method Details

    • getInOutCols

      scala.Tuple2<String[],String[]> getInOutCols()
      Returns the input and output column names as a pair of arrays, where each input column corresponds to the output column at the same position.
    • getStringOrderType

      String getStringOrderType()
    • handleInvalid

      Param<String> handleInvalid()
      Param for how to handle invalid data (unseen labels or NULL values). Options are 'skip' (filter out rows with invalid data), 'error' (throw an error), or 'keep' (put invalid data in a special additional bucket, at index numLabels). Default is 'error'.
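
      The three options can be illustrated with a minimal plain-Java sketch. It does not use Spark and is not Spark's implementation: the class HandleInvalidSketch and the method index are hypothetical names introduced here, and the label list stands in for the labels of a fitted model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: mimics the documented handleInvalid semantics without Spark.
final class HandleInvalidSketch {

    // Hypothetical helper: maps each value to its label index.
    // A value is "invalid" if it is null or not present in labels.
    static List<Integer> index(List<String> labels, List<String> values, String handleInvalid) {
        List<Integer> out = new ArrayList<>();
        for (String v : values) {
            int idx = (v == null) ? -1 : labels.indexOf(v);
            if (idx >= 0) {
                out.add(idx);
                continue;
            }
            switch (handleInvalid) {
                case "skip":
                    // Drop the row with invalid data.
                    break;
                case "keep":
                    // Put invalid data in one extra bucket at index numLabels.
                    out.add(labels.size());
                    break;
                case "error":
                    throw new IllegalArgumentException("Unseen label or NULL value: " + v);
                default:
                    throw new IllegalArgumentException("Unknown handleInvalid: " + handleInvalid);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("a", "b", "c");
        List<String> values = Arrays.asList("b", "z", null, "a");
        System.out.println("skip: " + index(labels, values, "skip"));
        System.out.println("keep: " + index(labels, values, "keep"));
    }
}
```

      With 'keep', both the unseen label and the NULL value land in the same extra bucket, so the output has one more distinct index than there are labels.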
      Specified by:
      handleInvalid in interface HasHandleInvalid
      Returns:
      (undocumented)
    • stringOrderType

      Param<String> stringOrderType()
      Param for how to order the labels of a string column. The first label after ordering is assigned index 0. Options are:
        - 'frequencyDesc': descending order by label frequency (most frequent label assigned 0)
        - 'frequencyAsc': ascending order by label frequency (least frequent label assigned 0)
        - 'alphabetDesc': descending alphabetical order
        - 'alphabetAsc': ascending alphabetical order
      Default is 'frequencyDesc'.

      Note: Under 'frequencyDesc' or 'frequencyAsc', labels with equal frequency are further sorted alphabetically.
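
      The four orderings and the tie-break can be sketched in plain Java. This is an illustration and not Spark's implementation: StringOrderSketch and orderLabels are hypothetical names, and the sketch assumes that ties under both frequency orderings are broken in ascending alphabetical order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: orders distinct labels per the documented stringOrderType options.
final class StringOrderSketch {

    // Hypothetical helper: returns the distinct labels in index order
    // (position 0 in the result is the label assigned index 0).
    static List<String> orderLabels(List<String> values, String stringOrderType) {
        Map<String, Long> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        List<String> labels = new ArrayList<>(counts.keySet());

        Comparator<String> alphabetAsc = Comparator.naturalOrder();
        Comparator<String> byCountAsc = Comparator.comparingLong(s -> counts.get(s));

        switch (stringOrderType) {
            case "frequencyDesc":
                // Assumption: equal counts fall back to ascending alphabetical order.
                labels.sort(byCountAsc.reversed().thenComparing(alphabetAsc));
                break;
            case "frequencyAsc":
                labels.sort(byCountAsc.thenComparing(alphabetAsc));
                break;
            case "alphabetDesc":
                labels.sort(alphabetAsc.reversed());
                break;
            case "alphabetAsc":
                labels.sort(alphabetAsc);
                break;
            default:
                throw new IllegalArgumentException("Unknown stringOrderType: " + stringOrderType);
        }
        return labels;
    }

    public static void main(String[] args) {
        // "a" and "b" occur twice each; "c" and "d" occur once each.
        List<String> values = Arrays.asList("b", "a", "c", "a", "b", "d");
        for (String order : Arrays.asList("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc")) {
            System.out.println(order + ": " + orderLabels(values, order));
        }
    }
}
```

      The position of a label in the returned list is the index it would receive, so the first element corresponds to index 0.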

      Returns:
      (undocumented)
    • validateAndTransformField

      StructField validateAndTransformField(StructType schema, String inputColName, String outputColName)
    • validateAndTransformSchema

      StructType validateAndTransformSchema(StructType schema, boolean skipNonExistsCol)
      Validates and transforms the input schema.