Class Tokenizer

All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Params, HasInputCol, HasOutputCol, DefaultParamsWritable, Identifiable, MLWritable

public class Tokenizer extends UnaryTransformer<String,scala.collection.immutable.Seq<String>,Tokenizer> implements DefaultParamsWritable
A tokenizer that converts the input string to lowercase and then splits it on whitespace.
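For example, the following Java snippet applies a Tokenizer to a one-column DataFrame (the column names "sentence" and "words", the sample sentence, and the SparkSession setup are illustrative assumptions, not part of this API):

  import java.util.Arrays;
  import java.util.List;

  import org.apache.spark.ml.feature.Tokenizer;
  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.RowFactory;
  import org.apache.spark.sql.SparkSession;
  import org.apache.spark.sql.types.DataTypes;
  import org.apache.spark.sql.types.Metadata;
  import org.apache.spark.sql.types.StructField;
  import org.apache.spark.sql.types.StructType;

  SparkSession spark = SparkSession.builder().appName("TokenizerExample").getOrCreate();

  StructType schema = new StructType(new StructField[]{
    new StructField("sentence", DataTypes.StringType, false, Metadata.empty())
  });
  List<Row> data = Arrays.asList(RowFactory.create("Hi I heard about Spark"));
  Dataset<Row> df = spark.createDataFrame(data, schema);

  // Lowercase the "sentence" column and split it on whitespace into an array column.
  Tokenizer tokenizer = new Tokenizer()
    .setInputCol("sentence")
    .setOutputCol("words");

  Dataset<Row> tokenized = tokenizer.transform(df);
  tokenized.select("sentence", "words").show(false);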

  • Constructor Details

    • Tokenizer

      public Tokenizer(String uid)
    • Tokenizer

      public Tokenizer()
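      Both constructors produce an equivalent transformer; a brief sketch (the uid string "myTokenizer" is an illustrative assumption):

        // No-arg constructor: the uid is auto-generated (e.g. a random "tok_..." string).
        Tokenizer t1 = new Tokenizer();

        // Explicit uid, useful for reproducible pipelines and tests.
        Tokenizer t2 = new Tokenizer("myTokenizer");
        String id = t2.uid();  // "myTokenizer"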
  • Method Details

    • load

      public static Tokenizer load(String path)
    • read

      public static MLReader<T> read()
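      A short persistence sketch covering load() and read() (the path "/tmp/spark-tokenizer" is an illustrative assumption):

        Tokenizer tokenizer = new Tokenizer()
          .setInputCol("sentence")
          .setOutputCol("words");

        // Persist the transformer's params (DefaultParamsWritable); may throw IOException.
        tokenizer.write().overwrite().save("/tmp/spark-tokenizer");

        // Restore an equivalent Tokenizer; load(path) is shorthand for read().load(path).
        Tokenizer restored = Tokenizer.load("/tmp/spark-tokenizer");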
    • uid

      public String uid()
      Description copied from interface: Identifiable
      An immutable unique ID for the object and its derivatives.
      Specified by:
      uid in interface Identifiable
      Returns:
      the unique ID string of this instance
    • copy

      public Tokenizer copy(ParamMap extra)
      Description copied from interface: Params
      Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
      Specified by:
      copy in interface Params
      Overrides:
      copy in class UnaryTransformer<String,scala.collection.immutable.Seq<String>,Tokenizer>
      Parameters:
      extra - extra param values to apply to the copy
      Returns:
      a copy of this Tokenizer with the extra params applied
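      A brief sketch of copy(extra) (the replacement output column name "tokens" is an illustrative assumption):

        import org.apache.spark.ml.param.ParamMap;

        Tokenizer tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words");

        // Override outputCol only in the copy; the original instance is left unchanged.
        ParamMap extra = new ParamMap().put(tokenizer.outputCol(), "tokens");
        Tokenizer copied = tokenizer.copy(extra);

        boolean sameUid = copied.uid().equals(tokenizer.uid());  // true, per the contract above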