Class NGram

Object
  org.apache.spark.ml.PipelineStage
    org.apache.spark.ml.Transformer
      org.apache.spark.ml.UnaryTransformer<scala.collection.immutable.Seq<String>,scala.collection.immutable.Seq<String>,NGram>
        org.apache.spark.ml.feature.NGram
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Params, HasInputCol, HasOutputCol, DefaultParamsWritable, Identifiable, MLWritable

public class NGram extends UnaryTransformer<scala.collection.immutable.Seq<String>,scala.collection.immutable.Seq<String>,NGram> implements DefaultParamsWritable
A feature transformer that converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words.

When the input is empty, an empty array is returned. When the input array length is less than n (number of elements per n-gram), no n-grams are returned.
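The transformation described above amounts to sliding a window of exactly n words across the input and joining each window with single spaces. The following is a minimal plain-Java sketch of that idea, not Spark's implementation. The class `NGramSketch` and method `ngrams` are hypothetical names, and dropping null elements before windowing is an assumed reading of "null values in the input array are ignored".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class NGramSketch {
    /**
     * Builds space-separated n-grams of exactly n words from the input tokens.
     * Assumption: null tokens are dropped before the window is applied.
     */
    public static List<String> ngrams(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1, got " + n);
        }
        // Drop null elements (assumed reading of the documented null handling).
        List<String> words = tokens.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
        List<String> result = new ArrayList<>();
        // Slide a window of exactly n words; partial windows are never emitted,
        // so an empty input or one shorter than n yields an empty list.
        for (int i = 0; i + n <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Hi", "I", "heard", "about", "Spark");
        System.out.println(ngrams(tokens, 2)); // [Hi I, I heard, heard about, about Spark]
    }
}
```

With n = 1 the sketch returns each word as its own unigram, and any input with fewer than n words produces no n-grams, matching the two edge cases stated above.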

  • Constructor Details

    • NGram

      public NGram(String uid)
    • NGram

      public NGram()
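The two constructors differ only in where the uid comes from: one takes it explicitly, the other generates one. The sketch below shows that pattern in plain Java, assuming a scheme similar to Spark's `Identifiable.randomUID` (a prefix, an underscore, and the last 12 characters of a random UUID). `UidSketch` and `randomUid` are hypothetical names used only for illustration.

```java
import java.util.UUID;

public class UidSketch {
    private final String uid;

    // Explicit uid, mirroring the NGram(String uid) constructor.
    public UidSketch(String uid) {
        this.uid = uid;
    }

    // No-arg form delegates with a generated uid (assumed scheme:
    // "ngram_" plus the last 12 hex characters of a random UUID).
    public UidSketch() {
        this(randomUid("ngram"));
    }

    static String randomUid(String prefix) {
        String u = UUID.randomUUID().toString();
        return prefix + "_" + u.substring(u.length() - 12);
    }

    public String uid() {
        return uid;
    }

    public static void main(String[] args) {
        System.out.println(new UidSketch("ngram_fixed").uid()); // ngram_fixed
        System.out.println(new UidSketch().uid());             // random suffix, differs per run
    }
}
```

Passing an explicit uid is mainly useful when an instance must be reconstructed with a known identity, for example when restoring a saved stage.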
  • Method Details

    • load

      public static NGram load(String path)
    • read

      public static MLReader<NGram> read()
    • uid

      public String uid()
      Description copied from interface: Identifiable
      An immutable unique ID for the object and its derivatives.
      Specified by:
      uid in interface Identifiable
      Returns:
      (undocumented)
    • n

      public IntParam n()
      Number of elements per n-gram (the n-gram length), greater than or equal to 1. Default: 2, bigram features
      Returns:
      (undocumented)
    • setN

      public NGram setN(int value)
    • getN

      public int getN()
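The contract of the n parameter (default 2, values below 1 rejected, setter returning the transformer for chaining) can be illustrated with a small plain-Java stand-in. `NParamSketch` is a hypothetical class, not Spark's Params machinery; this sketch simply validates eagerly inside the setter and throws `IllegalArgumentException` for invalid values.

```java
public class NParamSketch {
    // Default of 2 (bigrams), as documented for the n param.
    private int n = 2;

    // Validates n >= 1, then returns this so calls can be chained,
    // in the same fluent style as setN(int) returning NGram.
    public NParamSketch setN(int value) {
        if (value < 1) {
            throw new IllegalArgumentException("n must be >= 1, got " + value);
        }
        this.n = value;
        return this;
    }

    public int getN() {
        return n;
    }

    public static void main(String[] args) {
        NParamSketch p = new NParamSketch();
        System.out.println(p.getN());         // 2
        System.out.println(p.setN(3).getN()); // 3
        try {
            p.setN(0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: n must be >= 1, got 0
        }
    }
}
```

Because a rejected call throws before assignment, the previously set value (3 in the example) is left unchanged.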
    • toString

      public String toString()
      Specified by:
      toString in interface Identifiable
      Overrides:
      toString in class Object