Class StreamingKMeans

Object
org.apache.spark.mllib.clustering.StreamingKMeans
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging

public class StreamingKMeans extends Object implements org.apache.spark.internal.Logging, Serializable
StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming data, and using the model to make predictions on streaming data. See KMeansModel for details on the algorithm and update rules.

Use a builder pattern to construct a streaming k-means analysis in an application, like:


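  val model = new StreamingKMeans()
    .setDecayFactor(0.5)
    .setK(3)
    .setRandomCenters(5, 100.0)
  // trainOn returns no value, so call it on the configured instance;
  // trainingData stands for a DStream[Vector] created elsewhere
  model.trainOn(trainingData)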
 
  • Constructor Details

    • StreamingKMeans

      public StreamingKMeans(int k, double decayFactor, String timeUnit)
    • StreamingKMeans

      public StreamingKMeans()
  • Method Details

    • BATCHES

      public static final String BATCHES()
    • POINTS

      public static final String POINTS()
    • k

      public int k()
    • decayFactor

      public double decayFactor()
    • timeUnit

      public String timeUnit()
    • setK

      public StreamingKMeans setK(int k)
      Set the number of clusters.
      Parameters:
      k - Number of clusters
      Returns:
      this StreamingKMeans instance, to allow chained calls
    • setDecayFactor

      public StreamingKMeans setDecayFactor(double a)
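      Set the decay factor, which controls how quickly the contribution of previous centroids is forgotten. A value of 1.0 weights all data equally; a value of 0.0 uses only the most recent batch.
      Parameters:
      a - Decay factor
      Returns:
      this StreamingKMeans instance, to allow chained calls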
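The effect of the decay factor can be illustrated with the update rule documented for Spark's streaming k-means model: c_{t+1} = (c_t * n_t * a + x_t * m_t) / (n_t * a + m_t), with n_{t+1} = n_t * a + m_t. The following is a minimal plain-Java sketch of that rule for a single cluster. The class and method names are hypothetical and not part of the Spark API, and the zero-weight guard is an assumption of this sketch.

```java
import java.util.Arrays;

/** Sketch of one decayed centroid update; names are illustrative, not Spark API. */
final class DecayedUpdateSketch {

    /**
     * Returns the updated center c_{t+1} for one cluster.
     *
     * @param center     current center c_t
     * @param weight     current weight n_t
     * @param batchMean  mean x_t of the new points assigned to this cluster
     * @param batchCount number m_t of new points assigned to this cluster
     * @param a          decay factor
     */
    static double[] updateCenter(double[] center, double weight,
                                 double[] batchMean, long batchCount, double a) {
        double discounted = weight * a;              // n_t * a
        double newWeight = discounted + batchCount;  // n_{t+1}
        if (newWeight == 0.0) {
            // Assumption of this sketch: with no weight at all, keep the old center.
            return center.clone();
        }
        double[] updated = new double[center.length];
        for (int i = 0; i < center.length; i++) {
            updated[i] = (center[i] * discounted + batchMean[i] * batchCount) / newWeight;
        }
        return updated;
    }

    /** Returns the updated weight n_{t+1} = n_t * a + m_t. */
    static double updateWeight(double weight, long batchCount, double a) {
        return weight * a + batchCount;
    }

    public static void main(String[] args) {
        double[] c = {0.0, 0.0};
        double[] x = {1.0, 2.0};
        // a = 1.0 keeps all history; a = 0.0 keeps only the newest batch.
        System.out.println(Arrays.toString(updateCenter(c, 10.0, x, 10, 1.0)));
        System.out.println(Arrays.toString(updateCenter(c, 10.0, x, 10, 0.0)));
    }
}
```

With a = 1.0 the old center and the new batch are averaged by their weights; with a = 0.0 the center jumps to the mean of the new batch.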
    • setHalfLife

      public StreamingKMeans setHalfLife(double halfLife, String timeUnit)
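      Set the half life and time unit ("batches" or "points"). The decay factor is derived from the half life. If the time unit is "points", the decay factor is raised to the power of the number of new points in each batch; if it is "batches", the decay factor is applied as is, once per batch.
      Parameters:
      halfLife - Half life, expressed in the given time unit
      timeUnit - Time unit of the half life, either "batches" or "points"
      Returns:
      this StreamingKMeans instance, to allow chained calls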
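As a rough illustration of the half-life semantics, the sketch below derives a decay factor a such that a raised to the power halfLife equals 0.5, then applies the per-batch discount for each time unit. It is plain Java under stated assumptions: the conversion a = exp(ln(0.5) / halfLife) is how the half life is understood here, and the class and method names are hypothetical, not part of the Spark API.

```java
/** Sketch of half-life handling; names are illustrative, not Spark API. */
final class HalfLifeSketch {

    /** Decay factor a chosen so that a^halfLife == 0.5. */
    static double decayFactorFromHalfLife(double halfLife) {
        return Math.exp(Math.log(0.5) / halfLife);
    }

    /**
     * Discount applied to the old cluster weights for one batch.
     *
     * @param decayFactor  decay factor a
     * @param timeUnit     "batches" or "points"
     * @param numNewPoints number of points in the incoming batch
     */
    static double discount(double decayFactor, String timeUnit, long numNewPoints) {
        switch (timeUnit) {
            case "batches":
                return decayFactor;                          // used as is
            case "points":
                return Math.pow(decayFactor, numNewPoints);  // a^m
            default:
                throw new IllegalArgumentException("Unknown time unit: " + timeUnit);
        }
    }

    public static void main(String[] args) {
        double a = decayFactorFromHalfLife(5.0);
        System.out.println("decay factor for half life 5: " + a);
        System.out.println("discount per batch:           " + discount(a, "batches", 100));
        System.out.println("discount for 100 points:      " + discount(a, "points", 100));
    }
}
```

With the "points" unit, a large batch discounts old data far more than a small one, whereas the "batches" unit applies the same discount regardless of batch size.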
    • setInitialCenters

      public StreamingKMeans setInitialCenters(Vector[] centers, double[] weights)
      Specify initial centers directly.
      Parameters:
      centers - Initial cluster centers
      weights - Initial weight of each center, one entry per center
      Returns:
      this StreamingKMeans instance, to allow chained calls
    • setRandomCenters

      public StreamingKMeans setRandomCenters(int dim, double weight, long seed)
      Initialize random centers, requiring only the number of dimensions.

      Parameters:
      dim - Number of dimensions
      weight - Weight for each center
      seed - Random seed
      Returns:
      this StreamingKMeans instance, to allow chained calls
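To make the roles of dim, weight, and seed concrete, here is a minimal plain-Java sketch of seeded random initialization. It assumes that each coordinate is drawn from a standard normal distribution and that every center receives the same weight. It uses java.util.Random as a stand-in generator, so it will not reproduce the values Spark produces for the same seed. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch of seeded random center initialization; names are illustrative, not Spark API. */
final class RandomCentersSketch {

    /** Builds k centers of dimension dim from a seeded generator (assumed standard normal). */
    static double[][] randomCenters(int k, int dim, long seed) {
        Random random = new Random(seed);
        double[][] centers = new double[k][dim];
        for (double[] center : centers) {
            for (int j = 0; j < dim; j++) {
                center[j] = random.nextGaussian();
            }
        }
        return centers;
    }

    /** Gives every center the same initial weight. */
    static double[] uniformWeights(int k, double weight) {
        double[] weights = new double[k];
        Arrays.fill(weights, weight);
        return weights;
    }

    public static void main(String[] args) {
        double[][] centers = randomCenters(3, 5, 42L);
        double[] weights = uniformWeights(3, 100.0);
        System.out.println(Arrays.deepToString(centers));
        System.out.println(Arrays.toString(weights));
    }
}
```

A fixed seed makes the initialization reproducible across runs. Note that the Java signature above takes the seed explicitly.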
    • latestModel

      public StreamingKMeansModel latestModel()
      Return the latest model.
      Returns:
      the most recently updated StreamingKMeansModel
    • trainOn

      public void trainOn(DStream<Vector> data)
      Update the clustering model by training on batches of data from a DStream. This operation registers a DStream for training the model, checks whether the cluster centers have been initialized, and updates the model using each batch of data from the stream.

      Parameters:
      data - DStream containing vector data
    • trainOn

      public void trainOn(JavaDStream<Vector> data)
      Java-friendly version of trainOn.
      Parameters:
      data - JavaDStream containing vector data
    • predictOn

      public DStream<Object> predictOn(DStream<Vector> data)
      Use the clustering model to make predictions on batches of data from a DStream.

      Parameters:
      data - DStream containing vector data
      Returns:
      DStream containing the predicted cluster index for each input vector (a Scala Int, which appears as Object in the Java signature)
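A prediction in k-means is the index of the center closest to the input vector. The sketch below shows that assignment step for a single point in plain Java, using squared Euclidean distance. It is an illustration only: the class and method names are hypothetical, and the streaming wrapper that applies this to every element of each batch is omitted.

```java
/** Sketch of nearest-center assignment; names are illustrative, not Spark API. */
final class NearestCenterSketch {

    /** Returns the index of the center with the smallest squared Euclidean distance to point. */
    static int predict(double[][] centers, double[] point) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.length; i++) {
            double distance = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = centers[i][j] - point[j];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        System.out.println(predict(centers, new double[]{1.0, 1.0}));
        System.out.println(predict(centers, new double[]{9.0, 8.0}));
    }
}
```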
    • predictOn

      public JavaDStream<Integer> predictOn(JavaDStream<Vector> data)
      Java-friendly version of predictOn.
      Parameters:
      data - JavaDStream containing vector data
      Returns:
      JavaDStream containing predictions
    • predictOnValues

      public <K> DStream<scala.Tuple2<K,Object>> predictOnValues(DStream<scala.Tuple2<K,Vector>> data, scala.reflect.ClassTag<K> evidence$1)
      Use the model to make predictions on the values of a DStream and carry over its keys.

      Parameters:
      data - DStream containing (key, feature vector) pairs
      evidence$1 - ClassTag for the key type K (supplied implicitly when called from Scala)
      Returns:
      DStream containing the input keys and the predictions as values
    • predictOnValues

      public <K> JavaPairDStream<K,Integer> predictOnValues(JavaPairDStream<K,Vector> data)
      Java-friendly version of predictOnValues.
      Parameters:
      data - JavaPairDStream containing (key, feature vector) pairs
      Returns:
      JavaPairDStream containing the input keys and the predictions as values