Class StreamingKMeans
Object
org.apache.spark.mllib.clustering.StreamingKMeans
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging
public class StreamingKMeans
extends Object
implements org.apache.spark.internal.Logging, Serializable
StreamingKMeans provides methods for configuring a
 streaming k-means analysis, training the model on streaming data,
 and using the model to make predictions on streaming data.
 See StreamingKMeansModel for details on the algorithm and update rules.
 
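Use a builder pattern to construct a streaming k-means analysis in an application, for example:
  val model = new StreamingKMeans()
    .setDecayFactor(0.5)
    .setK(3)
    .setRandomCenters(5, 100.0)
  // trainOn returns nothing, so call it on the configured instance;
  // trainingData stands for a DStream[Vector] defined elsewhere
  model.trainOn(trainingData)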
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary
- StreamingKMeans()
Method Summary
- static final String BATCHES()
- double decayFactor()
- int k()
- StreamingKMeansModel latestModel(): Return the latest model.
- static final String POINTS()
- JavaDStream<Integer> predictOn(JavaDStream<Vector> data): Java-friendly version of predictOn.
- DStream<Object> predictOn(DStream<Vector> data): Use the clustering model to make predictions on batches of data from a DStream.
- <K> JavaPairDStream<K,Integer> predictOnValues(JavaPairDStream<K, Vector> data): Java-friendly version of predictOnValues.
- <K> DStream<scala.Tuple2<K,Object>> predictOnValues(DStream<scala.Tuple2<K, Vector>> data, scala.reflect.ClassTag<K> evidence$1): Use the model to make predictions on the values of a DStream and carry over its keys.
- StreamingKMeans setDecayFactor(double a): Set the forgetfulness of the previous centroids.
- StreamingKMeans setHalfLife(double halfLife, String timeUnit): Set the half life and time unit ("batches" or "points").
- StreamingKMeans setInitialCenters(Vector[] centers, double[] weights): Specify initial centers directly.
- StreamingKMeans setK(int k): Set the number of clusters.
- StreamingKMeans setRandomCenters(int dim, double weight, long seed): Initialize random centers, requiring only the number of dimensions.
- String timeUnit()
- void trainOn(JavaDStream<Vector> data): Java-friendly version of trainOn.
- void trainOn(DStream<Vector> data): Update the clustering model by training on batches of data from a DStream.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Constructor Details

StreamingKMeans
public StreamingKMeans()

Method Details

BATCHES
public static final String BATCHES()

POINTS
public static final String POINTS()

k
public int k()

decayFactor
public double decayFactor()

timeUnit
public String timeUnit()

setK
public StreamingKMeans setK(int k)
Set the number of clusters.
- Parameters:
- k - number of clusters
- Returns:
- this StreamingKMeans instance, so that calls can be chained
 
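setDecayFactor
public StreamingKMeans setDecayFactor(double a)
Set the forgetfulness of the previous centroids.
- Parameters:
- a - decay factor applied to the previous centroids
- Returns:
- this StreamingKMeans instance, so that calls can be chained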
 
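setHalfLife
public StreamingKMeans setHalfLife(double halfLife, String timeUnit)
Set the half-life and the time unit ("batches" or "points"). With "points", the decay factor is raised to the power of the number of new points in each batch; with "batches", the decay factor is used as is.
- Parameters:
- halfLife - half-life, expressed in the chosen time unit
- timeUnit - "batches" or "points"
- Returns:
- this StreamingKMeans instance, so that calls can be chained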
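
The relation between half-life, decay factor, and time unit can be illustrated with a small standalone sketch. It assumes the decay factor is chosen so that a contribution halves after halfLife time units, that is, exp(ln(0.5) / halfLife). The class and method names are hypothetical, and this is plain Java for illustration rather than Spark's own code.

```java
// Hypothetical helper illustrating half-life and time units; not part of Spark.
final class HalfLifeSketch {

    // Assumption: the decay factor a satisfies a^halfLife = 0.5.
    static double decayFromHalfLife(double halfLife) {
        return Math.exp(Math.log(0.5) / halfLife);
    }

    // Discount applied to the previous centroids when a batch arrives.
    // "batches": the decay factor is used as is.
    // "points":  the decay factor is raised to the power of the number of new points.
    static double discount(double decayFactor, String timeUnit, long numNewPoints) {
        switch (timeUnit) {
            case "batches":
                return decayFactor;
            case "points":
                return Math.pow(decayFactor, numNewPoints);
            default:
                throw new IllegalArgumentException("Unknown time unit: " + timeUnit);
        }
    }

    public static void main(String[] args) {
        double a = decayFromHalfLife(5.0);
        System.out.println("decay factor for half-life 5: " + a);
        System.out.println("discount per batch (batches): " + discount(a, "batches", 20));
        System.out.println("discount for 20 new points (points): " + discount(a, "points", 20));
    }
}
```

With "points", a large batch discounts the old centroids far more strongly than a small one, whereas with "batches" every batch is discounted equally regardless of its size.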
 
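setInitialCenters
public StreamingKMeans setInitialCenters(Vector[] centers, double[] weights)
Specify initial centers directly.
- Parameters:
- centers - initial cluster centers
- weights - initial weight of each center
- Returns:
- this StreamingKMeans instance, so that calls can be chained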
 
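setRandomCenters
public StreamingKMeans setRandomCenters(int dim, double weight, long seed)
Initialize random centers, requiring only the number of dimensions.
- Parameters:
- dim - Number of dimensions
- weight - Weight for each center
- seed - Random seed
- Returns:
- this StreamingKMeans instance, so that calls can be chained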
 
latestModel
public StreamingKMeansModel latestModel()
Return the latest model.
- Returns:
- the most recently updated model
 
trainOn
public void trainOn(DStream<Vector> data)
Update the clustering model by training on batches of data from a DStream. This operation registers a DStream for training the model, checks whether the cluster centers have been initialized, and updates the model using each batch of data from the stream.
- Parameters:
- data - DStream containing vector data
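
What "updates the model using each batch" can look like for a single cluster is sketched below. This is a minimal illustration of a forgetful mini-batch centroid update, assuming the old centroid is weighted by its discounted weight and the new weight is the discounted old weight plus the number of new points. The class, method, and parameter names are hypothetical, and the code is standalone Java, not Spark's implementation.

```java
// Hypothetical sketch of a forgetful centroid update; not Spark's implementation.
final class CentroidUpdateSketch {

    // Updates one center in place and returns its new weight.
    //   center:   current centroid c_t
    //   weight:   current weight n_t
    //   batchSum: component-wise sum of the batch points assigned to this center
    //   count:    number of those points, m_t
    //   discount: decay applied to the old centroid for this batch
    static double update(double[] center, double weight, double[] batchSum,
                         long count, double discount) {
        double discounted = weight * discount;   // n_t * a
        double newWeight = discounted + count;   // assumption: n_{t+1} = n_t * a + m_t
        if (newWeight == 0.0) {
            return 0.0;                          // nothing to average; center is left unchanged
        }
        for (int i = 0; i < center.length; i++) {
            // c_{t+1} = (c_t * n_t * a + sum of new points) / (n_t * a + m_t)
            center[i] = (center[i] * discounted + batchSum[i]) / newWeight;
        }
        return newWeight;
    }

    public static void main(String[] args) {
        double[] center = {0.0, 0.0};
        double newWeight = update(center, 2.0, new double[] {4.0, 8.0}, 2, 1.0);
        System.out.println("center = [" + center[0] + ", " + center[1] + "], weight = " + newWeight);
    }
}
```

A discount of 1.0 keeps all history, so the centroid becomes a running mean; a discount of 0.0 forgets the past entirely, so the centroid becomes the mean of the current batch.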
 
trainOn
public void trainOn(JavaDStream<Vector> data)
Java-friendly version of trainOn.
- Parameters:
- data - JavaDStream containing vector data
 
predictOn
public DStream<Object> predictOn(DStream<Vector> data)
Use the clustering model to make predictions on batches of data from a DStream.
- Parameters:
- data - DStream containing vector data
- Returns:
- DStream containing predictions
 
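predictOn
public JavaDStream<Integer> predictOn(JavaDStream<Vector> data)
Java-friendly version of predictOn.
- Parameters:
- data - JavaDStream containing vector data
- Returns:
- JavaDStream containing predictions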
 
predictOnValues
public <K> DStream<scala.Tuple2<K,Object>> predictOnValues(DStream<scala.Tuple2<K, Vector>> data, scala.reflect.ClassTag<K> evidence$1)
Use the model to make predictions on the values of a DStream and carry over its keys.
- Parameters:
- data - DStream containing (key, feature vector) pairs
- evidence$1 - ClassTag for the key type K (supplied implicitly in Scala)
- Returns:
- DStream containing the input keys and the predictions as values
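
The idea of predicting on values while carrying the keys over can be sketched without Spark. The example below assumes that a prediction is the index of the nearest center by squared Euclidean distance. The Prediction class and both method names are hypothetical stand-ins, and lists take the place of a single batch of the stream.

```java
// Hypothetical sketch of key-preserving prediction; not Spark's implementation.
final class PredictOnValuesSketch {

    // A (key, cluster index) pair, standing in for one element of the output stream.
    static final class Prediction<K> {
        final K key;
        final int cluster;

        Prediction(K key, int cluster) {
            this.key = key;
            this.cluster = cluster;
        }
    }

    // Index of the center closest to the point by squared Euclidean distance.
    static int nearest(double[][] centers, double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int i = 0; i < point.length; i++) {
                double d = centers[c][i] - point[i];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Maps each (key, vector) pair to (key, predicted cluster); keys pass through unchanged.
    static <K> java.util.List<Prediction<K>> predictOnValues(
            double[][] centers, java.util.List<K> keys, java.util.List<double[]> vectors) {
        java.util.List<Prediction<K>> out = new java.util.ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            out.add(new Prediction<>(keys.get(i), nearest(centers, vectors.get(i))));
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        java.util.List<Prediction<String>> out = predictOnValues(
                centers,
                java.util.Arrays.asList("a", "b"),
                java.util.Arrays.asList(new double[] {1.0, 1.0}, new double[] {9.0, 8.0}));
        for (Prediction<String> p : out) {
            System.out.println(p.key + " -> cluster " + p.cluster);
        }
    }
}
```

Keeping the key next to the prediction lets downstream code join the cluster assignment back to the record it came from, for example a record identifier or a label.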
 
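predictOnValues
public <K> JavaPairDStream<K,Integer> predictOnValues(JavaPairDStream<K, Vector> data)
Java-friendly version of predictOnValues.
- Parameters:
- data - JavaPairDStream containing (key, feature vector) pairs
- Returns:
- JavaPairDStream containing the input keys and the predictions as values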
 
 