Class StreamingKMeans

Object
  org.apache.spark.mllib.clustering.StreamingKMeans

All Implemented Interfaces:
  Serializable, org.apache.spark.internal.Logging

public class StreamingKMeans
extends Object
implements org.apache.spark.internal.Logging, Serializable
StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming data, and using the model to make predictions on streaming data. See KMeansModel for details on the algorithm and update rules.
Use a builder pattern to construct a streaming k-means analysis in an application, like:
val model = new StreamingKMeans()
.setDecayFactor(0.5)
.setK(3)
.setRandomCenters(5, 100.0)
.trainOn(DStream)
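
For a fuller picture, here is a hedged end-to-end sketch that also makes predictions. The application name, batch interval, input directories, and parameter values are illustrative assumptions, and the input files are assumed to contain one vector per line in the Vectors.parse text format (e.g. "[0.0,1.0,2.0,3.0,4.0]"):

import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only: directories, batch interval, and parameter values are
// illustrative assumptions, not prescribed by this API.
val conf = new SparkConf().setAppName("StreamingKMeansExample")
val ssc = new StreamingContext(conf, Seconds(10))

// Each line is expected to look like "[0.0,1.0,2.0,3.0,4.0]" (5 dimensions here).
val trainingData = ssc.textFileStream("/tmp/kmeans/train").map(Vectors.parse)
val testData = ssc.textFileStream("/tmp/kmeans/test").map(Vectors.parse)

val model = new StreamingKMeans()
  .setDecayFactor(0.5)
  .setK(3)
  .setRandomCenters(5, 100.0)

model.trainOn(trainingData)          // update centers on every training batch
model.predictOn(testData).print()    // print a cluster index per test vector

ssc.start()
ssc.awaitTermination()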
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary

StreamingKMeans()
Method Summary

static final String BATCHES()
double decayFactor()
int k()
StreamingKMeansModel latestModel()
  Return the latest model.
static final String POINTS()
JavaDStream<Integer> predictOn(JavaDStream<Vector> data)
  Java-friendly version of predictOn.
DStream<Object> predictOn(DStream<Vector> data)
  Use the clustering model to make predictions on batches of data from a DStream.
<K> JavaPairDStream<K,Integer> predictOnValues(JavaPairDStream<K,Vector> data)
  Java-friendly version of predictOnValues.
<K> DStream<scala.Tuple2<K,Object>> predictOnValues(DStream<scala.Tuple2<K,Vector>> data, scala.reflect.ClassTag<K> evidence$1)
  Use the model to make predictions on the values of a DStream and carry over its keys.
StreamingKMeans setDecayFactor(double a)
  Set the forgetfulness of the previous centroids.
StreamingKMeans setHalfLife(double halfLife, String timeUnit)
  Set the half life and time unit ("batches" or "points").
StreamingKMeans setInitialCenters(Vector[] centers, double[] weights)
  Specify initial centers directly.
StreamingKMeans setK(int k)
  Set the number of clusters.
StreamingKMeans setRandomCenters(int dim, double weight, long seed)
  Initialize random centers, requiring only the number of dimensions.
String timeUnit()
void trainOn(JavaDStream<Vector> data)
  Java-friendly version of trainOn.
void trainOn(DStream<Vector> data)
  Update the clustering model by training on batches of data from a DStream.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Constructor Details

StreamingKMeans

public StreamingKMeans()
Method Details

BATCHES

public static final String BATCHES()

POINTS

public static final String POINTS()

k

public int k()

decayFactor

public double decayFactor()

timeUnit

public String timeUnit()
setK

public StreamingKMeans setK(int k)

Set the number of clusters.

Parameters:
  k - (undocumented)
Returns:
  (undocumented)
setDecayFactor

public StreamingKMeans setDecayFactor(double a)

Set the forgetfulness of the previous centroids: with a decay factor of 1 all data from the beginning of the stream is weighted equally, while a decay factor of 0 uses only the most recent data.

Parameters:
  a - (undocumented)
Returns:
  (undocumented)
setHalfLife

public StreamingKMeans setHalfLife(double halfLife, String timeUnit)

Set the half life and time unit ("batches" or "points"). If the unit is points, the decay factor is raised to the power of the number of new points in each batch; if the unit is batches, the decay factor is applied once per batch.

Parameters:
  halfLife - (undocumented)
  timeUnit - (undocumented)
Returns:
  (undocumented)
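
As a brief illustration (the numbers are made up, and the stated equivalence decayFactor = exp(ln 0.5 / halfLife) is an assumption derived from the half-life wording above, not something documented on this page):

import org.apache.spark.mllib.clustering.StreamingKMeans

// Sketch: two ways to configure forgetting. The equivalence in the comments
// is an assumption based on the definition of a half life.
val byDecay = new StreamingKMeans()
  .setK(3)
  .setDecayFactor(0.5)           // with the "batches" unit, old data is halved each batch

val byHalfLife = new StreamingKMeans()
  .setK(3)
  .setHalfLife(5.0, "batches")   // old data is halved every 5 batches,
                                 // i.e. decayFactor ~ math.exp(math.log(0.5) / 5.0)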
setInitialCenters

public StreamingKMeans setInitialCenters(Vector[] centers, double[] weights)

Specify initial centers directly.

Parameters:
  centers - (undocumented)
  weights - (undocumented)
Returns:
  (undocumented)
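
A minimal sketch of direct initialization; the centers and weights below are arbitrary illustrative values:

import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors

// Sketch: seed the model with two hand-picked 3-dimensional centers.
val initialCenters = Array(
  Vectors.dense(0.0, 0.0, 0.0),
  Vectors.dense(1.0, 1.0, 1.0))
val initialWeights = Array(1.0, 1.0)   // one weight per center

val model = new StreamingKMeans()
  .setK(2)
  .setDecayFactor(1.0)
  .setInitialCenters(initialCenters, initialWeights)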
setRandomCenters

public StreamingKMeans setRandomCenters(int dim, double weight, long seed)

Initialize random centers, requiring only the number of dimensions.

Parameters:
  dim - Number of dimensions
  weight - Weight for each center
  seed - Random seed
Returns:
  (undocumented)
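
A minimal sketch with an explicit seed; the dimension, weight, and seed are illustrative values:

import org.apache.spark.mllib.clustering.StreamingKMeans

// Sketch: 3 random centers in 5 dimensions, each starting at weight 0.0,
// with a fixed seed so the initialization is reproducible.
val model = new StreamingKMeans()
  .setK(3)
  .setRandomCenters(5, 0.0, 42L)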
latestModel

public StreamingKMeansModel latestModel()

Return the latest model.

Returns:
  (undocumented)
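
A hedged sketch of inspecting the current model; it assumes the returned StreamingKMeansModel exposes clusterCenters and clusterWeights, and it elides the streaming setup:

import org.apache.spark.mllib.clustering.StreamingKMeans

// Sketch: peek at the current cluster centers and their weights.
// Assumes clusterCenters/clusterWeights are available on the returned model.
val estimator = new StreamingKMeans()
  .setK(3)
  .setRandomCenters(5, 0.0, 42L)

val latest = estimator.latestModel()
latest.clusterCenters.zip(latest.clusterWeights).foreach { case (center, weight) =>
  println(s"center=$center weight=$weight")
}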
trainOn

public void trainOn(DStream<Vector> data)

Update the clustering model by training on batches of data from a DStream. This operation registers a DStream for training the model, checks whether the cluster centers have been initialized, and updates the model using each batch of data from the stream.

Parameters:
  data - DStream containing vector data
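
An illustrative sketch of registering a training stream; the StreamingContext, the directory path, and the text format (one Vectors.parse line per vector) are assumptions:

import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.streaming.StreamingContext

// Sketch: assumes an existing StreamingContext and a configured model; the
// monitored directory and line format are illustrative.
def startTraining(ssc: StreamingContext, model: StreamingKMeans): Unit = {
  val trainingData = ssc.textFileStream("/tmp/kmeans/train").map(Vectors.parse)
  model.trainOn(trainingData)   // centers are updated once per batch
}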
trainOn

public void trainOn(JavaDStream<Vector> data)

Java-friendly version of trainOn.

Parameters:
  data - (undocumented)
predictOn

public DStream<Object> predictOn(DStream<Vector> data)

Use the clustering model to make predictions on batches of data from a DStream.

Parameters:
  data - DStream containing vector data
Returns:
  DStream containing predictions
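
An illustrative sketch of wiring predictions to an output; the StreamingContext, the directory, and the text format are assumptions:

import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.streaming.StreamingContext

// Sketch: assumes an existing StreamingContext and a model that is being
// trained elsewhere; the directory and line format are illustrative.
def printPredictions(ssc: StreamingContext, model: StreamingKMeans): Unit = {
  val testData = ssc.textFileStream("/tmp/kmeans/test").map(Vectors.parse)
  model.predictOn(testData).print()   // one cluster index per input vector
}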
predictOn

public JavaDStream<Integer> predictOn(JavaDStream<Vector> data)

Java-friendly version of predictOn.

Parameters:
  data - (undocumented)
Returns:
  (undocumented)
predictOnValues

public <K> DStream<scala.Tuple2<K,Object>> predictOnValues(DStream<scala.Tuple2<K,Vector>> data, scala.reflect.ClassTag<K> evidence$1)

Use the model to make predictions on the values of a DStream and carry over its keys.

Parameters:
  data - DStream containing (key, feature vector) pairs
  evidence$1 - (undocumented)
Returns:
  DStream containing the input keys and the predictions as values
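
An illustrative sketch with keyed data; the key type and the upstream construction of the (id, vector) stream are assumptions:

import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.streaming.dstream.DStream

// Sketch: keys pass through unchanged, values are replaced by cluster indices.
// Assumes the (id, feature vector) stream is built elsewhere.
def assignClusters(labeled: DStream[(String, Vector)],
                   model: StreamingKMeans): DStream[(String, Int)] = {
  model.predictOnValues(labeled)
}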
predictOnValues

public <K> JavaPairDStream<K,Integer> predictOnValues(JavaPairDStream<K,Vector> data)

Java-friendly version of predictOnValues.

Parameters:
  data - (undocumented)
Returns:
  (undocumented)