class StreamingKMeans extends Logging with Serializable
StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming, and using the model to make predictions on streaming data. See KMeansModel for details on algorithm and update rules.
Use a builder pattern to construct a streaming k-means analysis in an application, like:
val model = new StreamingKMeans() .setDecayFactor(0.5) .setK(3) .setRandomCenters(5, 100.0) .trainOn(DStream)
- Annotations
- @Since( "1.2.0" )
- Source
- StreamingKMeans.scala
- Alphabetic
- By Inheritance
- StreamingKMeans
- Serializable
- Serializable
- Logging
- AnyRef
- Any
- Hide All
- Show All
- Public
- All
Instance Constructors
Value Members
-
var
decayFactor: Double
- Annotations
- @Since( "1.2.0" )
-
var
k: Int
- Annotations
- @Since( "1.2.0" )
-
def
latestModel(): StreamingKMeansModel
Return the latest model.
Return the latest model.
- Annotations
- @Since( "1.2.0" )
-
def
predictOn(data: JavaDStream[Vector]): JavaDStream[Integer]
Java-friendly version of
predictOn
.Java-friendly version of
predictOn
.- Annotations
- @Since( "1.4.0" )
-
def
predictOn(data: DStream[Vector]): DStream[Int]
Use the clustering model to make predictions on batches of data from a DStream.
Use the clustering model to make predictions on batches of data from a DStream.
- data
DStream containing vector data
- returns
DStream containing predictions
- Annotations
- @Since( "1.2.0" )
-
def
predictOnValues[K](data: JavaPairDStream[K, Vector]): JavaPairDStream[K, Integer]
Java-friendly version of
predictOnValues
.Java-friendly version of
predictOnValues
.- Annotations
- @Since( "1.4.0" )
-
def
predictOnValues[K](data: DStream[(K, Vector)])(implicit arg0: ClassTag[K]): DStream[(K, Int)]
Use the model to make predictions on the values of a DStream and carry over its keys.
Use the model to make predictions on the values of a DStream and carry over its keys.
- K
key type
- data
DStream containing (key, feature vector) pairs
- returns
DStream containing the input keys and the predictions as values
- Annotations
- @Since( "1.2.0" )
-
def
setDecayFactor(a: Double): StreamingKMeans.this.type
Set the forgetfulness of the previous centroids.
Set the forgetfulness of the previous centroids.
- Annotations
- @Since( "1.2.0" )
-
def
setHalfLife(halfLife: Double, timeUnit: String): StreamingKMeans.this.type
Set the half life and time unit ("batches" or "points").
Set the half life and time unit ("batches" or "points"). If points, then the decay factor is raised to the power of number of new points and if batches, then decay factor will be used as is.
- Annotations
- @Since( "1.2.0" )
-
def
setInitialCenters(centers: Array[Vector], weights: Array[Double]): StreamingKMeans.this.type
Specify initial centers directly.
Specify initial centers directly.
- Annotations
- @Since( "1.2.0" )
-
def
setK(k: Int): StreamingKMeans.this.type
Set the number of clusters.
Set the number of clusters.
- Annotations
- @Since( "1.2.0" )
-
def
setRandomCenters(dim: Int, weight: Double, seed: Long = Utils.random.nextLong): StreamingKMeans.this.type
Initialize random centers, requiring only the number of dimensions.
Initialize random centers, requiring only the number of dimensions.
- dim
Number of dimensions
- weight
Weight for each center
- seed
Random seed
- Annotations
- @Since( "1.2.0" )
-
var
timeUnit: String
- Annotations
- @Since( "1.2.0" )
-
def
trainOn(data: JavaDStream[Vector]): Unit
Java-friendly version of
trainOn
.Java-friendly version of
trainOn
.- Annotations
- @Since( "1.4.0" )
-
def
trainOn(data: DStream[Vector]): Unit
Update the clustering model by training on batches of data from a DStream.
Update the clustering model by training on batches of data from a DStream. This operation registers a DStream for training the model, checks whether the cluster centers have been initialized, and updates the model using each batch of data from the stream.
- data
DStream containing vector data
- Annotations
- @Since( "1.2.0" )