
org.apache.spark.mllib.clustering

StreamingKMeansModel

class StreamingKMeansModel extends KMeansModel with Logging

StreamingKMeansModel extends MLlib's KMeansModel for streaming algorithms, so it can keep track of a continuously updated weight associated with each cluster, and also update the model by doing a single iteration of the standard k-means algorithm.

The update algorithm uses the "mini-batch" KMeans rule, generalized to incorporate forgetfulness (i.e. decay). The update rule (for each cluster) is:

$$ \begin{align} c_{t+1} &= \frac{c_t \cdot n_t \cdot a + x_t \cdot m_t}{n_t + m_t} \\ n_{t+1} &= n_t \cdot a + m_t \end{align} $$

where $c_t$ is the previously estimated centroid for that cluster, $n_t$ is the number of points assigned to it thus far, $x_t$ is the centroid estimated on the current batch, $m_t$ is the number of points assigned to that centroid in the current batch, and $a$ is the decay factor.

The decay factor $a$ scales the contribution of the clusters estimated so far: when a new batch arrives, the existing cluster weights are discounted by $a$ before being combined with the new data. If $a = 1$, all batches are weighted equally; if $a = 0$, the new centroids are determined entirely by the most recent data. Lower values correspond to more forgetting.

Decay can optionally be specified by a half-life and an associated time unit. The time unit can be either a batch of data or a single data point. For data that arrived at time $t$, the half-life $h$ is defined such that at time $t + h$ the discount applied to that data is 0.5. The definition is the same whether the time unit is given in batches or in points.
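
For illustration, the following sketch (plain Scala, independent of Spark; the names updateCluster and decayFromHalfLife and the numeric values are invented for exposition) applies the rule above to one cluster and shows how a half-life h translates into the equivalent decay factor via a^h = 0.5:

    // Illustrative sketch only (not Spark's internal code): the per-cluster update rule,
    // plus the decay factor implied by a half-life h.
    def updateCluster(
        cT: Array[Double],  // c_t: current centroid
        nT: Double,         // n_t: current weight (points absorbed so far, after decay)
        xT: Array[Double],  // x_t: centroid of the current batch for this cluster
        mT: Double,         // m_t: number of points assigned to this cluster in the batch
        a: Double           // decay factor
    ): (Array[Double], Double) = {
      val cNext = cT.zip(xT).map { case (c, x) => (c * nT * a + x * mT) / (nT + mT) }
      val nNext = nT * a + mT
      (cNext, nNext)
    }

    // A half-life of h time units (batches or points) corresponds to a decay factor
    // a = 0.5^(1 / h), so that the cumulative discount after h units is exactly 0.5.
    def decayFromHalfLife(h: Double): Double = math.pow(0.5, 1.0 / h)

    // Example: an old centroid at the origin with weight 100, a batch centroid at (1, 1)
    // built from 50 points, and a = 0.5 give a new centroid of (1/3, 1/3) and weight 100.
    val (c1, n1) = updateCluster(Array(0.0, 0.0), 100.0, Array(1.0, 1.0), 50.0, 0.5)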

Annotations
@Since( "1.2.0" )
Source
StreamingKMeans.scala
Linear Supertypes
Logging, KMeansModel, PMMLExportable, Serializable, Serializable, Saveable, AnyRef, Any

Instance Constructors

  1. new StreamingKMeansModel(clusterCenters: Array[Vector], clusterWeights: Array[Double])
    Annotations
    @Since( "1.2.0" )
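
    For illustration, a minimal construction sketch (the center values and weights are made up; assumes the spark-mllib dependency is on the classpath):

      import org.apache.spark.mllib.clustering.StreamingKMeansModel
      import org.apache.spark.mllib.linalg.Vectors

      // Hypothetical starting state: two 2-dimensional centers, each with weight 0.0.
      // Weights grow as batches are absorbed via update().
      val initialCenters = Array(Vectors.dense(0.0, 0.0), Vectors.dense(5.0, 5.0))
      val initialWeights = Array(0.0, 0.0)
      val model = new StreamingKMeansModel(initialCenters, initialWeights)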

Value Members

  1. val clusterCenters: Array[Vector]
    Definition Classes
    StreamingKMeansModel → KMeansModel
    Annotations
    @Since( "1.2.0" )
  2. val clusterWeights: Array[Double]
    Annotations
    @Since( "1.2.0" )
  3. def computeCost(data: RDD[Vector]): Double

    Return the K-means cost (sum of squared distances of points to their nearest center) for this model on the given data. A combined usage sketch covering update, predict, and computeCost follows the member list below.

    Definition Classes
    KMeansModel
    Annotations
    @Since( "0.8.0" )
  4. val distanceMeasure: String
    Definition Classes
    KMeansModel
    Annotations
    @Since( "2.4.0" )
  5. def k: Int

    Total number of clusters.

    Definition Classes
    KMeansModel
    Annotations
    @Since( "0.8.0" )
  6. def predict(points: JavaRDD[Vector]): JavaRDD[Integer]

    Maps given points to their cluster indices.

    Definition Classes
    KMeansModel
    Annotations
    @Since( "1.0.0" )
  7. def predict(points: RDD[Vector]): RDD[Int]

    Maps given points to their cluster indices.

    Definition Classes
    KMeansModel
    Annotations
    @Since( "1.0.0" )
  8. def predict(point: Vector): Int

    Returns the cluster index that a given point belongs to.

    Definition Classes
    KMeansModel
    Annotations
    @Since( "0.8.0" )
  9. def save(sc: SparkContext, path: String): Unit

    Save this model to the given path.

    This saves:

    • human-readable (JSON) model metadata to path/metadata/
    • Parquet formatted data to path/data/

    The model may be loaded using Loader.load.

    sc: Spark context used to save model data.

    path: Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.

    Definition Classes
    KMeansModel → Saveable
    Annotations
    @Since( "1.4.0" )
  10. def toPMML(): String

    Export the model to a String in PMML format

    Definition Classes
    PMMLExportable
    Annotations
    @Since( "1.4.0" )
  11. def toPMML(outputStream: OutputStream): Unit

    Export the model to the OutputStream in PMML format

    Definition Classes
    PMMLExportable
    Annotations
    @Since( "1.4.0" )
  12. def toPMML(sc: SparkContext, path: String): Unit

    Export the model to a directory on a distributed file system in PMML format

    Definition Classes
    PMMLExportable
    Annotations
    @Since( "1.4.0" )
  13. def toPMML(localPath: String): Unit

    Export the model to a local file in PMML format

    Definition Classes
    PMMLExportable
    Annotations
    @Since( "1.4.0" )
  14. val trainingCost: Double
    Definition Classes
    KMeansModel
    Annotations
    @Since( "2.4.0" )
  15. def update(data: RDD[Vector], decayFactor: Double, timeUnit: String): StreamingKMeansModel

    Perform a k-means update on a batch of data.

    Annotations
    @Since( "1.2.0" )
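
Combined usage sketch: assuming an existing SparkContext named sc and a model constructed as under Instance Constructors above, the batch data, decay settings, and variable names below are illustrative only.

    import org.apache.spark.mllib.linalg.Vectors

    // A hypothetical batch of points, parallelized into an RDD.
    val batch = sc.parallelize(Seq(
      Vectors.dense(0.1, -0.2),
      Vectors.dense(4.9, 5.3),
      Vectors.dense(0.3, 0.1)
    ))

    // Fold the batch into the model with decay factor 0.9, interpreted per batch of data
    // (the time unit is given as "batches" or "points"); the refreshed model is returned.
    val updated = model.update(batch, decayFactor = 0.9, timeUnit = "batches")

    // Assign each point to its nearest updated center and evaluate the clustering cost.
    val assignments = updated.predict(batch)   // RDD[Int] of cluster indices
    val cost = updated.computeCost(batch)      // sum of squared distances to nearest centers

    // The updated model can also be exported, e.g. as a PMML string.
    val pmml = updated.toPMML()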