StreamingKMeans (Spark 1.4.1 JavaDoc)

Overview

Package

Class

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

org.apache.spark.mllib.clustering
Class StreamingKMeans

Object
  org.apache.spark.mllib.clustering.StreamingKMeans

All Implemented Interfaces:: java.io.Serializable, Logging

public class StreamingKMeans
extends Object
implements Logging, scala.Serializable
extends Object
implements Logging, scala.Serializable

:: Experimental ::

StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming, and using the model to make predictions on streaming data. See KMeansModel for details on algorithm and update rules.

Use a builder pattern to construct a streaming k-means analysis in an application, like:


  val model = new StreamingKMeans()
    .setDecayFactor(0.5)
    .setK(3)
    .setRandomCenters(5, 100.0)
    .trainOn(DStream)

See Also:: Serialized Form

Constructor Summary
`StreamingKMeans()`
`StreamingKMeans(int k, double decayFactor, String timeUnit)`

Method Summary

static String BATCHES()

double decayFactor()

int k()

StreamingKMeansModel latestModel()
Return the latest model.

static String POINTS()

DStream<Object> predictOn(DStream<Vector> data)
Use the clustering model to make predictions on batches of data from a DStream.

JavaDStream<Integer> predictOn(JavaDStream<Vector> data)
Java-friendly version of `predictOn`.





<K> DStream<scala.Tuple2<K,Object>>

predictOnValues(DStream<scala.Tuple2<K,Vector>> data,
                scala.reflect.ClassTag<K> evidence$1)

Use the model to make predictions on the values of a DStream and carry over its keys.





<K> JavaPairDStream<K,Integer>

predictOnValues(JavaPairDStream<K,Vector> data)
Java-friendly version of `predictOnValues`.

StreamingKMeans setDecayFactor(double a)
Set the decay factor directly (for forgetful algorithms).

StreamingKMeans setHalfLife(double halfLife, String timeUnit)
Set the half life and time unit ("batches" or "points") for forgetful algorithms.

StreamingKMeans setInitialCenters(Vector[] centers, double[] weights)
Specify initial centers directly.

StreamingKMeans setK(int k)
Set the number of clusters.

StreamingKMeans setRandomCenters(int dim, double weight, long seed)
Initialize random centers, requiring only the number of dimensions.

String timeUnit()

void trainOn(DStream<Vector> data)
Update the clustering model by training on batches of data from a DStream.

void trainOn(JavaDStream<Vector> data)
Java-friendly version of `trainOn`.

Methods inherited from class Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface org.apache.spark.Logging
`initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning`

Constructor Detail

StreamingKMeans

public StreamingKMeans(int k,
                       double decayFactor,
                       String timeUnit)

StreamingKMeans

public StreamingKMeans()

Method Detail

BATCHES

public static final String BATCHES()

POINTS

public static final String POINTS()

k

public int k()

decayFactor

public double decayFactor()

timeUnit

public String timeUnit()

setK

public StreamingKMeans setK(int k)

Set the number of clusters.

setDecayFactor

public StreamingKMeans setDecayFactor(double a)

Set the decay factor directly (for forgetful algorithms).

setHalfLife

public StreamingKMeans setHalfLife(double halfLife,
                                   String timeUnit)

Set the half life and time unit ("batches" or "points") for forgetful algorithms.

setInitialCenters

public StreamingKMeans setInitialCenters(Vector[] centers,
                                         double[] weights)

Specify initial centers directly.

setRandomCenters

public StreamingKMeans setRandomCenters(int dim,
                                        double weight,
                                        long seed)

Initialize random centers, requiring only the number of dimensions.

Parameters:: dim - Number of dimensions; weight - Weight for each center; seed - Random seed
Returns:: (undocumented)

latestModel

public StreamingKMeansModel latestModel()

Return the latest model.

trainOn

public void trainOn(DStream<Vector> data)

Update the clustering model by training on batches of data from a DStream. This operation registers a DStream for training the model, checks whether the cluster centers have been initialized, and updates the model using each batch of data from the stream.

Parameters:: data - DStream containing vector data

trainOn

public void trainOn(JavaDStream<Vector> data)

Java-friendly version of `trainOn`.

predictOn

public DStream<Object> predictOn(DStream<Vector> data)

Use the clustering model to make predictions on batches of data from a DStream.

Parameters:: data - DStream containing vector data
Returns:: DStream containing predictions

predictOn

public JavaDStream<Integer> predictOn(JavaDStream<Vector> data)

Java-friendly version of `predictOn`.

predictOnValues

public <K> DStream<scala.Tuple2<K,Object>> predictOnValues(DStream<scala.Tuple2<K,Vector>> data,
                                                           scala.reflect.ClassTag<K> evidence$1)

Use the model to make predictions on the values of a DStream and carry over its keys.

Parameters:: data - DStream containing (key, feature vector) pairs; evidence$1 - (undocumented)
Returns:: DStream containing the input keys and the predictions as values

predictOnValues

public <K> JavaPairDStream<K,Integer> predictOnValues(JavaPairDStream<K,Vector> data)

Java-friendly version of `predictOnValues`.

Overview

Package

Class

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

org.apache.spark.mllib.clustering Class StreamingKMeans

StreamingKMeans

StreamingKMeans

BATCHES

POINTS

k

decayFactor

timeUnit

setK

setDecayFactor

setHalfLife

setInitialCenters

setRandomCenters

latestModel

trainOn

trainOn

predictOn

predictOn

predictOnValues

predictOnValues

org.apache.spark.mllib.clustering
Class StreamingKMeans