public class StreamingKMeans
extends Object
implements org.apache.spark.internal.Logging, scala.Serializable
Use a builder pattern to construct a streaming k-means analysis in an application. Note that trainOn returns void, so configure the model first and train it in a separate statement (trainingData below stands in for a DStream of vectors):

val model = new StreamingKMeans()
  .setDecayFactor(0.5)
  .setK(3)
  .setRandomCenters(5, 100.0)
model.trainOn(trainingData) // trainingData: DStream[Vector]
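As an end-to-end sketch of the same builder pattern (the stream definitions and the `ssc` StreamingContext are illustrative assumptions, not part of this API page):

```scala
// Sketch only: trainingData and testData are assumed to come from an
// existing StreamingContext (ssc); they are not defined on this page.
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.streaming.dstream.DStream

val trainingData: DStream[Vector] = ??? // e.g. parsed from ssc.textFileStream(...)
val testData: DStream[Vector]     = ???

val model = new StreamingKMeans()
  .setK(3)
  .setDecayFactor(0.5)
  .setRandomCenters(5, 100.0) // 5 dimensions, initial weight 100.0 per center

model.trainOn(trainingData)       // update cluster centers on each batch
model.predictOn(testData).print() // emit a cluster index per input vector
```

Both `trainOn` and `predictOn` are registered before the StreamingContext is started; the model is then updated and queried batch by batch as data arrives.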
| Constructor and Description |
|---|
| StreamingKMeans() |
| StreamingKMeans(int k, double decayFactor, String timeUnit) |
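The three-argument constructor covers the same settings as the setK/setDecayFactor setters; a brief sketch (the center coordinates and weights are illustrative values, not from this page):

```scala
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors

// Equivalent configurations: constructor arguments vs. builder setters.
val viaCtor = new StreamingKMeans(3, 0.5, "batches")
val viaSetters = new StreamingKMeans()
  .setK(3)
  .setDecayFactor(0.5)

// Centers can also be supplied explicitly instead of setRandomCenters.
val withCenters = viaSetters.setInitialCenters(
  Array(Vectors.dense(0.0, 0.0), Vectors.dense(5.0, 5.0)),
  Array(1.0, 1.0)) // one weight per center
```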
| Modifier and Type | Method and Description |
|---|---|
| static String | BATCHES() |
| double | decayFactor() |
| int | k() |
| StreamingKMeansModel | latestModel() - Return the latest model. |
| static String | POINTS() |
| DStream<Object> | predictOn(DStream<Vector> data) - Use the clustering model to make predictions on batches of data from a DStream. |
| JavaDStream<Integer> | predictOn(JavaDStream<Vector> data) - Java-friendly version of predictOn. |
| <K> DStream<scala.Tuple2<K,Object>> | predictOnValues(DStream<scala.Tuple2<K,Vector>> data, scala.reflect.ClassTag<K> evidence$1) - Use the model to make predictions on the values of a DStream and carry over its keys. |
| <K> JavaPairDStream<K,Integer> | predictOnValues(JavaPairDStream<K,Vector> data) - Java-friendly version of predictOnValues. |
| StreamingKMeans | setDecayFactor(double a) - Set the forgetfulness of the previous centroids. |
| StreamingKMeans | setHalfLife(double halfLife, String timeUnit) - Set the half life and time unit ("batches" or "points"). |
| StreamingKMeans | setInitialCenters(Vector[] centers, double[] weights) - Specify initial centers directly. |
| StreamingKMeans | setK(int k) - Set the number of clusters. |
| StreamingKMeans | setRandomCenters(int dim, double weight, long seed) - Initialize random centers, requiring only the number of dimensions. |
| String | timeUnit() |
| void | trainOn(DStream<Vector> data) - Update the clustering model by training on batches of data from a DStream. |
| void | trainOn(JavaDStream<Vector> data) - Java-friendly version of trainOn. |
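To show predictOnValues from the table above in context, a short sketch that keeps an identifier attached to each prediction (the keyed stream `labeledData` is an assumed input, not defined on this page):

```scala
// Sketch: carry a String key through prediction with predictOnValues.
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.streaming.dstream.DStream

val labeledData: DStream[(String, Vector)] = ??? // assumed (id, features) pairs

val model = new StreamingKMeans()
  .setK(2)
  .setRandomCenters(3, 0.0)

model.trainOn(labeledData.map(_._2))              // train on the vectors only
val predictions = model.predictOnValues(labeledData) // (id, cluster index) pairs
```

The ClassTag evidence parameter shown in the Scala signature is supplied implicitly by the compiler, so application code only passes the keyed DStream.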
Methods inherited from class java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging:
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public StreamingKMeans(int k, double decayFactor, String timeUnit)

public StreamingKMeans()

public static final String BATCHES()

public static final String POINTS()

public int k()

public double decayFactor()

public String timeUnit()

public StreamingKMeans setK(int k)
Set the number of clusters.
Parameters:
k - (undocumented)

public StreamingKMeans setDecayFactor(double a)
Set the forgetfulness of the previous centroids.
Parameters:
a - (undocumented)

public StreamingKMeans setHalfLife(double halfLife, String timeUnit)
Set the half life and time unit ("batches" or "points").
Parameters:
halfLife - (undocumented)
timeUnit - (undocumented)

public StreamingKMeans setInitialCenters(Vector[] centers, double[] weights)
Specify initial centers directly.
Parameters:
centers - (undocumented)
weights - (undocumented)

public StreamingKMeans setRandomCenters(int dim, double weight, long seed)
Initialize random centers, requiring only the number of dimensions.
Parameters:
dim - Number of dimensions
weight - Weight for each center
seed - Random seed

public StreamingKMeansModel latestModel()
Return the latest model.

public void trainOn(DStream<Vector> data)
Update the clustering model by training on batches of data from a DStream.
Parameters:
data - DStream containing vector data

public void trainOn(JavaDStream<Vector> data)
Java-friendly version of trainOn.
Parameters:
data - (undocumented)

public DStream<Object> predictOn(DStream<Vector> data)
Use the clustering model to make predictions on batches of data from a DStream.
Parameters:
data - DStream containing vector data

public JavaDStream<Integer> predictOn(JavaDStream<Vector> data)
Java-friendly version of predictOn.
Parameters:
data - (undocumented)

public <K> DStream<scala.Tuple2<K,Object>> predictOnValues(DStream<scala.Tuple2<K,Vector>> data, scala.reflect.ClassTag<K> evidence$1)
Use the model to make predictions on the values of a DStream and carry over its keys.
Parameters:
data - DStream containing (key, feature vector) pairs
evidence$1 - (undocumented)

public <K> JavaPairDStream<K,Integer> predictOnValues(JavaPairDStream<K,Vector> data)
Java-friendly version of predictOnValues.
Parameters:
data - (undocumented)
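A note on how setHalfLife relates to setDecayFactor, sketched under the assumption (based on the documented "half life" semantics, not on code shown on this page) that a half life h corresponds to a per-unit decay factor a with a^h = 0.5, i.e. old data's weight is halved after h batches or points:

```scala
// Assumed relationship: decayFactor = 0.5^(1 / halfLife).
val halfLife = 10.0
val decayFactor = math.exp(math.log(0.5) / halfLife) // same as 0.5^(1/halfLife)

// Sanity check: after halfLife units the accumulated discount is ~0.5.
println(math.pow(decayFactor, halfLife)) // approximately 0.5
```

Under this reading, setHalfLife(h, unit) is a convenience for choosing the equivalent setDecayFactor value, with the unit ("batches" or "points") controlling whether the factor is applied once per batch or compounded per data point.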