Class StreamingLinearAlgorithm<M extends GeneralizedLinearModel,A extends GeneralizedLinearAlgorithm<M>>
- All Implemented Interfaces:
org.apache.spark.internal.Logging
- Direct Known Subclasses:
StreamingLinearRegressionWithSGD, StreamingLogisticRegressionWithSGD
This class takes a GeneralizedLinearModel and a GeneralizedLinearAlgorithm as type parameters, which makes it easy to extend when constructing streaming versions of any analysis that uses GLMs. Initial weights must be set before calling trainOn or predictOn. Only the weights are updated, not an intercept. If the model needs an intercept, it should be manually appended to the input data.
For example usage, see StreamingLinearRegressionWithSGD.
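One way to read "manually appended" is to add a constant feature of 1.0 to every input vector, so that the weight learned for that feature plays the role of the intercept. The sketch below shows the idea in plain Java without any Spark dependency. The class InterceptFeature and its methods are hypothetical names introduced for illustration, and the constant-feature reading is an assumption rather than something this page spells out.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Spark): emulates an intercept with a constant feature. */
final class InterceptFeature {
    private InterceptFeature() {}

    /** Returns a copy of the feature array with a trailing 1.0 appended. */
    static double[] withBias(double[] features) {
        double[] out = Arrays.copyOf(features, features.length + 1);
        out[features.length] = 1.0;
        return out;
    }

    /** Dot product of weights and features, i.e. the linear part of a GLM prediction. */
    static double dot(double[] weights, double[] features) {
        if (weights.length != features.length) {
            throw new IllegalArgumentException("weights and features must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }
}
```

With this approach the initial weight vector needs one extra entry for the constant feature, and the same transformation has to be applied to the vectors passed to trainOn and to predictOn. In a Spark application the extended array would typically be wrapped with Vectors.dense before building a LabeledPoint.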
NOTE: In some use cases, the order in which trainOn and predictOn are called in an application affects the results. When both are called on the same DStream and trainOn is called before predictOn, the model is updated when new data arrive and the prediction is based on the updated model. If predictOn is called first, the prediction uses the model from the previous update.
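The ordering effect described in this note can be illustrated with a toy simulation in plain Java. This is only a sketch and does not use Spark: OrderingSketch is a hypothetical class, the "model" is a single number, the "training" step simply replaces it with the batch mean, and the simulation assumes that the operations registered on a stream run once per batch in registration order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical toy model of a stream with registered train/predict operations. */
final class OrderingSketch {
    private double weight = 0.0; // stands in for the model; starts at the initial weight
    private final List<Consumer<double[]>> operations = new ArrayList<>();
    private final List<Double> weightsSeenByPredict = new ArrayList<>();

    /** Registers a toy training step: the model becomes the mean of the batch. */
    void trainOn() {
        operations.add(batch -> weight = mean(batch));
    }

    /** Registers a prediction step that records which model state it observed. */
    void predictOn() {
        operations.add(batch -> weightsSeenByPredict.add(weight));
    }

    /** Delivers one batch: every registered operation runs in registration order. */
    void onBatch(double[] batch) {
        for (Consumer<double[]> op : operations) {
            op.accept(batch);
        }
    }

    List<Double> weightsSeenByPredict() {
        return weightsSeenByPredict;
    }

    private static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty batch");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}
```

If trainOn is registered first and the batches {1.0, 3.0} and {10.0} arrive, the prediction step observes 2.0 and then 10.0. If predictOn is registered first, it observes 0.0 and then 2.0, which is the model from the previous update.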
NOTE: It is OK to call predictOn repeatedly on multiple streams; this generates predictions for each stream, all using the current model. It is also OK to call trainOn on different streams; this updates the model using each of the different sources, in sequence.
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
StreamingLinearAlgorithm()
-
Method Summary
Modifier and Type | Method | Description
M | latestModel() | Return the latest model.
JavaDStream<Double> | predictOn(JavaDStream<Vector> data) | Java-friendly version of predictOn.
DStream<Object> | predictOn(DStream<Vector> data) | Use the model to make predictions on batches of data from a DStream.
<K> JavaPairDStream<K, Double> | predictOnValues(JavaPairDStream<K, Vector> data) | Java-friendly version of predictOnValues.
<K> DStream<scala.Tuple2<K, Object>> | predictOnValues(DStream<scala.Tuple2<K, Vector>> data, scala.reflect.ClassTag<K> evidence$1) | Use the model to make predictions on the values of a DStream and carry over its keys.
void | trainOn(JavaDStream<LabeledPoint> data) | Java-friendly version of trainOn.
void | trainOn(DStream<LabeledPoint> data) | Update the model by training on batches of data from a DStream.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
-
Constructor Details
-
StreamingLinearAlgorithm
public StreamingLinearAlgorithm()
-
-
Method Details
-
latestModel
Return the latest model.
- Returns:
(undocumented)
-
predictOn
Use the model to make predictions on batches of data from a DStream.
- Parameters:
data - DStream containing feature vectors
- Returns:
DStream containing predictions
-
predictOn
Java-friendly version of predictOn.
- Parameters:
data - (undocumented)
- Returns:
(undocumented)
-
predictOnValues
public <K> DStream<scala.Tuple2<K,Object>> predictOnValues(DStream<scala.Tuple2<K, Vector>> data, scala.reflect.ClassTag<K> evidence$1)
Use the model to make predictions on the values of a DStream and carry over its keys.
- Parameters:
data - DStream containing feature vectors
evidence$1 - (undocumented)
- Returns:
DStream containing the input keys and the predictions as values
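The "carry over its keys" behavior can be pictured on a single batch as a map over the values only. The following plain-Java sketch is not Spark code: KeyedPredictionSketch is a hypothetical class, a batch is modeled as a list of key/feature pairs, and the prediction is assumed to be a bare dot product with a fixed weight array.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: predict on the values of keyed records and keep each key. */
final class KeyedPredictionSketch {
    private KeyedPredictionSketch() {}

    /** Maps each (key, features) pair to (key, prediction), preserving order and keys. */
    static <K> List<Map.Entry<K, Double>> predictValues(
            List<Map.Entry<K, double[]>> batch, double[] weights) {
        List<Map.Entry<K, Double>> out = new ArrayList<>();
        for (Map.Entry<K, double[]> record : batch) {
            double prediction = dot(weights, record.getValue());
            out.add(new AbstractMap.SimpleEntry<K, Double>(record.getKey(), prediction));
        }
        return out;
    }

    /** Linear prediction without intercept: sum of weight * feature. */
    private static double dot(double[] weights, double[] features) {
        if (weights.length != features.length) {
            throw new IllegalArgumentException("weights and features must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }
}
```

Keys are never inspected or merged here, so duplicate keys in the input produce duplicate keys in the output. This is useful when the key is a record identifier that must be joined back to the prediction later.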
-
predictOnValues
Java-friendly version of predictOnValues.
- Parameters:
data - (undocumented)
- Returns:
(undocumented)
-
trainOn
Update the model by training on batches of data from a DStream. This operation registers a DStream for training the model, and updates the model based on every subsequent batch of data from the stream.
- Parameters:
data - DStream containing labeled data
-
trainOn
Java-friendly version of trainOn.
- Parameters:
data - (undocumented)
-