Package org.apache.spark.mllib.stat.test
Class StreamingTest
Object
org.apache.spark.mllib.stat.test.StreamingTest
- All Implemented Interfaces:
- Serializable,- org.apache.spark.internal.Logging
public class StreamingTest
extends Object
implements org.apache.spark.internal.Logging, Serializable
Performs online 2-sample significance testing for a stream of (Boolean, Double) pairs. The
 Boolean identifies which sample each observation comes from, and the Double is the numeric value
 of the observation.
 
 To address novelty affects, the peacePeriod specifies a set number of initial
 RDD batches of the DStream to be dropped from significance testing.
 
 The windowSize sets the number of batches each significance test is to be performed over. The
 window is sliding with a stride length of 1 batch. Setting windowSize to 0 will perform
 cumulative processing, using all batches seen so far.
 
 Different tests may be used for assessing statistical significance depending on assumptions
 satisfied by data. For more details, see StreamingTestMethod. The testMethod specifies
 which test will be used.
 
Use a builder pattern to construct a streaming test in an application, for example:
   val model = new StreamingTest()
     .setPeacePeriod(10)
     .setWindowSize(0)
     .setTestMethod("welch")
     .registerStream(DStream)
 - See Also:
- 
Nested Class SummaryNested classes/interfaces inherited from interface org.apache.spark.internal.Loggingorg.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
- 
Constructor SummaryConstructors
- 
Method SummaryModifier and TypeMethodDescriptionJavaDStream<org.apache.spark.mllib.stat.test.StreamingTestResult>Register aJavaDStreamof values for significance testing.DStream<org.apache.spark.mllib.stat.test.StreamingTestResult>registerStream(DStream<BinarySample> data) Register aDStreamof values for significance testing.setPeacePeriod(int peacePeriod) Set the number of initial batches to ignore.setTestMethod(String method) Set the statistical method used for significance testing.setWindowSize(int windowSize) Set the number of batches to compute significance tests over.Methods inherited from class java.lang.Objectequals, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface org.apache.spark.internal.LogginginitializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
- 
Constructor Details- 
StreamingTestpublic StreamingTest()
 
- 
- 
Method Details- 
registerStreampublic DStream<org.apache.spark.mllib.stat.test.StreamingTestResult> registerStream(DStream<BinarySample> data) Register aDStreamof values for significance testing.- Parameters:
- data- stream of BinarySample(key,value) pairs where the key denotes group membership (true = experiment, false = control) and the value is the numerical metric to test for significance
- Returns:
- stream of significance testing results
 
- 
registerStreampublic JavaDStream<org.apache.spark.mllib.stat.test.StreamingTestResult> registerStream(JavaDStream<BinarySample> data) Register aJavaDStreamof values for significance testing.- Parameters:
- data- stream of BinarySample(isExperiment,value) pairs where the isExperiment denotes group (true = experiment, false = control) and the value is the numerical metric to test for significance
- Returns:
- stream of significance testing results
 
- 
setPeacePeriodSet the number of initial batches to ignore. Default: 0.
- 
setTestMethodSet the statistical method used for significance testing. Default: "welch"
- 
setWindowSizeSet the number of batches to compute significance tests over. Default: 0. A value of 0 will use all batches seen so far.- Parameters:
- windowSize- (undocumented)
- Returns:
- (undocumented)
 
 
-