Class OnlineLDAOptimizer

Object
  org.apache.spark.mllib.clustering.OnlineLDAOptimizer

All Implemented Interfaces:
  org.apache.spark.internal.Logging, LDAOptimizer

public final class OnlineLDAOptimizer
extends Object
implements LDAOptimizer, org.apache.spark.internal.Logging
An online optimizer for LDA. This optimizer implements the Online variational Bayes LDA
algorithm, which processes a subset of the corpus on each iteration and updates the
term-topic distribution adaptively.

Original Online LDA paper: Hoffman, Blei and Bach, "Online Learning for Latent Dirichlet Allocation." NIPS, 2010.
Nested Class Summary

Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging:
  org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary

Constructors:
  OnlineLDAOptimizer()
Method Summary

Modifier and Type   Method                                                          Description
double              getKappa()                                                      Learning rate: exponential decay rate.
double              getMiniBatchFraction()                                          Mini-batch fraction, which sets the fraction of documents sampled and used in each iteration.
boolean             getOptimizeDocConcentration()                                   Indicates whether docConcentration (Dirichlet parameter for document-topic distribution) will be optimized during training.
double              getTau0()                                                       A (positive) learning parameter that downweights early iterations.
OnlineLDAOptimizer  setKappa(double kappa)                                          Learning rate: exponential decay rate; should be in (0.5, 1.0] to guarantee asymptotic convergence.
OnlineLDAOptimizer  setMiniBatchFraction(double miniBatchFraction)                  Mini-batch fraction in (0, 1], which sets the fraction of documents sampled and used in each iteration.
OnlineLDAOptimizer  setOptimizeDocConcentration(boolean optimizeDocConcentration)   Sets whether to optimize the docConcentration parameter during training.
OnlineLDAOptimizer  setTau0(double tau0)                                            A (positive) learning parameter that downweights early iterations.

Methods inherited from class java.lang.Object:
  equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging:
  initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Constructor Details

OnlineLDAOptimizer

public OnlineLDAOptimizer()
Method Details
getTau0

public double getTau0()

A (positive) learning parameter that downweights early iterations. Larger values make early iterations count less.

Returns:
  (undocumented)
setTau0

public OnlineLDAOptimizer setTau0(double tau0)

A (positive) learning parameter that downweights early iterations. Larger values make early iterations count less. Default: 1024, following the original Online LDA paper.

Parameters:
  tau0 - (undocumented)
Returns:
  (undocumented)
getKappa

public double getKappa()

Learning rate: exponential decay rate.

Returns:
  (undocumented)
setKappa

public OnlineLDAOptimizer setKappa(double kappa)

Learning rate: exponential decay rate; should be in (0.5, 1.0] to guarantee asymptotic convergence. Default: 0.51, based on the original Online LDA paper.

Parameters:
  kappa - (undocumented)
Returns:
  (undocumented)
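For intuition, the roles of tau0 and kappa can be sketched via the per-iteration weight from the original Online LDA paper, rho_t = (tau0 + t)^(-kappa), which scales how strongly each mini-batch updates the term-topic distribution. The snippet below is a plain-Python illustration of that schedule, not Spark code; the function name `rho` is made up for this sketch.

```python
def rho(t, tau0=1024.0, kappa=0.51):
    """Weight given to the mini-batch update at iteration t,
    rho_t = (tau0 + t) ** (-kappa), using Spark's defaults
    tau0 = 1024 and kappa = 0.51."""
    return (tau0 + t) ** (-kappa)

# A larger tau0 downweights early iterations relative to a small one:
assert rho(1, tau0=1024.0) < rho(1, tau0=1.0)

# The weight decays with t, so later mini-batches perturb the
# term-topic distribution less and less:
assert rho(100) > rho(1000) > rho(10000)
```

With kappa in (0.5, 1.0], the weights decay quickly enough for the stochastic updates to converge asymptotically, which is why the setter documents that range.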
getMiniBatchFraction

public double getMiniBatchFraction()

Mini-batch fraction, which sets the fraction of documents sampled and used in each iteration.

Returns:
  (undocumented)
setMiniBatchFraction

public OnlineLDAOptimizer setMiniBatchFraction(double miniBatchFraction)

Mini-batch fraction in (0, 1], which sets the fraction of documents sampled and used in each iteration. Default: 0.05, i.e., 5% of total documents.

Parameters:
  miniBatchFraction - (undocumented)
Returns:
  (undocumented)
Note:
  This should be adjusted in sync with LDA.setMaxIterations() so the entire corpus is used. Specifically, set both so that maxIterations * miniBatchFraction is at least 1.
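The note above amounts to a one-line calculation: the smallest maxIterations that covers the corpus once is ceil(1 / miniBatchFraction). A minimal sketch, where the helper name `min_iterations` is hypothetical and not part of the Spark API:

```python
import math

def min_iterations(mini_batch_fraction):
    """Smallest maxIterations such that
    maxIterations * mini_batch_fraction >= 1, i.e. the entire
    corpus is (in expectation) sampled at least once."""
    return math.ceil(1.0 / mini_batch_fraction)

# With the default fraction of 0.05 (5% of documents per iteration),
# at least 20 iterations are needed to cover the corpus once.
assert min_iterations(0.05) == 20

# Sampling the full corpus each iteration needs only one iteration.
assert min_iterations(1.0) == 1
```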
getOptimizeDocConcentration

public boolean getOptimizeDocConcentration()

Indicates whether docConcentration (the Dirichlet parameter for the document-topic distribution) will be optimized during training.

Returns:
  (undocumented)
setOptimizeDocConcentration

public OnlineLDAOptimizer setOptimizeDocConcentration(boolean optimizeDocConcentration)

Sets whether to optimize the docConcentration parameter during training. Default: false

Parameters:
  optimizeDocConcentration - (undocumented)
Returns:
  (undocumented)