Class OnlineLDAOptimizer

java.lang.Object
  org.apache.spark.mllib.clustering.OnlineLDAOptimizer
All Implemented Interfaces:
org.apache.spark.internal.Logging, LDAOptimizer

public final class OnlineLDAOptimizer extends Object implements LDAOptimizer, org.apache.spark.internal.Logging
An online optimizer for LDA. This optimizer implements the Online variational Bayes LDA algorithm, which processes a subset of the corpus on each iteration and adaptively updates the term-topic distribution.

Original Online LDA paper: Hoffman, Blei and Bach, "Online Learning for Latent Dirichlet Allocation." NIPS, 2010.
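To make the update rule concrete, here is a minimal self-contained sketch, not Spark's actual implementation: the class name OnlineUpdateSketch and the variables lambda and lambdaHat are illustrative. Following Hoffman et al., each mini-batch yields a local estimate of the term-topic matrix, which is blended into the running estimate with a decaying weight rho_t = (tau0 + t)^(-kappa):

```java
// Minimal sketch (not Spark's implementation) of the online variational
// Bayes update: each mini-batch yields a local estimate lambdaHat of the
// term-topic matrix, blended into the running estimate lambda with a
// decaying weight rho_t = (tau0 + t)^(-kappa).
public class OnlineUpdateSketch {
    static double[] blend(double tau0, double kappa,
                          double[] lambda, double[] lambdaHat, int steps) {
        double[] out = lambda.clone();
        for (int t = 1; t <= steps; t++) {
            double rho = Math.pow(tau0 + t, -kappa);  // decaying learning rate
            for (int i = 0; i < out.length; i++) {
                out[i] = (1.0 - rho) * out[i] + rho * lambdaHat[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Defaults from this class: tau0 = 1024, kappa = 0.51.
        // A fixed lambdaHat stands in for the per-batch sufficient statistics.
        double[] lambda = blend(1024.0, 0.51, new double[]{1.0, 1.0},
                                new double[]{4.0, 0.5}, 1000);
        System.out.printf("lambda = [%.3f, %.3f]%n", lambda[0], lambda[1]);
    }
}
```

As batches accumulate, the running estimate converges toward the per-batch estimates; with a large tau0, early batches contribute less.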

  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging

    org.apache.spark.internal.Logging.SparkShellLoggingFilter
  • Constructor Summary

    Constructors
    OnlineLDAOptimizer()
  • Method Summary

    double getKappa()
      Learning rate: exponential decay rate.
    double getMiniBatchFraction()
      Mini-batch fraction, which sets the fraction of documents sampled and used in each iteration.
    boolean getOptimizeDocConcentration()
      Indicates whether docConcentration (the Dirichlet parameter for the document-topic distribution) will be optimized during training.
    double getTau0()
      A (positive) learning parameter that downweights early iterations.
    OnlineLDAOptimizer setKappa(double kappa)
      Learning rate: exponential decay rate; should be in (0.5, 1.0] to guarantee asymptotic convergence.
    OnlineLDAOptimizer setMiniBatchFraction(double miniBatchFraction)
      Mini-batch fraction in (0, 1], which sets the fraction of documents sampled and used in each iteration.
    OnlineLDAOptimizer setOptimizeDocConcentration(boolean optimizeDocConcentration)
      Sets whether to optimize the docConcentration parameter during training.
    OnlineLDAOptimizer setTau0(double tau0)
      A (positive) learning parameter that downweights early iterations.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.internal.Logging

    initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq
  • Constructor Details

    • OnlineLDAOptimizer

      public OnlineLDAOptimizer()
  • Method Details

    • getTau0

      public double getTau0()
      A (positive) learning parameter that downweights early iterations. Larger values make early iterations count less.
      Returns:
      (undocumented)
    • setTau0

      public OnlineLDAOptimizer setTau0(double tau0)
      A (positive) learning parameter that downweights early iterations. Larger values make early iterations count less. Default: 1024, following the original Online LDA paper.
      Parameters:
      tau0 - (undocumented)
      Returns:
      (undocumented)
    • getKappa

      public double getKappa()
      Learning rate: exponential decay rate
      Returns:
      (undocumented)
    • setKappa

      public OnlineLDAOptimizer setKappa(double kappa)
      Learning rate: exponential decay rate; should be in (0.5, 1.0] to guarantee asymptotic convergence. Default: 0.51, based on the original Online LDA paper.
      Parameters:
      kappa - (undocumented)
      Returns:
      (undocumented)
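A small illustration (the class name LearningRateSketch is hypothetical) of how the two parameters shape the per-iteration weight rho_t = (tau0 + t)^(-kappa): a larger tau0 shrinks the weight of early iterations, while a larger kappa makes the weight decay faster with t:

```java
// Illustrative only: the learning-rate schedule rho_t = (tau0 + t)^(-kappa)
// that governs how much each mini-batch moves the running estimate.
public class LearningRateSketch {
    static double rho(double tau0, double kappa, int t) {
        return Math.pow(tau0 + t, -kappa);
    }

    public static void main(String[] args) {
        // Larger tau0 downweights early iterations:
        System.out.printf("t=1:     tau0=1 -> %.4f   tau0=1024 -> %.4f%n",
                rho(1.0, 0.51, 1), rho(1024.0, 0.51, 1));
        // Larger kappa decays faster as t grows:
        System.out.printf("t=10000: kappa=0.51 -> %.6f   kappa=1.0 -> %.6f%n",
                rho(1024.0, 0.51, 10000), rho(1024.0, 1.0, 10000));
    }
}
```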
    • getMiniBatchFraction

      public double getMiniBatchFraction()
      Mini-batch fraction, which sets the fraction of documents sampled and used in each iteration.
      Returns:
      (undocumented)
    • setMiniBatchFraction

      public OnlineLDAOptimizer setMiniBatchFraction(double miniBatchFraction)
      Mini-batch fraction in (0, 1], which sets the fraction of document sampled and used in each iteration.

      Parameters:
      miniBatchFraction - (undocumented)
      Returns:
      (undocumented)
      Note:
      This should be adjusted in sync with LDA.setMaxIterations() so that the entire corpus is used. Specifically, set both so that maxIterations * miniBatchFraction is at least 1.

      Default: 0.05, i.e., 5% of total documents.
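The note above reduces to a one-line calculation (the helper name MiniBatchCoverage is hypothetical, not part of Spark's API): the smallest maxIterations satisfying maxIterations * miniBatchFraction >= 1 is ceil(1 / miniBatchFraction):

```java
// Illustrative helper (not part of Spark's API): the minimum number of
// iterations needed so that maxIterations * miniBatchFraction >= 1,
// i.e. so the entire corpus is sampled in expectation.
public class MiniBatchCoverage {
    static int minIterations(double miniBatchFraction) {
        return (int) Math.ceil(1.0 / miniBatchFraction);
    }

    public static void main(String[] args) {
        // With the default fraction of 0.05, at least 20 iterations are needed.
        System.out.println(minIterations(0.05));
    }
}
```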

    • getOptimizeDocConcentration

      public boolean getOptimizeDocConcentration()
      Indicates whether docConcentration (the Dirichlet parameter for the document-topic distribution) will be optimized during training.
      Returns:
      (undocumented)
    • setOptimizeDocConcentration

      public OnlineLDAOptimizer setOptimizeDocConcentration(boolean optimizeDocConcentration)
      Sets whether to optimize the docConcentration parameter during training.

      Default: false

      Parameters:
      optimizeDocConcentration - (undocumented)
      Returns:
      (undocumented)