org.apache.spark.mllib.clustering
Class EMLDAOptimizer

Object
  extended by org.apache.spark.mllib.clustering.EMLDAOptimizer
All Implemented Interfaces:
LDAOptimizer

public final class EMLDAOptimizer
extends Object
implements LDAOptimizer

:: DeveloperApi ::

Optimizer for the EM algorithm. It stores the data + parameter graph, together with the algorithm parameters.

Currently, this optimizer uses Expectation-Maximization (EM), following the Asuncion et al. (2009) paper referenced below.

References:
 - Original LDA paper (journal version): Blei, Ng, and Jordan. "Latent Dirichlet Allocation." JMLR, 2003.
   - This class implements their "smoothed" LDA model.
 - Paper which clearly explains several algorithms, including EM: Asuncion, Welling, Smyth, and Teh. "On Smoothing and Inference for Topic Models." UAI, 2009.
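As a rough illustration of the EM step referenced above, the following plain-Java sketch computes the topic responsibilities for a single (term, document) pair using the MAP-style update from Asuncion et al. (2009), in which each topic's weight is proportional to (N_wk + eta - 1)(N_kj + alpha - 1) / (N_k + W*eta - W). This is not the code of this class: the class name EmTopicUpdateSketch, the method topicResponsibilities, and all parameter names are hypothetical, and the mapping of alpha to docConcentration() and eta to topicConcentration() is an assumption. The real optimizer works on a distributed graph rather than on local arrays.

```java
import java.util.Arrays;

// Hypothetical, self-contained sketch; not part of the Spark API.
final class EmTopicUpdateSketch {

    /**
     * Topic responsibilities for one (term w, document j) pair.
     *
     * @param nWk       nWk[k] = current count of term w assigned to topic k
     * @param nKj       nKj[k] = current count of topic k in document j
     * @param nK        nK[k]  = current total count assigned to topic k
     * @param vocabSize number of distinct terms (W)
     * @param eta       topic concentration (assumed to match topicConcentration())
     * @param alpha     document concentration (assumed to match docConcentration())
     * @return a vector of length k that sums to 1
     */
    static double[] topicResponsibilities(double[] nWk, double[] nKj, double[] nK,
                                          int vocabSize, double eta, double alpha) {
        int numTopics = nWk.length;
        double[] gamma = new double[numTopics];
        double sum = 0.0;
        for (int k = 0; k < numTopics; k++) {
            // Smoothed counts, following the MAP update in Asuncion et al. (2009).
            double smoothedNwk = nWk[k] + (eta - 1.0);
            double smoothedNkj = nKj[k] + (alpha - 1.0);
            double smoothedNk = nK[k] + vocabSize * (eta - 1.0);
            gamma[k] = smoothedNwk * smoothedNkj / smoothedNk;
            sum += gamma[k];
        }
        // Normalize so the responsibilities form a distribution over topics.
        for (int k = 0; k < numTopics; k++) {
            gamma[k] /= sum;
        }
        return gamma;
    }

    public static void main(String[] args) {
        double[] gamma = topicResponsibilities(
                new double[] {3.0, 1.0},   // term counts per topic
                new double[] {2.0, 2.0},   // document counts per topic
                new double[] {10.0, 10.0}, // global topic totals
                5, 1.1, 1.1);
        System.out.println(Arrays.toString(gamma));
    }
}
```

Note that the smoothed counts stay positive only when both concentration parameters exceed 1.0, which is why the sketch uses 1.1 in its example call.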


Constructor Summary
EMLDAOptimizer()
           
 
Method Summary
 int checkpointInterval()
           
 double docConcentration()
           
 breeze.linalg.DenseVector<Object> globalTopicTotals()
          Aggregate distributions over topics from all term vertices.
 Graph<breeze.linalg.DenseVector<Object>,Object> graph()
          The data + parameter graph; only initialized through the initialize() method.
 int k()
           
 double topicConcentration()
           
 int vocabSize()
           
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

EMLDAOptimizer

public EMLDAOptimizer()
Method Detail

graph

public Graph<breeze.linalg.DenseVector<Object>,Object> graph()
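The data + parameter graph. This field, like k, vocabSize, docConcentration, topicConcentration, and checkpointInterval below, is only initialized through the initialize() method.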

Returns:
(undocumented)

k

public int k()

vocabSize

public int vocabSize()

docConcentration

public double docConcentration()

topicConcentration

public double topicConcentration()

checkpointInterval

public int checkpointInterval()

globalTopicTotals

public breeze.linalg.DenseVector<Object> globalTopicTotals()
Aggregate distributions over topics from all term vertices.

Note: This executes an action on the graph RDDs.
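Conceptually, the aggregation described above is an element-wise sum of one per-topic vector per term vertex. The plain-Java sketch below shows that reduction on local arrays. It is only an illustration: the class TopicTotalsSketch and its method are hypothetical, it assumes each term vertex carries a vector of length k, and the real method runs as a distributed action over the graph's vertex RDD and returns a breeze DenseVector.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained sketch; not part of the Spark API.
final class TopicTotalsSketch {

    /**
     * Sums per-term topic vectors element-wise.
     *
     * @param termTopicCounts one vector of length k per term vertex (assumed layout)
     * @param k               number of topics
     * @return a vector of length k holding the total for each topic
     */
    static double[] globalTopicTotals(List<double[]> termTopicCounts, int k) {
        double[] totals = new double[k];
        for (double[] counts : termTopicCounts) {
            if (counts.length != k) {
                throw new IllegalArgumentException("every vector must have length k");
            }
            for (int i = 0; i < k; i++) {
                totals[i] += counts[i];
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<double[]> terms = Arrays.asList(
                new double[] {1.0, 2.0},
                new double[] {3.0, 4.0},
                new double[] {0.5, 0.5});
        System.out.println(Arrays.toString(globalTopicTotals(terms, 2)));
    }
}
```

Because the real call triggers an action on the graph RDDs, callers who need the totals more than once may prefer to keep the returned vector rather than call the method repeatedly.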

Returns:
(undocumented)