LocalLDAModel (Spark 3.5.5 JavaDoc)

Object
- org.apache.spark.mllib.clustering.LDAModel
- - org.apache.spark.mllib.clustering.LocalLDAModel

All Implemented Interfaces:

java.io.Serializable, Saveable
```
public class LocalLDAModel
extends LDAModel
implements scala.Serializable
```
Local LDA model. This model stores only the inferred topics.
param: topics - Inferred topics (vocabSize x k matrix).

See Also:

Serialized Form

Method Summary

| Modifier and Type | Method and Description |
|---|---|
| `scala.Tuple2<int[],double[]>[]` | `describeTopics(int maxTermsPerTopic)` Return the topics described by weighted terms. |
| `Vector` | `docConcentration()` Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta"). |
| `long` | `getSeed()` Random seed for cluster initialization. |
| `int` | `k()` Number of topics. |
| `static LocalLDAModel` | `load(SparkContext sc, String path)` Load a model previously saved to the given path. |
| `double` | `logLikelihood(JavaPairRDD<Long,Vector> documents)` Java-friendly version of `logLikelihood`. |
| `double` | `logLikelihood(RDD<scala.Tuple2<Object,Vector>> documents)` Calculates a lower bound on the log likelihood of the entire corpus. |
| `double` | `logPerplexity(JavaPairRDD<Long,Vector> documents)` Java-friendly version of `logPerplexity`. |
| `double` | `logPerplexity(RDD<scala.Tuple2<Object,Vector>> documents)` Calculate an upper bound on perplexity. |
| `void` | `save(SparkContext sc, String path)` Save this model to the given path. |
| `LocalLDAModel` | `setSeed(long seed)` Set the random seed for cluster initialization. |
| `double` | `topicConcentration()` Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms. |
| `Vector` | `topicDistribution(Vector document)` Predicts the topic mixture distribution for a document (often called "theta" in the literature). |
| `JavaPairRDD<Long,Vector>` | `topicDistributions(JavaPairRDD<Long,Vector> documents)` Java-friendly version of `topicDistributions`. |
| `RDD<scala.Tuple2<Object,Vector>>` | `topicDistributions(RDD<scala.Tuple2<Object,Vector>> documents)` Predicts the topic mixture distribution for each document (often called "theta" in the literature). |
| `Matrix` | `topics()` Inferred topics (vocabSize x k matrix). |
| `Matrix` | `topicsMatrix()` Inferred topics, where each topic is represented by a distribution over terms. |
| `int` | `vocabSize()` Vocabulary size (number of terms in the vocabulary). |

Methods inherited from class org.apache.spark.mllib.clustering.LDAModel
describeTopics

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - load
```
public static LocalLDAModel load(SparkContext sc,
                                 String path)
```
    Load a model previously saved to the given path.
  - topics
```
public Matrix topics()
```
    Inferred topics (vocabSize x k matrix), as passed to the model constructor.
  - docConcentration
```
public Vector docConcentration()
```
    Description copied from class: LDAModel
    
    Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta").
    This is the parameter to a Dirichlet distribution.
    
    Specified by:
    
    docConcentration in class LDAModel
    
    Returns:
    
    the concentration parameter vector ("alpha")
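
    For reference, the standard Dirichlet density over a document's topic mixture theta, parameterized by the length-k vector alpha, is shown below. The symbols follow the description above rather than any Spark internals. The symmetric Dirichlet used for topicConcentration has the same form with every component set to one scalar, taken over vocabSize terms instead of k topics.

```latex
p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} \theta_i^{\alpha_i - 1},
\qquad \theta_i \ge 0, \quad \sum_{i=1}^{k} \theta_i = 1
```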
  - topicConcentration
```
public double topicConcentration()
```
    Description copied from class: LDAModel
    
    Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms.
    This is the parameter to a symmetric Dirichlet distribution.
    
    Specified by:
    
    topicConcentration in class LDAModel
    
    Returns:
    
    the concentration parameter ("beta" or "eta")
  - k
```
public int k()
```
    Description copied from class: LDAModel
    
    Number of topics.
    
    Specified by:
    
    k in class LDAModel
  - vocabSize
```
public int vocabSize()
```
    Description copied from class: LDAModel
    
    Vocabulary size (number of terms in the vocabulary).
    
    Specified by:
    
    vocabSize in class LDAModel
  - topicsMatrix
```
public Matrix topicsMatrix()
```
    Description copied from class: LDAModel
    
    Inferred topics, where each topic is represented by a distribution over terms. This is a matrix of size vocabSize x k, where each column is a topic. No guarantees are given about the ordering of the topics.
    
    Specified by:
    
    topicsMatrix in class LDAModel
    
    Returns:
    
    the inferred topics as a vocabSize x k matrix
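
    A plain-Java sketch of reading one topic out of this matrix. It assumes the values were obtained with `Matrix.toArray()`, which returns a dense column-major array, so topic `j` occupies the contiguous slice `[j * vocabSize, (j + 1) * vocabSize)`. `TopicColumns` and `topicColumn` are hypothetical helpers, not Spark API, and the matrix values are illustrative.

```java
import java.util.Arrays;

public class TopicColumns {
    /**
     * Hypothetical helper: extracts topic j from a vocabSize x k matrix
     * stored in column-major order (assumes colMajor.length == vocabSize * k).
     */
    static double[] topicColumn(double[] colMajor, int vocabSize, int j) {
        if (vocabSize <= 0 || colMajor.length % vocabSize != 0) {
            throw new IllegalArgumentException("array length is not a multiple of vocabSize");
        }
        int k = colMajor.length / vocabSize;
        if (j < 0 || j >= k) {
            throw new IllegalArgumentException("topic index out of range: " + j);
        }
        // In column-major storage, column j is one contiguous run of vocabSize values.
        return Arrays.copyOfRange(colMajor, j * vocabSize, (j + 1) * vocabSize);
    }

    public static void main(String[] args) {
        // 3 terms x 2 topics: topic 0 = {0.5, 0.3, 0.2}, topic 1 = {0.1, 0.1, 0.8}
        double[] values = {0.5, 0.3, 0.2, 0.1, 0.1, 0.8};
        System.out.println(Arrays.toString(topicColumn(values, 3, 1)));  // [0.1, 0.1, 0.8]
    }
}
```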
  - describeTopics
```
public scala.Tuple2<int[],double[]>[] describeTopics(int maxTermsPerTopic)
```
    Description copied from class: LDAModel
    
    Return the topics described by weighted terms.
    
    Specified by:
    
    describeTopics in class LDAModel
    
    Parameters:
    
    maxTermsPerTopic - Maximum number of terms to collect for each topic.
    
    Returns:
    
    Array over topics. Each topic is represented as a pair of matching arrays: (term indices, term weights in topic). Each topic's terms are sorted in order of decreasing weight.
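
    The return shape described above (per topic, matching arrays of term indices and weights, sorted by decreasing weight) can be sketched in plain Java over a column-major vocabSize x k array. This is an illustration of the described output, not Spark's implementation: it sorts the raw column values and makes no claim about how Spark scales the returned weights. `DescribeTopicsSketch` and `TopicTerms` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

public class DescribeTopicsSketch {
    /** Hypothetical holder for one topic: matching (term indices, term weights) arrays. */
    static final class TopicTerms {
        final int[] terms;
        final double[] weights;
        TopicTerms(int[] terms, double[] weights) { this.terms = terms; this.weights = weights; }
    }

    /** For each column of a column-major vocabSize x k matrix, keep the top maxTermsPerTopic terms. */
    static TopicTerms[] describeTopics(double[] colMajor, int vocabSize, int maxTermsPerTopic) {
        if (vocabSize <= 0 || colMajor.length % vocabSize != 0) {
            throw new IllegalArgumentException("array length is not a multiple of vocabSize");
        }
        int k = colMajor.length / vocabSize;
        TopicTerms[] result = new TopicTerms[k];
        for (int j = 0; j < k; j++) {
            final int offset = j * vocabSize;  // start of column j
            // Sort term indices by decreasing weight; the stream sort is stable, so ties keep index order.
            int[] top = IntStream.range(0, vocabSize)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer t) -> colMajor[offset + t]).reversed())
                .limit(maxTermsPerTopic)
                .mapToInt(Integer::intValue)
                .toArray();
            double[] w = new double[top.length];
            for (int i = 0; i < top.length; i++) {
                w[i] = colMajor[offset + top[i]];
            }
            result[j] = new TopicTerms(top, w);
        }
        return result;
    }

    public static void main(String[] args) {
        // 3 terms x 2 topics in column-major order (illustrative values).
        double[] values = {0.5, 0.3, 0.2, 0.1, 0.1, 0.8};
        TopicTerms[] topics = describeTopics(values, 3, 2);
        for (int j = 0; j < topics.length; j++) {
            System.out.println("topic " + j + ": terms=" + Arrays.toString(topics[j].terms)
                + " weights=" + Arrays.toString(topics[j].weights));
        }
    }
}
```

    Run as-is, this prints `terms=[0, 1] weights=[0.5, 0.3]` for topic 0 and `terms=[2, 0] weights=[0.8, 0.1]` for topic 1.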
  - getSeed
```
public long getSeed()
```
    Random seed for cluster initialization.
    
    Returns:
    
    the random seed
  - setSeed
```
public LocalLDAModel setSeed(long seed)
```
    Set the random seed for cluster initialization.
    
    Parameters:
    
    seed - random seed for cluster initialization
    
    Returns:
    
    this model, with the seed updated
  - save
```
public void save(SparkContext sc,
                 String path)
```
    Description copied from interface: Saveable
    
    Save this model to the given path.
    This saves:
    - human-readable (JSON) model metadata to path/metadata/
    - Parquet formatted data to path/data/
    The model may be loaded using Loader.load.
    
    Specified by:
    
    save in interface Saveable
    
    Parameters:
    
    sc - Spark context used to save model data.
    
    path - Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.
  - logLikelihood
```
public double logLikelihood(RDD<scala.Tuple2<Object,Vector>> documents)
```
    Calculates a lower bound on the log likelihood of the entire corpus.
    See Equation (16) in the original Online LDA paper (Hoffman et al., 2010).
    
    Parameters:
    
    documents - test corpus to use for calculating log likelihood
    
    Returns:
    
    variational lower bound on the log likelihood of the entire corpus
  - logLikelihood
```
public double logLikelihood(JavaPairRDD<Long,Vector> documents)
```
    Java-friendly version of logLikelihood
    
    Parameters:
    
    documents - test corpus to use for calculating log likelihood
    
    Returns:
    
    variational lower bound on the log likelihood of the entire corpus
  - logPerplexity
```
public double logPerplexity(RDD<scala.Tuple2<Object,Vector>> documents)
```
    Calculate an upper bound on perplexity (lower is better). See Equation (16) in the original Online LDA paper.
    
    Parameters:
    
    documents - test corpus to use for calculating perplexity
    
    Returns:
    
    Variational upper bound on log perplexity per token.
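
    Because the bound is reported per token, a natural reading, assumed here, is that it equals the negated corpus log-likelihood bound divided by the total token count (the sum of all term counts in the corpus). Negating a lower bound on the log likelihood gives an upper bound, which matches the wording above. The sketch below works that arithmetic; `perTokenLogPerplexity` is a hypothetical helper and the numbers are illustrative, not Spark output.

```java
public class PerplexitySketch {
    /** Hypothetical helper: -(log-likelihood bound) / (total number of tokens). */
    static double perTokenLogPerplexity(double logLikelihoodBound, double[][] termCounts) {
        double tokens = 0.0;
        for (double[] doc : termCounts) {
            for (double c : doc) {
                tokens += c;  // each entry is a term count, so the sum is the token total
            }
        }
        if (tokens == 0.0) {
            throw new IllegalArgumentException("corpus has no tokens");
        }
        return -logLikelihoodBound / tokens;
    }

    public static void main(String[] args) {
        double[][] corpus = { {3, 0, 1}, {0, 2, 2} };  // 8 tokens in total
        double logPerp = perTokenLogPerplexity(-16.0, corpus);
        System.out.println(logPerp);            // 2.0
        System.out.println(Math.exp(logPerp));  // the corresponding perplexity bound
    }
}
```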
  - logPerplexity
```
public double logPerplexity(JavaPairRDD<Long,Vector> documents)
```
    Java-friendly version of logPerplexity
    
    Parameters:
    
    documents - test corpus to use for calculating perplexity
    
    Returns:
    
    Variational upper bound on log perplexity per token.
  - topicDistributions
```
public RDD<scala.Tuple2<Object,Vector>> topicDistributions(RDD<scala.Tuple2<Object,Vector>> documents)
```
    Predicts the topic mixture distribution for each document (often called "theta" in the literature). Returns a vector of zeros for an empty document.
    This uses a variational approximation following Hoffman et al. (2010), where the approximate distribution is called "gamma." Technically, this method returns this approximation "gamma" for each document.
    
    Parameters:
    
    documents - documents to predict topic mixture distributions for
    
    Returns:
    
    An RDD of (document ID, topic mixture distribution for document)
  - topicDistribution
```
public Vector topicDistribution(Vector document)
```
    Predicts the topic mixture distribution for a document (often called "theta" in the literature). Returns a vector of zeros for an empty document.
    This method is intended for quick queries on a single document. For batches of documents, use topicDistributions() to avoid the per-call overhead.
    
    Parameters:
    
    document - document to predict topic mixture distributions for
    
    Returns:
    
    topic mixture distribution for the document
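
    Spark's LDA works on bag-of-words documents: term-count vectors whose length is the vocabulary size. A minimal plain-Java sketch of building such counts from token indices, and of recognizing the empty document for which a zero vector is returned, is below. `BagOfWordsSketch`, `termCounts`, and `isEmpty` are hypothetical names; in Spark the resulting array could be wrapped with `Vectors.dense(counts)` from `org.apache.spark.mllib.linalg.Vectors` before calling topicDistribution.

```java
import java.util.Arrays;

public class BagOfWordsSketch {
    /** Hypothetical helper: counts token indices into a dense term-count array of length vocabSize. */
    static double[] termCounts(int[] tokenIds, int vocabSize) {
        double[] counts = new double[vocabSize];
        for (int id : tokenIds) {
            if (id < 0 || id >= vocabSize) {
                throw new IllegalArgumentException("token id out of range: " + id);
            }
            counts[id] += 1.0;
        }
        return counts;
    }

    /** True if no term occurs; per the description above, such a document maps to a zero vector. */
    static boolean isEmpty(double[] counts) {
        for (double c : counts) {
            if (c != 0.0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] doc = termCounts(new int[] {2, 0, 2, 4}, 5);
        System.out.println(Arrays.toString(doc));                // [1.0, 0.0, 2.0, 0.0, 1.0]
        System.out.println(isEmpty(termCounts(new int[0], 5)));  // true
    }
}
```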
  - topicDistributions
```
public JavaPairRDD<Long,Vector> topicDistributions(JavaPairRDD<Long,Vector> documents)
```
    Java-friendly version of topicDistributions
    
    Parameters:
    
    documents - documents to predict topic mixture distributions for
    
    Returns:
    
    A JavaPairRDD of (document ID, topic mixture distribution for document)
