org.apache.spark.mllib.clustering
Return the topics described by weighted terms.
This limits the number of terms per topic. This is approximate; it may not return exactly the top-weighted terms for each topic. To get a more precise set of top terms, increase maxTermsPerTopic.
Maximum number of terms to collect for each topic.
Array over topics. Each topic is represented as a pair of matching arrays: (term indices, term weights in topic). Each topic's terms are sorted in order of decreasing weight.
WARNING: If vocabSize and k are large, this can return a large object!
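The return shape described above can be sketched without Spark. In the example below, `vocab`, the hand-written `topics` values, and the `render` helper are all made up for illustration; they stand in for the result of a real call and are not part of the MLlib API.

```scala
object DescribeTopicsShape {
  // Hypothetical vocabulary: position i holds the word for term index i.
  val vocab: Array[String] = Array("spark", "cluster", "topic", "word", "model")

  // Made-up values in the documented return shape: one
  // (term indices, term weights) pair per topic, weights in decreasing order.
  val topics: Array[(Array[Int], Array[Double])] = Array(
    (Array(2, 3, 4), Array(0.5, 0.3, 0.1)),
    (Array(0, 1, 4), Array(0.6, 0.2, 0.1))
  )

  // Render each topic as "word:weight" pairs, keeping the given order.
  def render(
      topics: Array[(Array[Int], Array[Double])],
      vocab: Array[String]): Array[String] =
    topics.map { case (termIndices, termWeights) =>
      termIndices.zip(termWeights)
        .map { case (idx, w) => s"${vocab(idx)}:$w" }
        .mkString(", ")
    }

  def main(args: Array[String]): Unit =
    render(topics, vocab).zipWithIndex.foreach { case (line, t) =>
      println(s"topic $t -> $line")
    }
}
```

The two arrays in each pair are index-aligned, so zipping them is enough to recover (term, weight) pairs.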
Number of topics
Inferred topics, where each topic is represented by a distribution over terms. This is a matrix of size vocabSize x k, where each column is a topic. No guarantees are given about the ordering of the topics.
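To make the vocabSize x k layout concrete, here is a minimal sketch that reads one topic out of a flattened matrix and ranks its terms exactly. It assumes the values are stored column by column (column-major, the layout MLlib dense matrices use); the object name, the `weight` and `topTerms` helpers, and the sample weights are hypothetical, and this is not how the library itself computes top terms.

```scala
object TopicsMatrixLayout {
  // Weight of `term` in `topic` for a vocabSize x k matrix flattened
  // column by column, so each block of vocabSize values is one topic.
  def weight(values: Array[Double], vocabSize: Int, term: Int, topic: Int): Double =
    values(topic * vocabSize + term)

  // Exact top terms for one topic, returned as (term indices, weights)
  // sorted by decreasing weight. Ties keep ascending term-index order
  // because sortBy is stable.
  def topTerms(
      values: Array[Double],
      vocabSize: Int,
      topic: Int,
      maxTerms: Int): (Array[Int], Array[Double]) = {
    val column = values.slice(topic * vocabSize, (topic + 1) * vocabSize)
    val ranked = column.zipWithIndex.sortBy { case (w, _) => -w }.take(maxTerms)
    (ranked.map(_._2), ranked.map(_._1))
  }

  def main(args: Array[String]): Unit = {
    val vocabSize = 4
    // Made-up weights: the first 4 values are topic 0, the next 4 are topic 1.
    val values = Array(0.1, 0.6, 0.2, 0.1, 0.4, 0.1, 0.1, 0.4)
    val (indices, weights) = topTerms(values, vocabSize, topic = 0, maxTerms = 2)
    println(indices.mkString(", ") + " | " + weights.mkString(", "))
  }
}
```

Because the ordering of topics carries no guarantee, column indices are only meaningful within a single fitted model.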
Vocabulary size (number of terms in the vocabulary)
:: Experimental ::
Local LDA model. This model stores only the inferred topics. It may be used for computing topics for new documents, but it may give less accurate answers than the DistributedLDAModel.