Train a Latent Dirichlet Allocation (LDA) model.
New in version 1.5.0.
train(rdd[, k, maxIterations, …])
Train an LDA model.
rdd : pyspark.RDD
RDD of documents, which are tuples of document IDs and term
(word) count vectors. The term count vectors are “bags of
words” with a fixed-size vocabulary (where the vocabulary size
is the length of the vector). Document IDs must be unique
and >= 0.
k : int, optional
Number of topics to infer, i.e., the number of soft cluster
centers. (default: 10)
maxIterations : int, optional
Maximum number of iterations allowed. (default: 20)
docConcentration : float, optional
Concentration parameter (commonly named “alpha”) for the prior
placed on documents’ distributions over topics (“theta”).
(default: -1.0)
topicConcentration : float, optional
Concentration parameter (commonly named “beta” or “eta”) for
the prior placed on topics’ distributions over terms.
(default: -1.0)
seed : int, optional
Random seed for cluster initialization. Set to None to generate
a seed based on the system time. (default: None)
checkpointInterval : int, optional
Period (in iterations) between checkpoints. (default: 10)
optimizer : str, optional
LDAOptimizer used to perform the actual calculation. Currently
“em” and “online” are supported. (default: “em”)
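You can prepare the document format that `train` expects without Spark. The following plain-Python sketch converts raw texts into `(document ID, term-count vector)` tuples. It assigns unique, non-negative IDs and makes every vector as long as the shared vocabulary. It makes some assumptions. The helpers `build_vocabulary` and `to_corpus` are hypothetical. Tokenization is a simple lowercase whitespace split.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_vocabulary(docs: Sequence[str]) -> Dict[str, int]:
    """Hypothetical helper: map each distinct lowercase token to a column index.

    Terms are sorted so the column order is deterministic.
    """
    terms = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {term: idx for idx, term in enumerate(terms)}


def to_corpus(docs: Sequence[str]) -> Tuple[Dict[str, int], List[Tuple[int, List[float]]]]:
    """Hypothetical helper: turn raw texts into (document ID, count vector) tuples.

    Every vector has length len(vocab), so the vocabulary size is fixed,
    and the document IDs are the unique, non-negative positions 0, 1, 2, ...
    """
    vocab = build_vocabulary(docs)
    corpus = []
    for doc_id, doc in enumerate(docs):
        counts = [0.0] * len(vocab)
        # Bag of words: only term frequencies are kept, word order is discarded.
        for term, n in Counter(doc.lower().split()).items():
            counts[vocab[term]] = float(n)
        corpus.append((doc_id, counts))
    return vocab, corpus


docs = ["spark runs jobs", "spark spark clusters", "topics and words"]
vocab, corpus = to_corpus(docs)
print(len(vocab))  # 7
print(corpus[1])   # (1, [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0])
```

In PySpark, you would typically wrap each count list with `Vectors.dense` from `pyspark.mllib.linalg` and distribute the pairs with `sc.parallelize` before passing them to `LDA.train`. That step needs a running SparkContext, so it is not shown here.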
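One way to build intuition for the concentration parameters is to sample from a Dirichlet prior directly. The plain-Python sketch below assumes a symmetric Dirichlet with a single positive concentration `alpha` shared across `k` topics. It shows that a small `alpha` tends to concentrate each document on a few topics, while a large `alpha` spreads weight almost evenly. It does not show how `train` chooses or uses `docConcentration` internally. The helpers `sample_dirichlet` and `mean_largest_share` are hypothetical.

```python
import random
from typing import List


def sample_dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Hypothetical helper: draw one topic distribution from a symmetric Dirichlet(alpha).

    Standard construction: draw k independent Gamma(alpha, 1) variates and
    normalize them so they sum to 1.
    """
    if alpha <= 0:
        raise ValueError("concentration must be positive")
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_share(alpha: float, k: int, trials: int, seed: int) -> float:
    """Hypothetical helper: average weight of the dominant topic over many sampled documents."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(trials)) / trials


# Small alpha: sampled documents tend to be dominated by one or two topics.
# Large alpha: sampled documents mix all topics, so the largest share is close to 1/k.
sparse = mean_largest_share(alpha=0.1, k=5, trials=2000, seed=42)
smooth = mean_largest_share(alpha=10.0, k=5, trials=2000, seed=42)
print(f"alpha=0.1:  {sparse:.2f}")
print(f"alpha=10.0: {smooth:.2f}")
```

The same reasoning applies to `topicConcentration`, with terms in place of topics and each topic's distribution over the vocabulary in place of a document's distribution over topics.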