public class DistributedLDAModel extends LDAModel
Distributed model fitted by LDA.
This type of model is currently only produced by Expectation-Maximization (EM).
This model stores the inferred topics, the full training dataset, and the topic distribution for each training document.
param: oldLocalModelOption Used to implement oldLocalModel as a lazy val, but keeping copy() cheap.
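As a sketch of how such a model is obtained (assuming a SparkSession and an input DataFrame `dataset` with a `features` column of term-count vectors; variable names are illustrative):

```java
import org.apache.spark.ml.clustering.DistributedLDAModel;
import org.apache.spark.ml.clustering.LDA;
import org.apache.spark.ml.clustering.LDAModel;

// Fit LDA with the EM optimizer; "em" is the optimizer that
// produces a DistributedLDAModel (the "online" optimizer yields
// a LocalLDAModel instead).
LDA lda = new LDA()
    .setK(10)
    .setMaxIter(20)
    .setOptimizer("em");
LDAModel model = lda.fit(dataset);

// With optimizer "em", the fitted model can be downcast.
DistributedLDAModel distModel = (DistributedLDAModel) model;
```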
| Modifier and Type | Method and Description |
|---|---|
| `DistributedLDAModel` | `copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params. |
| `void` | `deleteCheckpointFiles()` Remove any remaining checkpoint files from training. |
| `String[]` | `getCheckpointFiles()` If using checkpointing and `LDA.keepLastCheckpoint` is set to true, then there may be saved checkpoint files. |
| `boolean` | `isDistributed()` Indicates whether this instance is of type `DistributedLDAModel`. |
| `static DistributedLDAModel` | `load(String path)` |
| `double` | `logPrior()` |
| `static MLReader<DistributedLDAModel>` | `read()` |
| `LocalLDAModel` | `toLocal()` Convert this distributed model to a local representation. |
| `String` | `toString()` |
| `double` | `trainingLogLikelihood()` |
| `MLWriter` | `write()` Returns an `MLWriter` instance for this ML instance. |
Methods inherited from parent classes and interfaces:

- checkpointInterval, describeTopics, describeTopics, docConcentration, estimatedDocConcentration, featuresCol, k, keepLastCheckpoint, learningDecay, learningOffset, logLikelihood, logPerplexity, maxIter, optimizeDocConcentration, optimizer, seed, setFeaturesCol, setSeed, setTopicDistributionCol, subsamplingRate, supportedOptimizers, topicConcentration, topicDistributionCol, topicsMatrix, transform, transformSchema, uid, vocabSize
- transform, transform, transform
- params
- getDocConcentration, getK, getKeepLastCheckpoint, getLearningDecay, getLearningOffset, getOldDocConcentration, getOldOptimizer, getOldTopicConcentration, getOptimizeDocConcentration, getOptimizer, getSubsamplingRate, getTopicConcentration, getTopicDistributionCol, validateAndTransformSchema
- getFeaturesCol
- getMaxIter
- getCheckpointInterval
- clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
- $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
- save

public static MLReader<DistributedLDAModel> read()
public static DistributedLDAModel load(String path)
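A minimal save/load round trip, assuming a fitted `DistributedLDAModel` named `distModel` (the path is illustrative):

```java
import org.apache.spark.ml.clustering.DistributedLDAModel;

// Persist the model; overwrite() replaces any existing data at the path.
distModel.write().overwrite().save("/tmp/distributed-lda-model");

// Reload it later, e.g. in another application.
DistributedLDAModel reloaded =
    DistributedLDAModel.load("/tmp/distributed-lda-model");
```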
public LocalLDAModel toLocal()
Convert this distributed model to a local representation.
WARNING: This involves collecting a large topicsMatrix to the driver.
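A sketch of the conversion, assuming a fitted `DistributedLDAModel` named `distModel`:

```java
import org.apache.spark.ml.clustering.DistributedLDAModel;
import org.apache.spark.ml.clustering.LocalLDAModel;

// Collects the topics matrix to the driver; avoid for very large
// vocabulary-by-k matrices.
LocalLDAModel localModel = distModel.toLocal();
```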
public DistributedLDAModel copy(ParamMap extra)
Creates a copy of this instance with the same UID and some extra params. See Params.defaultCopy().

public boolean isDistributed()
Indicates whether this instance is of type DistributedLDAModel.
Specified by: isDistributed in class LDAModel

public double trainingLogLikelihood()
public double logPrior()
public String[] getCheckpointFiles()
If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be
saved checkpoint files. This method is provided so that users can manage those files.
Note that removing the checkpoints can cause failures if a partition is lost and is needed
by certain DistributedLDAModel methods. Reference counting will clean up the checkpoints
when this model and derivative data go out of scope.
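A sketch of managing those files once the model and its derivatives are no longer being trained on (again with an assumed `distModel`):

```java
// Inspect any checkpoint files left over from training.
String[] checkpoints = distModel.getCheckpointFiles();
for (String path : checkpoints) {
    System.out.println("checkpoint: " + path);
}

// Delete them only when no further computations will need the
// checkpointed partitions (see the note above about lost partitions).
distModel.deleteCheckpointFiles();
```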
public void deleteCheckpointFiles()
Remove any remaining checkpoint files from training.
See Also: getCheckpointFiles()

public MLWriter write()
Returns an MLWriter instance for this ML instance.
Specified by: write in interface MLWritable

public String toString()
Specified by: toString in interface Identifiable
Overrides: toString in class Object