public class DistributedLDAModel extends LDAModel

Distributed model fitted by LDA.

This type of model is currently only produced by Expectation-Maximization (EM).

This model stores the inferred topics, the full training dataset, and the topic distribution for each training document.

param: oldLocalModelOption Used to implement oldLocalModel as a lazy val, but keeping copy() cheap.
Modifier and Type | Method and Description
---|---
DistributedLDAModel | copy(ParamMap extra) Creates a copy of this instance with the same UID and some extra params.
void | deleteCheckpointFiles() Remove any remaining checkpoint files from training.
String[] | getCheckpointFiles() If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be saved checkpoint files.
boolean | isDistributed() Indicates whether this instance is of type DistributedLDAModel.
static DistributedLDAModel | load(String path)
double | logPrior()
static MLReader&lt;DistributedLDAModel&gt; | read()
LocalLDAModel | toLocal() Convert this distributed model to a local representation.
String | toString()
double | trainingLogLikelihood()
MLWriter | write() Returns an MLWriter instance for this ML instance.
Methods inherited from class LDAModel:
checkpointInterval, describeTopics, describeTopics, docConcentration, estimatedDocConcentration, featuresCol, k, keepLastCheckpoint, learningDecay, learningOffset, logLikelihood, logPerplexity, maxIter, optimizeDocConcentration, optimizer, seed, setFeaturesCol, setSeed, setTopicDistributionCol, subsamplingRate, supportedOptimizers, topicConcentration, topicDistributionCol, topicsMatrix, transform, transformSchema, uid, vocabSize

Methods inherited from class Transformer:
transform, transform, transform

Methods inherited from class PipelineStage:
params

Methods inherited from interface LDAParams:
getDocConcentration, getK, getKeepLastCheckpoint, getLearningDecay, getLearningOffset, getOldDocConcentration, getOldOptimizer, getOldTopicConcentration, getOptimizeDocConcentration, getOptimizer, getSubsamplingRate, getTopicConcentration, getTopicDistributionCol, validateAndTransformSchema

Methods inherited from interface HasFeaturesCol:
getFeaturesCol

Methods inherited from interface HasMaxIter:
getMaxIter

Methods inherited from interface HasCheckpointInterval:
getCheckpointInterval

Methods inherited from interface Params:
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface org.apache.spark.internal.Logging:
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize

Methods inherited from interface MLWritable:
save
public static MLReader<DistributedLDAModel> read()
public static DistributedLDAModel load(String path)
public LocalLDAModel toLocal()
Convert this distributed model to a local representation. This discards info about the training dataset.
WARNING: This involves collecting a large topicsMatrix to the driver.
public DistributedLDAModel copy(ParamMap extra)
Creates a copy of this instance with the same UID and some extra params. See defaultCopy().
Specified by: copy in class Params

public boolean isDistributed()
Indicates whether this instance is of type DistributedLDAModel.
Specified by: isDistributed in class LDAModel
public double trainingLogLikelihood()
Log likelihood of the observed tokens in the training set, given the current parameter estimates: log P(docs | topics, topic distributions for docs, Dirichlet hyperparameters)

public double logPrior()
Log probability of the current parameter estimate: log P(topics, topic distributions for docs | Dirichlet hyperparameters)
public String[] getCheckpointFiles()
If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be saved checkpoint files. This method is provided so that users can manage those files.
Note that removing the checkpoints can cause failures if a partition is lost and is needed by certain DistributedLDAModel methods. Reference counting will clean up the checkpoints when this model and derivative data go out of scope.
public void deleteCheckpointFiles()
Remove any remaining checkpoint files from training.
See Also: getCheckpointFiles
public MLWriter write()
Returns an MLWriter instance for this ML instance.
Specified by: write in interface MLWritable

public String toString()
Specified by: toString in interface Identifiable
Overrides: toString in class Object