Package org.apache.spark.ml.clustering
Class DistributedLDAModel
Object
org.apache.spark.ml.PipelineStage
org.apache.spark.ml.Transformer
org.apache.spark.ml.Model<LDAModel>
org.apache.spark.ml.clustering.LDAModel
org.apache.spark.ml.clustering.DistributedLDAModel
- All Implemented Interfaces:
- Serializable, org.apache.spark.internal.Logging, LDAParams, Params, HasCheckpointInterval, HasFeaturesCol, HasMaxIter, HasSeed, Identifiable, MLWritable
Distributed model fitted by LDA.
This type of model is currently only produced by Expectation-Maximization (EM).
This model stores the inferred topics, the full training dataset, and the topic distribution for each training document.
param: oldLocalModelOption - Used to implement oldLocalModel() as a lazy val while keeping copy() cheap.
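The note on oldLocalModelOption describes a caching pattern: the expensive local model is built only on first use, and copy() passes along whatever has (or has not) been built so far instead of forcing a conversion. A minimal sketch of that pattern in plain Java follows. Every name in it (LazyCopySketch, LocalModel, DistributedModel, conversions) is hypothetical and chosen for illustration; it is not Spark's implementation.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; every name here is hypothetical, not Spark code. */
public class LazyCopySketch {
    /** Counts how often the expensive conversion actually runs. */
    static final AtomicInteger conversions = new AtomicInteger();

    /** Stand-in for a local model that is expensive to build. */
    static final class LocalModel {
        final String uid;

        LocalModel(String uid) {
            this.uid = uid;
            conversions.incrementAndGet();
        }
    }

    /** Stand-in for the distributed model. */
    static final class DistributedModel {
        private final String uid;
        // Plays the role of oldLocalModelOption: empty until first needed.
        private Optional<LocalModel> localOption;

        DistributedModel(String uid, Optional<LocalModel> localOption) {
            this.uid = uid;
            this.localOption = localOption;
        }

        /** Builds the local model on first access, then reuses it. */
        synchronized LocalModel local() {
            if (!localOption.isPresent()) {
                localOption = Optional.of(new LocalModel(uid));
            }
            return localOption.get();
        }

        /** Hands the current cache state to the copy without forcing a build. */
        synchronized DistributedModel copy() {
            return new DistributedModel(uid, localOption);
        }
    }

    public static void main(String[] args) {
        DistributedModel model = new DistributedModel("lda_1", Optional.empty());
        DistributedModel early = model.copy();   // cache still empty: no conversion
        model.local();                           // first access converts once
        DistributedModel late = model.copy();    // inherits the already-built local model
        late.local();                            // reuses it, no second conversion
        System.out.println("conversions = " + conversions.get());
    }
}
```

In this sketch a copy taken before the first access stays cheap, and a copy taken afterwards shares the already-built local model, so running main reports a single conversion.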
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
- 
Method Summary
Modifier and Type | Method | Description
DistributedLDAModel | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params.
void | deleteCheckpointFiles() | Remove any remaining checkpoint files from training.
long | estimatedSize() |
String[] | getCheckpointFiles() | If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be saved checkpoint files.
boolean | isDistributed() | Indicates whether this instance is of type DistributedLDAModel
static DistributedLDAModel | load(String path) |
double | logPrior() |
static MLReader<DistributedLDAModel> | read() |
LocalLDAModel | toLocal() | Convert this distributed model to a local representation.
String | toString() |
double | trainingLogLikelihood() |
MLWriter | write() | Returns an MLWriter instance for this ML instance.

Methods inherited from class org.apache.spark.ml.clustering.LDAModel
checkpointInterval, describeTopics, describeTopics, docConcentration, estimatedDocConcentration, featuresCol, k, keepLastCheckpoint, learningDecay, learningOffset, logLikelihood, logPerplexity, maxIter, optimizeDocConcentration, optimizer, seed, setFeaturesCol, setSeed, setTopicDistributionCol, subsamplingRate, supportedOptimizers, topicConcentration, topicDistributionCol, topicsMatrix, transform, transformSchema, uid, vocabSize

Methods inherited from class org.apache.spark.ml.Transformer
transform, transform, transform

Methods inherited from class org.apache.spark.ml.PipelineStage
params

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.spark.ml.param.shared.HasCheckpointInterval
getCheckpointInterval

Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
getFeaturesCol

Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter

Methods inherited from interface org.apache.spark.ml.clustering.LDAParams
getDocConcentration, getK, getKeepLastCheckpoint, getLearningDecay, getLearningOffset, getOldDocConcentration, getOldOptimizer, getOldTopicConcentration, getOptimizeDocConcentration, getOptimizer, getSubsamplingRate, getTopicConcentration, getTopicDistributionCol, validateAndTransformSchema

Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext

Methods inherited from interface org.apache.spark.ml.util.MLWritable
save

Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, estimateMatadataSize, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
- 
Method Details
- 
read
public static MLReader<DistributedLDAModel> read()
- 
load
public static DistributedLDAModel load(String path)
- 
toLocal
public LocalLDAModel toLocal()
Convert this distributed model to a local representation. This discards info about the training dataset.
WARNING: This involves collecting a large LDAModel.topicsMatrix() to the driver.
- Returns:
- (undocumented)
 
- 
copy
public DistributedLDAModel copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
- 
isDistributed
public boolean isDistributed()
Description copied from class: LDAModel
Indicates whether this instance is of type DistributedLDAModel
- Specified by:
- isDistributed in class LDAModel
 
- 
trainingLogLikelihood
public double trainingLogLikelihood()
- 
logPrior
public double logPrior()
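Assuming the definitions given in the upstream Spark API documentation (trainingLogLikelihood() as the log likelihood of the observed training tokens given the current parameter estimates, and logPrior() as the log probability of those estimates under the Dirichlet hyperparameters), the two values add up to the log of the joint probability. The symbols below are chosen here for illustration and do not appear in the API: $D$ for the training documents, $\beta$ for the topics, $\theta$ for the per-document topic distributions, and $\alpha, \eta$ for the Dirichlet hyperparameters.

```latex
\log P(D, \beta, \theta \mid \alpha, \eta)
  = \underbrace{\log P(D \mid \beta, \theta, \alpha, \eta)}_{\texttt{trainingLogLikelihood()}}
  + \underbrace{\log P(\beta, \theta \mid \alpha, \eta)}_{\texttt{logPrior()}}
```

Under that reading, neither value alone is the marginal log likelihood of the data given only the hyperparameters.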
- 
getCheckpointFiles
public String[] getCheckpointFiles()
If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be saved checkpoint files. This method is provided so that users can manage those files.
Note that removing the checkpoints can cause failures if a partition is lost and is needed by certain DistributedLDAModel methods. Reference counting will clean up the checkpoints when this model and derivative data go out of scope.
- Returns:
- Checkpoint files from training
 
- 
deleteCheckpointFiles
public void deleteCheckpointFiles()
Remove any remaining checkpoint files from training.
 
- 
write
public MLWriter write()
Description copied from interface: MLWritable
Returns an MLWriter instance for this ML instance.
- Returns:
- (undocumented)
 
- 
toString
public String toString()
- Specified by:
- toString in interface Identifiable
- Overrides:
- toString in class Object
 
- 
estimatedSize
public long estimatedSize()
 
-