Package org.apache.spark.ml.clustering
Class DistributedLDAModel
java.lang.Object
  org.apache.spark.ml.PipelineStage
    org.apache.spark.ml.Transformer
      org.apache.spark.ml.Model<LDAModel>
        org.apache.spark.ml.clustering.LDAModel
          org.apache.spark.ml.clustering.DistributedLDAModel
- All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, LDAParams, Params, HasCheckpointInterval, HasFeaturesCol, HasMaxIter, HasSeed, Identifiable, MLWritable
Distributed model fitted by LDA.
This type of model is currently only produced by Expectation-Maximization (EM).
This model stores the inferred topics, the full training dataset, and the topic distribution for each training document.
param: oldLocalModelOption Used to implement oldLocalModel() as a lazy val, but keeping copy() cheap.
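The oldLocalModelOption note describes a conversion that is computed lazily, cached, and handed to copies without being forced. The Java sketch below illustrates that pattern only: LazyLocalModelSketch, its fields, and its conversion counter are hypothetical stand-ins rather than Spark classes, and a nullable field plays the role of the Scala Option.

```java
// Hypothetical stand-in for the lazily cached conversion described by oldLocalModelOption; not Spark code.
final class LazyLocalModelSketch {
    private final String uid;
    private final double[][] distributedState; // stands in for the large distributed model
    private double[][] cachedLocal;            // null until converted; plays the role of oldLocalModelOption
    int conversions = 0;                       // expensive conversions run by this instance

    LazyLocalModelSketch(String uid, double[][] distributedState, double[][] cachedLocal) {
        this.uid = uid;
        this.distributedState = distributedState;
        this.cachedLocal = cachedLocal;
    }

    String uid() {
        return uid;
    }

    // Lazy accessor: runs the conversion on first use, then serves the cached result.
    double[][] localModel() {
        if (cachedLocal == null) {
            cachedLocal = convertToLocal();
        }
        return cachedLocal;
    }

    // Stand-in for an expensive distributed-to-local conversion (here: a deep copy).
    private double[][] convertToLocal() {
        conversions++;
        double[][] local = new double[distributedState.length][];
        for (int i = 0; i < distributedState.length; i++) {
            local[i] = distributedState[i].clone();
        }
        return local;
    }

    // Cheap copy: shares the existing state and forwards whatever is already cached,
    // without triggering a conversion.
    LazyLocalModelSketch copy() {
        return new LazyLocalModelSketch(uid, distributedState, cachedLocal);
    }
}
```

Because copy() forwards whatever is already cached, a copy made after the first conversion reuses it, while a copy made earlier still defers the work until it is actually needed.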
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Method Summary
Modifier and Type | Method | Description
DistributedLDAModel | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params.
void | deleteCheckpointFiles() | Remove any remaining checkpoint files from training.
String[] | getCheckpointFiles() | If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be saved checkpoint files.
boolean | isDistributed() | Indicates whether this instance is of type DistributedLDAModel.
static DistributedLDAModel | load(String path) |
double | logPrior() |
static MLReader<DistributedLDAModel> | read() |
LocalLDAModel | toLocal() | Convert this distributed model to a local representation.
String | toString() |
double | trainingLogLikelihood() |
MLWriter | write() | Returns an MLWriter instance for this ML instance.
Methods inherited from class org.apache.spark.ml.clustering.LDAModel
checkpointInterval, describeTopics, describeTopics, docConcentration, estimatedDocConcentration, featuresCol, k, keepLastCheckpoint, learningDecay, learningOffset, logLikelihood, logPerplexity, maxIter, optimizeDocConcentration, optimizer, seed, setFeaturesCol, setSeed, setTopicDistributionCol, subsamplingRate, supportedOptimizers, topicConcentration, topicDistributionCol, topicsMatrix, transform, transformSchema, uid, vocabSize
Methods inherited from class org.apache.spark.ml.Transformer
transform, transform, transform
Methods inherited from class org.apache.spark.ml.PipelineStage
params
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.param.shared.HasCheckpointInterval
getCheckpointInterval
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter
Methods inherited from interface org.apache.spark.ml.clustering.LDAParams
getDocConcentration, getK, getKeepLastCheckpoint, getLearningDecay, getLearningOffset, getOldDocConcentration, getOldOptimizer, getOldTopicConcentration, getOptimizeDocConcentration, getOptimizer, getSubsamplingRate, getTopicConcentration, getTopicDistributionCol, validateAndTransformSchema
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.util.MLWritable
save
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
-
Method Details
-
read
-
load
-
toLocal
Convert this distributed model to a local representation. This discards info about the training dataset.
WARNING: This involves collecting a large LDAModel.topicsMatrix() to the driver.
- Returns:
- (undocumented)
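To judge how much the toLocal() warning matters for a given model, it helps to estimate the size of the matrix sent to the driver. Assuming LDAModel.topicsMatrix() is a dense vocabSize x k matrix of 8-byte doubles (one column per topic), and ignoring JVM object and serialization overhead, the estimate is vocabSize * k * 8 bytes. TopicsMatrixSize is a hypothetical helper, and the 1,000,000-term, 100-topic figures are illustrative only.

```java
import java.util.Locale;

// Hypothetical helper: rough driver-memory estimate for a dense vocabSize x k topics matrix.
final class TopicsMatrixSize {
    private TopicsMatrixSize() {}

    // vocabSize * k entries of 8 bytes each; multiplyExact throws on overflow instead of wrapping.
    static long denseBytes(long vocabSize, long k) {
        if (vocabSize < 0 || k < 0) {
            throw new IllegalArgumentException("vocabSize and k must be non-negative");
        }
        return Math.multiplyExact(Math.multiplyExact(vocabSize, k), (long) Double.BYTES);
    }

    // Human-readable summary; Locale.ROOT keeps the decimal separator stable.
    static String describe(long vocabSize, long k) {
        long bytes = denseBytes(vocabSize, k);
        return String.format(Locale.ROOT, "%d x %d doubles = %d bytes (~%.1f MiB)",
                vocabSize, k, bytes, bytes / (1024.0 * 1024.0));
    }
}
```

For example, TopicsMatrixSize.describe(1_000_000L, 100L) returns "1000000 x 100 doubles = 800000000 bytes (~762.9 MiB)", before any overhead is counted.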
-
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
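The copy contract (same UID, extra params applied on top of the current ones) can be sketched with a plain map. This is a simplified model, not Spark's ParamMap: ParamsCopySketch is hypothetical, and it assumes that values in extra take precedence over the instance's current values.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the copy(extra) contract; not Spark's Params/ParamMap.
final class ParamsCopySketch {
    private final String uid;
    private final Map<String, Object> params;

    ParamsCopySketch(String uid, Map<String, Object> params) {
        this.uid = uid;
        this.params = Collections.unmodifiableMap(new HashMap<>(params));
    }

    String uid() {
        return uid;
    }

    Object get(String name) {
        return params.get(name);
    }

    // Same UID; current values are copied first, then extra is applied, so extra wins on conflicts.
    // The original instance is left unchanged.
    ParamsCopySketch copy(Map<String, ?> extra) {
        Map<String, Object> merged = new HashMap<>(params);
        merged.putAll(extra);
        return new ParamsCopySketch(uid, merged);
    }
}
```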
isDistributed
public boolean isDistributed()
Description copied from class: LDAModel
Indicates whether this instance is of type DistributedLDAModel.
- Specified by:
isDistributed in class LDAModel
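isDistributed() lets callers check the concrete type before downcasting. With the real API this typically means testing model.isDistributed() on an LDAModel and only then casting to DistributedLDAModel to reach methods such as trainingLogLikelihood(). The self-contained sketch mirrors that guard with hypothetical stand-in classes (TopicModelSketch and its two subclasses) so it runs without Spark.

```java
import java.util.OptionalDouble;

// Hypothetical stand-ins for LDAModel and its two concrete kinds; not Spark classes.
abstract class TopicModelSketch {
    abstract boolean isDistributed();
}

final class LocalTopicModelSketch extends TopicModelSketch {
    @Override
    boolean isDistributed() {
        return false;
    }
}

final class DistributedTopicModelSketch extends TopicModelSketch {
    private final double trainingLogLikelihood;

    DistributedTopicModelSketch(double trainingLogLikelihood) {
        this.trainingLogLikelihood = trainingLogLikelihood;
    }

    @Override
    boolean isDistributed() {
        return true;
    }

    double trainingLogLikelihood() {
        return trainingLogLikelihood;
    }
}

final class TopicModels {
    private TopicModels() {}

    // Reads a distributed-only statistic; returns empty instead of risking a ClassCastException.
    static OptionalDouble trainingLogLikelihood(TopicModelSketch model) {
        if (!model.isDistributed()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(((DistributedTopicModelSketch) model).trainingLogLikelihood());
    }
}
```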
-
trainingLogLikelihood
public double trainingLogLikelihood()
logPrior
public double logPrior()
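trainingLogLikelihood() and logPrior() carry no description on this page. Assuming they follow Spark's Scaladoc for this model, trainingLogLikelihood() is log P(docs | topics, topic distributions for docs, Dirichlet hyperparameters) and logPrior() is log P(topics, topic distributions for docs | Dirichlet hyperparameters). By the chain rule their sum is the log joint probability of the data and the fitted parameters. The notation is introduced here for illustration: D is the training corpus, Φ the topics, Θ the per-document topic distributions, α the docConcentration and η the topicConcentration.

```latex
\log P(\mathcal{D}, \Phi, \Theta \mid \alpha, \eta)
  = \underbrace{\log P(\mathcal{D} \mid \Phi, \Theta, \alpha, \eta)}_{\texttt{trainingLogLikelihood()}}
  + \underbrace{\log P(\Phi, \Theta \mid \alpha, \eta)}_{\texttt{logPrior()}}
```

Even this sum is not the marginal likelihood log P(D | α, η), which would integrate Φ and Θ out; it scores the single point estimate reached by EM.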
getCheckpointFiles
public String[] getCheckpointFiles()
If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be saved checkpoint files. This method is provided so that users can manage those files.
Note that removing the checkpoints can cause failures if a partition is lost and is needed by certain DistributedLDAModel methods. Reference counting will clean up the checkpoints when this model and derivative data go out of scope.
- Returns:
- Checkpoint files from training
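Since getCheckpointFiles() exists so that users can manage leftover files, one cautious first step is measuring how much space they occupy before deciding whether to call deleteCheckpointFiles(). CheckpointUsage is a hypothetical helper, and it assumes the returned strings are plain local paths without a URI scheme; checkpoints on a distributed file system such as HDFS would need Hadoop's FileSystem API instead. Paths that no longer exist are skipped, since reference counting may already have removed them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Spark: sums the on-disk size of checkpoint paths,
// assuming they are plain local paths (files or directories) without a URI scheme.
final class CheckpointUsage {
    private CheckpointUsage() {}

    static long totalBytes(String[] checkpointFiles) {
        long total = 0L;
        for (String file : checkpointFiles) {
            Path path = Paths.get(file);
            if (!Files.exists(path)) {
                continue; // already cleaned up, e.g. by reference counting
            }
            // Files.walk yields the path itself for a file, or the whole tree for a directory.
            try (Stream<Path> entries = Files.walk(path)) {
                total += entries.filter(Files::isRegularFile)
                        .mapToLong(CheckpointUsage::sizeOf)
                        .sum();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return total;
    }

    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The note about lost partitions still applies: deleting these files is only safe once no DistributedLDAModel method will need them to recompute data.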
-
deleteCheckpointFiles
public void deleteCheckpointFiles()
Remove any remaining checkpoint files from training.
- See Also:
-
write
Description copied from interface: MLWritable
Returns an MLWriter instance for this ML instance.
- Returns:
- (undocumented)
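write() pairs with the static read() and load methods: in Spark a model is typically persisted with model.write().overwrite().save(path) and restored with DistributedLDAModel.load(path). Spark's on-disk format is not reproduced here. ToyTopicModel is a hypothetical class that only illustrates the round-trip contract (loading a saved model restores the same uid and parameters), using java.util.Properties as a stand-in format.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical toy persistence, not Spark's MLWriter/MLReader format.
final class ToyTopicModel {
    final String uid;
    final int k;

    ToyTopicModel(String uid, int k) {
        this.uid = uid;
        this.k = k;
    }

    // Writes the uid and number of topics; silently replaces an existing file
    // (Spark's MLWriter instead requires overwrite() for that).
    void save(Path path) {
        Properties props = new Properties();
        props.setProperty("uid", uid);
        props.setProperty("k", Integer.toString(k));
        try (Writer out = Files.newBufferedWriter(path)) {
            props.store(out, "toy model metadata");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back what save() wrote and rebuilds an equivalent instance.
    static ToyTopicModel load(Path path) {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(path)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ToyTopicModel(props.getProperty("uid"), Integer.parseInt(props.getProperty("k")));
    }
}
```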
-
toString
- Specified by:
toString in interface Identifiable
- Overrides:
toString in class Object
-