Class DistributedLDAModel

All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, LDAParams, Params, HasCheckpointInterval, HasFeaturesCol, HasMaxIter, HasSeed, Identifiable, MLWritable

public class DistributedLDAModel extends LDAModel
Distributed model fitted by LDA. This type of model is currently produced only by the Expectation-Maximization (EM) optimizer.

This model stores the inferred topics, the full training dataset, and the topic distribution for each training document.

param: oldLocalModelOption  Used to implement oldLocalModel() as a lazy val while keeping copy() cheap.
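The oldLocalModelOption parameter describes a common caching pattern. An expensive derived value is held in an optional field and built on first access. copy() passes the current, possibly empty, cache to the new instance. The following Java sketch illustrates the idea with hypothetical names (LocalModel, DistributedModelSketch, builder). It is not Spark's code; the real class is written in Scala.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical stand-in for an expensive-to-build local model (not Spark's LocalLDAModel).
final class LocalModel {
    final String description;

    LocalModel(String description) {
        this.description = description;
    }
}

// Hypothetical sketch of a "lazy val backed by an Option" that keeps copy() cheap.
final class DistributedModelSketch {
    private final Supplier<LocalModel> builder; // expensive conversion, run at most once per cache
    private Optional<LocalModel> localOption;   // empty until first access, or inherited from the copy source

    DistributedModelSketch(Supplier<LocalModel> builder, Optional<LocalModel> localOption) {
        this.builder = builder;
        this.localOption = localOption;
    }

    // Builds the local model on first access and caches it for later calls.
    synchronized LocalModel oldLocalModel() {
        if (!localOption.isPresent()) {
            localOption = Optional.of(builder.get());
        }
        return localOption.get();
    }

    // Never forces evaluation: the copy reuses the cached value only if one already exists.
    synchronized DistributedModelSketch copy() {
        return new DistributedModelSketch(builder, localOption);
    }
}
```

A copy made before the first call to oldLocalModel() still has an empty cache and performs its own conversion. A copy made afterwards reuses the cached value. In both cases, copy() itself never triggers the expensive step.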

  • Method Details

    • read

      public static MLReader<DistributedLDAModel> read()
    • load

      public static DistributedLDAModel load(String path)
    • toLocal

      public LocalLDAModel toLocal()
      Convert this distributed model to a local representation. This discards information about the training dataset.

      WARNING: This involves collecting LDAModel.topicsMatrix(), which can be large, to the driver.

      Returns:
      a LocalLDAModel containing the inferred topics, without the training data
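      The warning about collecting to the driver can be quantified. LDAModel.topicsMatrix() is a vocabSize x k matrix with one column per topic. Assume it is materialized as dense doubles on the driver. Then a rough lower bound on its size is vocabSize x k x 8 bytes. The helper below is a hypothetical back-of-the-envelope calculator, not a Spark API.

```java
// Hypothetical helper (not part of Spark): estimates the raw size of a dense vocabSize x k matrix of doubles.
final class TopicsMatrixSize {
    private TopicsMatrixSize() {}

    // 8 bytes per double; ignores JVM object headers and serialization overhead.
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    static long denseBytes(long vocabSize, int k) {
        return Math.multiplyExact(Math.multiplyExact(vocabSize, (long) k), (long) Double.BYTES);
    }
}
```

      For example, a 1,000,000-term vocabulary with k = 100 needs about 800,000,000 bytes (roughly 800 MB) for the matrix values alone. JVM or serialization overhead comes on top of that.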
    • copy

      public DistributedLDAModel copy(ParamMap extra)
      Description copied from interface: Params
      Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
      Specified by:
      copy in interface Params
      Specified by:
      copy in class Model<LDAModel>
      Parameters:
      extra - additional parameter values to apply to the copy
      Returns:
      a copy of this model with the same UID
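      The copy contract has two parts: the copy keeps the same UID, and values from extra take precedence over the current ones. A simplified, hypothetical Java class that stores parameters in a plain Map can illustrate this. Spark's real implementation works with Param and ParamMap objects instead, and this sketch does not reproduce it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a Params-bearing model (not Spark's API).
final class ParamsSketch {
    final String uid;
    private final Map<String, Object> params;

    ParamsSketch(String uid, Map<String, ?> params) {
        this.uid = uid;
        this.params = new HashMap<>(params); // defensive copy
    }

    Object get(String name) {
        return params.get(name);
    }

    // Same UID; entries in `extra` override the current values, and the original is left unchanged.
    ParamsSketch copy(Map<String, ?> extra) {
        Map<String, Object> merged = new HashMap<>(params);
        merged.putAll(extra);
        return new ParamsSketch(uid, merged);
    }
}
```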
    • isDistributed

      public boolean isDistributed()
      Description copied from class: LDAModel
      Indicates whether this instance is of type DistributedLDAModel. For this class, it always returns true.
      Specified by:
      isDistributed in class LDAModel
    • trainingLogLikelihood

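      public double trainingLogLikelihood()
      Log likelihood of the observed tokens in the training set, given the current parameter estimates: log P(docs | topics, topic distributions for docs, Dirichlet hyperparameters). This excludes the prior; for that, use logPrior().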
    • logPrior

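      public double logPrior()
      Log probability of the current parameter estimate: log P(topics, topic distributions for docs | Dirichlet hyperparameters).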
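      trainingLogLikelihood() and logPrior() return log-probabilities from the fitted EM model. The formulas below assume the standard LDA formulation. The exact terms computed by Spark's EM implementation may differ, for example by using point estimates or dropping additive constants.

      Notation for this sketch:
      - n_{dv}: the count of term v in training document d
      - \theta_d: the topic distribution of document d
      - \phi_k: the term distribution of topic k
      - K: the number of topics
      - \alpha: docConcentration
      - \eta: topicConcentration

```latex
\begin{aligned}
\text{trainingLogLikelihood} &\approx \log P(\mathbf{w} \mid \phi, \theta)
  = \sum_{d} \sum_{v} n_{dv} \log \sum_{k=1}^{K} \theta_{dk}\, \phi_{kv} \\
\text{logPrior} &\approx \log P(\phi, \theta \mid \alpha, \eta)
  = \sum_{k=1}^{K} \log \operatorname{Dir}(\phi_k \mid \eta) + \sum_{d} \log \operatorname{Dir}(\theta_d \mid \alpha) \\
\text{trainingLogLikelihood} + \text{logPrior} &\approx \log P(\mathbf{w}, \phi, \theta \mid \alpha, \eta)
\end{aligned}
```

      The sum of the two terms is a log joint probability at the fitted parameters. It is still not the marginal likelihood \log P(\mathbf{w} \mid \alpha, \eta), which integrates over \phi and \theta.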
    • getCheckpointFiles

      public String[] getCheckpointFiles()
      If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be saved checkpoint files. This method is provided so that users can manage those files.

      Note that removing the checkpoints can cause failures if a partition is lost and is needed by certain DistributedLDAModel methods. Reference counting will clean up the checkpoints when this model and derivative data go out of scope.

      Returns:
      Checkpoint files from training
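      getCheckpointFiles() returns paths as strings. A caller that wants to clean them up manually must delete each entry, which is usually a directory. The following hypothetical helper is not part of Spark. It assumes each entry is a local path or a file: URI. Checkpoints on HDFS or another Hadoop-compatible filesystem would need Hadoop's FileSystem API instead, and deleteCheckpointFiles() already handles the usual case.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not part of Spark: assumes every entry is a local path or a file: URI.
final class LocalCheckpointCleaner {
    private LocalCheckpointCleaner() {}

    // Recursively deletes each existing entry; entries that do not exist are skipped.
    // Returns how many entries were found and removed.
    static int deleteAll(String[] checkpointFiles) throws IOException {
        int removed = 0;
        for (String entry : checkpointFiles) {
            Path root = entry.startsWith("file:") ? Paths.get(URI.create(entry)) : Paths.get(entry);
            if (!Files.exists(root)) {
                continue;
            }
            // Reverse lexicographic order visits children before their parent directories.
            try (Stream<Path> walk = Files.walk(root)) {
                Path[] ordered = walk.sorted(Comparator.reverseOrder()).toArray(Path[]::new);
                for (Path p : ordered) {
                    Files.delete(p);
                }
            }
            removed++;
        }
        return removed;
    }
}
```

      Only delete checkpoints once this model and any data derived from it are no longer needed. Certain DistributedLDAModel methods may need those files if a partition is lost.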
    • deleteCheckpointFiles

      public void deleteCheckpointFiles()
      Remove any remaining checkpoint files from training.

      See Also:
      getCheckpointFiles()
    • write

      public MLWriter write()
      Description copied from interface: MLWritable
      Returns an MLWriter instance for this ML instance.
      Returns:
      an MLWriter for this DistributedLDAModel
    • toString

      public String toString()
      Specified by:
      toString in interface Identifiable
      Overrides:
      toString in class Object