Class PowerIterationClustering

Object
org.apache.spark.mllib.clustering.PowerIterationClustering
All Implemented Interfaces:
Serializable

public class PowerIterationClustering extends Object implements Serializable
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen. From the abstract: PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data.

Parameters:
k - Number of clusters.
maxIterations - Maximum number of iterations of the PIC algorithm.
initMode - Set the initialization mode. This can be either "random" to use a random vector as vertex properties, or "degree" to use normalized sum similarities. Default: random.
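The description above can be made concrete with a small, Spark-free sketch of truncated power iteration on a row-normalized affinity matrix, including the two initialization modes. It is an illustration only, not Spark's implementation: the class `PicSketch`, its method names, the dense `double[][]` representation, the Gaussian draw in `initRandom`, and the fixed iteration count (no convergence test) are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration class; not part of Spark.
public class PicSketch {

    /** Row-normalizes a nonnegative affinity matrix A into W = D^-1 A. */
    public static double[][] rowNormalize(double[][] a) {
        int n = a.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double degree = 0.0;
            for (int j = 0; j < n; j++) {
                degree += a[i][j];
            }
            for (int j = 0; j < n; j++) {
                // Isolated vertices (degree 0) keep an all-zero row.
                w[i][j] = degree > 0.0 ? a[i][j] / degree : 0.0;
            }
        }
        return w;
    }

    /** "degree" mode: v0[i] is the row sum of A, scaled so the vector sums to 1.
     *  Assumes A has at least one nonzero entry. */
    public static double[] initDegree(double[][] a) {
        int n = a.length;
        double[] v = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                v[i] += a[i][j];
            }
            total += v[i];
        }
        for (int i = 0; i < n; i++) {
            v[i] /= total;
        }
        return v;
    }

    /** "random" mode: a random vector scaled to unit L1 norm (Gaussian draw is an assumption). */
    public static double[] initRandom(int n, long seed) {
        Random rng = new Random(seed);
        double[] v = new double[n];
        double norm = 0.0;
        for (int i = 0; i < n; i++) {
            v[i] = rng.nextGaussian();
            norm += Math.abs(v[i]);
        }
        for (int i = 0; i < n; i++) {
            v[i] /= norm;
        }
        return v;
    }

    /** Truncated power iteration: repeat v <- W v / ||W v||_1 for a fixed number of steps. */
    public static double[] powerIterate(double[][] w, double[] v0, int maxIterations) {
        int n = v0.length;
        double[] v = v0.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    next[i] += w[i][j] * v[j];
                }
                norm += Math.abs(next[i]);
            }
            if (norm == 0.0) {
                break; // nothing left to normalize; keep the previous vector
            }
            for (int i = 0; i < n; i++) {
                next[i] /= norm;
            }
            v = next;
        }
        return v;
    }

    public static void main(String[] args) {
        // Two triangles {0,1,2} and {3,4,5} joined by one weak edge (2,3).
        double[][] a = {
            {0, 1, 1, 0, 0, 0},
            {1, 0, 1, 0, 0, 0},
            {1, 1, 0, 0.1, 0, 0},
            {0, 0, 0.1, 0, 1, 1},
            {0, 0, 0, 1, 0, 1},
            {0, 0, 0, 1, 1, 0}
        };
        double[][] w = rowNormalize(a);
        System.out.println("degree init: " + Arrays.toString(powerIterate(w, initDegree(a), 10)));
        System.out.println("random init: " + Arrays.toString(powerIterate(w, initRandom(6, 42L), 10)));
    }
}
```

The iteration is stopped early on purpose: run to convergence, the vector approaches a constant and carries no cluster information, so the useful one-dimensional embedding is the intermediate vector.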

  • Constructor Details

    • PowerIterationClustering

      public PowerIterationClustering()
      Constructs a PIC instance with default parameters: {k: 2, maxIterations: 100, initMode: "random"}.
  • Method Details

    • org$apache$spark$internal$Logging$$log_

      public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
    • org$apache$spark$internal$Logging$$log__$eq

      public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
    • LogStringContext

      public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
    • setK

      public PowerIterationClustering setK(int k)
      Set the number of clusters.
      Parameters:
      k - the number of clusters
      Returns:
      this PowerIterationClustering instance, so that setter calls can be chained
    • setMaxIterations

      public PowerIterationClustering setMaxIterations(int maxIterations)
      Set the maximum number of iterations of the power iteration loop.
      Parameters:
      maxIterations - the maximum number of iterations
      Returns:
      this PowerIterationClustering instance, so that setter calls can be chained
    • setInitializationMode

      public PowerIterationClustering setInitializationMode(String mode)
      Set the initialization mode. This can be either "random" to use a random vector as vertex properties, or "degree" to use normalized sum similarities. Default: random.
      Parameters:
      mode - the initialization mode, either "random" or "degree"
      Returns:
      this PowerIterationClustering instance, so that setter calls can be chained
    • run

      public PowerIterationClusteringModel run(Graph<Object,Object> graph)
      Run the PIC algorithm on a Graph.

      Parameters:
      graph - an affinity matrix represented as a graph, which is the matrix A in the PIC paper. The similarity s_ij, represented as the edge between vertices (i, j), must be nonnegative. This is a symmetric matrix and hence s_ij = s_ji. For any (i, j) with nonzero similarity, there should be either (i, j, s_ij) or (j, i, s_ji) in the input. Tuples with i = j are ignored, because we assume s_ij = 0.0.

      Returns:
      a PowerIterationClusteringModel that contains the clustering result
    • run

      public PowerIterationClusteringModel run(RDD<scala.Tuple3<Object,Object,Object>> similarities)
      Run the PIC algorithm.

      Parameters:
      similarities - an RDD of (i, j, s_ij) tuples representing the affinity matrix, which is the matrix A in the PIC paper. The similarity s_ij must be nonnegative. This is a symmetric matrix and hence s_ij = s_ji. For any (i, j) with nonzero similarity, there should be either (i, j, s_ij) or (j, i, s_ji) in the input. Tuples with i = j are ignored, because we assume s_ij = 0.0.

      Returns:
      a PowerIterationClusteringModel that contains the clustering result
    • run

      public PowerIterationClusteringModel run(JavaRDD<scala.Tuple3<Long,Long,Double>> similarities)
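      A Java-friendly version of PowerIterationClustering.run.
      Parameters:
      similarities - a JavaRDD of (i, j, s_ij) tuples representing the affinity matrix, with the same requirements as for run(RDD)
      Returns:
      a PowerIterationClusteringModel that contains the clustering result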
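The input contract shared by the run methods (nonnegative similarities, one tuple per unordered pair, diagonal ignored) can be illustrated with a small Spark-free sketch that turns a dense symmetric matrix into (i, j, s_ij) entries. The class `SimilarityTuples`, its `Entry` holder, and `fromDense` are hypothetical names for this example; in a Spark program each entry would correspond to a `scala.Tuple3<Long,Long,Double>` in the `JavaRDD` passed to `run`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Spark.
public class SimilarityTuples {

    /** Hypothetical holder for one (i, j, s_ij) entry. */
    public static final class Entry {
        public final long i;
        public final long j;
        public final double s;

        public Entry(long i, long j, double s) {
            this.i = i;
            this.j = j;
            this.s = s;
        }

        @Override
        public String toString() {
            return "(" + i + ", " + j + ", " + s + ")";
        }
    }

    /**
     * Emits one entry per unordered pair (i, j), i < j, with nonzero similarity.
     * Diagonal values are skipped; negative or asymmetric values are rejected.
     */
    public static List<Entry> fromDense(double[][] a) {
        List<Entry> out = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                double s = a[i][j];
                if (s < 0.0) {
                    throw new IllegalArgumentException(
                        "similarity must be nonnegative at (" + i + ", " + j + ")");
                }
                if (s != a[j][i]) {
                    throw new IllegalArgumentException(
                        "matrix must be symmetric at (" + i + ", " + j + ")");
                }
                if (s > 0.0) {
                    // Only (i, j, s_ij) is emitted; (j, i, s_ji) would be redundant.
                    out.add(new Entry(i, j, s));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {
            {5, 1, 0},
            {1, 9, 2},
            {0, 2, 7}
        };
        // The diagonal values 5, 9, 7 never appear in the output.
        for (Entry e : fromDense(a)) {
            System.out.println(e);
        }
    }
}
```

Emitting only the i < j half keeps the input at one tuple per pair, which matches the "either (i, j, s_ij) or (j, i, s_ji)" requirement stated for the similarities parameter.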