class PowerIterationClustering extends Serializable

Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen. From the abstract: PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data.
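
To make the phrase "truncated power iteration on a normalized pair-wise similarity matrix" concrete, here is a minimal, Spark-free sketch of the update v <- W v / ||W v||_1 with W = D^-1 A, as described in the PIC paper. The object name `PowerIterationSketch`, its method names, and the fixed iteration count are assumptions made for illustration; this is not how the class is implemented internally.

```scala
object PowerIterationSketch {

  /** Row-normalizes a nonnegative affinity matrix A into W = D^-1 A,
    * where D is the diagonal matrix of row sums (vertex degrees). */
  def rowNormalize(a: Array[Array[Double]]): Array[Array[Double]] =
    a.map { row =>
      val degree = row.sum
      // Isolated vertices (degree 0) keep an all-zero row.
      if (degree == 0.0) row.clone() else row.map(_ / degree)
    }

  /** Applies v <- W v / ||W v||_1 a fixed number of times and returns v. */
  def truncatedPowerIteration(
      w: Array[Array[Double]],
      v0: Array[Double],
      iterations: Int): Array[Double] = {
    var v = v0
    for (_ <- 0 until iterations) {
      // Matrix-vector product W v, one dot product per row.
      val next = w.map(row => row.zip(v).map { case (x, y) => x * y }.sum)
      // L1 normalization keeps the vector from shrinking or blowing up.
      val norm = next.map(x => math.abs(x)).sum
      v = if (norm == 0.0) next else next.map(_ / norm)
    }
    v
  }
}
```

On an affinity matrix made of two disconnected triangles, the entries of the returned vector flatten to one value per triangle, so the vector acts as a one-dimensional embedding that separates the groups. On a connected graph all entries would eventually drift toward a common value, which is why the iteration is stopped early.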

Annotations
@Since( "1.3.0" )
Source
PowerIterationClustering.scala
See also

Spectral clustering (Wikipedia)

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new PowerIterationClustering()

    Constructs a PIC instance with default parameters: {k: 2, maxIterations: 100, initMode: "random"}.

    Annotations
    @Since( "1.3.0" )

Value Members

  1. def run(similarities: JavaRDD[(Long, Long, Double)]): PowerIterationClusteringModel

    A Java-friendly version of PowerIterationClustering.run.

    Annotations
    @Since( "1.3.0" )
  2. def run(similarities: RDD[(Long, Long, Double)]): PowerIterationClusteringModel

    Run the PIC algorithm.

    similarities

    an RDD of (i, j, sij) tuples representing the affinity matrix, which is the matrix A in the PIC paper. Each similarity sij must be nonnegative. The matrix is symmetric, so sij = sji, and for any pair (i, j) with nonzero similarity the input should contain either (i, j, sij) or (j, i, sji). Tuples with i = j are ignored, because the diagonal is assumed to be zero (sii = 0.0).

    returns

    a PowerIterationClusteringModel that contains the clustering result

    Annotations
    @Since( "1.3.0" )
  3. def run(graph: Graph[Double, Double]): PowerIterationClusteringModel

    Run the PIC algorithm on Graph.

    graph

    an affinity matrix represented as a graph, which is the matrix A in the PIC paper. The similarity sij, represented as the weight of the edge between vertices i and j, must be nonnegative. The matrix is symmetric, so sij = sji, and for any pair (i, j) with nonzero similarity the graph should contain either the edge (i, j, sij) or the edge (j, i, sji). Edges with i = j are ignored, because the diagonal is assumed to be zero (sii = 0.0).

    returns

    a PowerIterationClusteringModel that contains the clustering result

    Annotations
    @Since( "1.5.0" )
  4. def setInitializationMode(mode: String): PowerIterationClustering.this.type

    Set the initialization mode. This can be either "random" to use a random vector as vertex properties, or "degree" to use the normalized sum of similarities. Default: "random".

    Annotations
    @Since( "1.3.0" )
  5. def setK(k: Int): PowerIterationClustering.this.type

    Set the number of clusters.

    Annotations
    @Since( "1.3.0" )
  6. def setMaxIterations(maxIterations: Int): PowerIterationClustering.this.type

    Set the maximum number of iterations of the power iteration loop.

    Annotations
    @Since( "1.3.0" )
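
The members above can be combined as in the following sketch, which assumes Spark core and MLlib are on the classpath and runs against a local master. The object name `PicUsageSketch`, the helper `clusterToyGraph`, and the toy similarity values are hypothetical and chosen only for illustration. The input follows the `similarities` contract: one tuple per unordered pair, nonnegative weights, and no (i, i) entries.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.PowerIterationClustering

object PicUsageSketch {

  /** Clusters a small hand-made affinity graph and returns vertex id -> cluster id. */
  def clusterToyGraph(sc: SparkContext, k: Int): Map[Long, Int] = {
    // One (i, j, sij) tuple per unordered pair; the reverse pair is implied by symmetry.
    val similarities = sc.parallelize(Seq(
      (0L, 1L, 1.0), (0L, 2L, 1.0), (1L, 2L, 1.0), // first tightly connected group
      (3L, 4L, 1.0), (3L, 5L, 1.0), (4L, 5L, 1.0), // second tightly connected group
      (2L, 3L, 0.01)                               // weak link between the groups
    ))

    val model = new PowerIterationClustering()
      .setK(k)                          // number of clusters
      .setMaxIterations(20)             // cap on power iteration steps
      .setInitializationMode("random")  // or "degree"
      .run(similarities)

    // Each assignment pairs a vertex id with its cluster index.
    model.assignments.collect().map(a => a.id -> a.cluster).toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("pic-usage-sketch")
    val sc = new SparkContext(conf)
    try {
      clusterToyGraph(sc, k = 2).toSeq.sortBy(_._1).foreach { case (id, cluster) =>
        println(s"vertex $id -> cluster $cluster")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Cluster indices are arbitrary labels, so only the grouping of vertices is meaningful, not which group receives which index.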