Class PowerIterationClustering
Object
org.apache.spark.mllib.clustering.PowerIterationClustering
- All Implemented Interfaces:
Serializable
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
Lin and Cohen.
From the abstract: PIC finds a very low-dimensional embedding of a dataset using
truncated power iteration on a normalized pair-wise similarity matrix of the data.
param: k Number of clusters.
param: maxIterations Maximum number of iterations of the PIC algorithm.
param: initMode Set the initialization mode. This can be either "random" to use a random vector as vertex properties, or "degree" to use normalized sum similarities. Default: random.
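The description above can be illustrated with a small, Spark-free sketch of truncated power iteration on a row-normalized similarity matrix. Everything here is an assumption made for illustration and is not Spark's implementation: the class `PicSketch`, its method names, the dense 6x6 example matrix, the fixed start vector, and the fixed iteration count (no convergence test). In the PIC paper, the resulting one-dimensional embedding is then grouped with k-means; that step is left out here.

```java
import java.util.Arrays;

/** Hypothetical, Spark-free sketch of the power iteration step behind PIC. */
final class PicSketch {

    /** Row-normalizes a symmetric affinity matrix A into W = D^-1 A, ignoring the diagonal. */
    static double[][] rowNormalize(double[][] a) {
        int n = a.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double degree = 0.0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    degree += a[i][j]; // self-similarity is treated as 0
                }
            }
            if (degree == 0.0) {
                continue; // isolated vertex: its row stays all zeros
            }
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    w[i][j] = a[i][j] / degree;
                }
            }
        }
        return w;
    }

    /** Returns a copy of v scaled to unit L1 norm (all zeros if v is all zeros). */
    static double[] l1Normalize(double[] v) {
        double norm = 0.0;
        for (double x : v) {
            norm += Math.abs(x);
        }
        double[] out = new double[v.length];
        if (norm == 0.0) {
            return out;
        }
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    /** Applies v <- W v / ||W v||_1 exactly maxIterations times. */
    static double[] powerIterate(double[][] w, double[] v0, int maxIterations) {
        double[] v = l1Normalize(v0);
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[v.length];
            for (int i = 0; i < w.length; i++) {
                for (int j = 0; j < w.length; j++) {
                    next[i] += w[i][j] * v[j];
                }
            }
            v = l1Normalize(next);
        }
        return v;
    }

    /** Two tight groups {0,1,2} and {3,4,5} joined by one weak link between 2 and 3. */
    static double[][] exampleAffinity() {
        double[][] a = new double[6][6];
        int[][] strong = {{0, 1}, {0, 2}, {1, 2}, {3, 4}, {3, 5}, {4, 5}};
        for (int[] e : strong) {
            a[e[0]][e[1]] = 1.0;
            a[e[1]][e[0]] = 1.0;
        }
        a[2][3] = 0.01;
        a[3][2] = 0.01;
        return a;
    }

    public static void main(String[] args) {
        double[][] w = rowNormalize(exampleAffinity());
        // Arbitrary start vector chosen for the example; Spark offers "random" and "degree" instead.
        double[] v = powerIterate(w, new double[] {1, 2, 3, 7, 8, 9}, 20);
        System.out.println(Arrays.toString(v));
    }
}
```

Stopping after a small number of iterations is the point of the technique: entries belonging to the same tight group become nearly equal quickly, while the gap between groups shrinks only slowly.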
-
Nested Class Summary
Modifier and Type | Class | Description
static class | | Cluster assignment.
static class | |
-
Constructor Summary
Constructor | Description
PowerIterationClustering() | Constructs a PIC instance with default parameters: {k: 2, maxIterations: 100, initMode: "random"}.
-
Method Summary
Modifier and Type | Method | Description
static org.apache.spark.internal.Logging.LogStringContext | LogStringContext(scala.StringContext sc) |
static org.slf4j.Logger | org$apache$spark$internal$Logging$$log_() |
static void | org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) |
 | run(similarities) | A Java-friendly version of PowerIterationClustering.run.
 | run(graph) | Run the PIC algorithm on Graph.
 | run(similarities) | Run the PIC algorithm.
 | setInitializationMode(String mode) | Set the initialization mode.
 | setK(int k) | Set the number of clusters.
 | setMaxIterations(int maxIterations) | Set maximum number of iterations of the power iteration loop.
-
Constructor Details
-
PowerIterationClustering
public PowerIterationClustering()
Constructs a PIC instance with default parameters: {k: 2, maxIterations: 100, initMode: "random"}.
-
-
Method Details
-
org$apache$spark$internal$Logging$$log_
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
-
org$apache$spark$internal$Logging$$log__$eq
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
-
LogStringContext
public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
-
setK
Set the number of clusters.
- Parameters:
k
- the number of clusters
- Returns:
- this PowerIterationClustering instance, so that setter calls can be chained
-
setMaxIterations
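Set the maximum number of iterations of the power iteration loop.
- Parameters:
maxIterations
- the maximum number of iterations of the PIC algorithm
- Returns:
- this PowerIterationClustering instance, so that setter calls can be chained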
-
setInitializationMode
Set the initialization mode. This can be either "random" to use a random vector as vertex properties, or "degree" to use normalized sum similarities. Default: random.
- Parameters:
mode
- the initialization mode, either "random" or "degree"
- Returns:
- this PowerIterationClustering instance, so that setter calls can be chained
-
run
Run the PIC algorithm on Graph.
- Parameters:
graph
- an affinity matrix represented as a graph, which is the matrix A in the PIC paper. The similarity s_ij, represented as the edge between vertices (i, j), must be nonnegative. The matrix is symmetric, so s_ij = s_ji. For any (i, j) with nonzero similarity, there should be either (i, j, s_ij) or (j, i, s_ji) in the input. Tuples with i = j are ignored, because the self-similarity s_ii is assumed to be 0.0.
- Returns:
- a PowerIterationClusteringModel that contains the clustering result
-
run
Run the PIC algorithm.
- Parameters:
similarities
- an RDD of (i, j, s_ij) tuples representing the affinity matrix, which is the matrix A in the PIC paper. The similarity s_ij must be nonnegative. The matrix is symmetric, so s_ij = s_ji. For any (i, j) with nonzero similarity, there should be either (i, j, s_ij) or (j, i, s_ji) in the input. Tuples with i = j are ignored, because the self-similarity s_ii is assumed to be 0.0.
- Returns:
- a PowerIterationClusteringModel that contains the clustering result
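The input requirements stated for the similarities parameter (nonnegative values, one tuple per unordered pair, entries with i = j ignored) can be enforced before the data is handed to Spark. The following is a minimal sketch of such a pre-processing step in plain Java. The class `SimilarityInput`, its nested `Entry` type, and the method `canonicalize` are hypothetical names introduced for illustration and are not part of Spark; keeping only the first tuple seen for a pair is likewise an assumption of this sketch, not documented Spark behavior.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of Spark) that tidies (i, j, s_ij) tuples. */
final class SimilarityInput {

    /** One (i, j, s_ij) entry of the affinity matrix. */
    static final class Entry {
        final long i;
        final long j;
        final double s;

        Entry(long i, long j, double s) {
            this.i = i;
            this.j = j;
            this.s = s;
        }
    }

    /**
     * Rejects negative or NaN similarities, drops entries with i = j, and keeps
     * a single tuple per unordered pair, stored with the smaller id first.
     */
    static List<Entry> canonicalize(List<Entry> raw) {
        List<Entry> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (Entry e : raw) {
            if (!(e.s >= 0.0)) { // also true for NaN
                throw new IllegalArgumentException(
                    "similarity must be nonnegative: s(" + e.i + ", " + e.j + ") = " + e.s);
            }
            if (e.i == e.j) {
                continue; // self-similarity is assumed to be 0.0
            }
            long lo = Math.min(e.i, e.j);
            long hi = Math.max(e.i, e.j);
            if (seen.add(lo + "," + hi)) { // first occurrence of this pair wins
                out.add(new Entry(lo, hi, e.s));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> raw = new ArrayList<>();
        raw.add(new Entry(0, 1, 0.9));
        raw.add(new Entry(1, 0, 0.9)); // mirror of the first tuple
        raw.add(new Entry(2, 2, 1.0)); // diagonal entry
        raw.add(new Entry(3, 1, 0.5));
        for (Entry e : canonicalize(raw)) {
            System.out.println(e.i + ", " + e.j + ", " + e.s);
        }
    }
}
```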
-
run
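A Java-friendly version of PowerIterationClustering.run.
- Parameters:
similarities
- the (i, j, s_ij) tuples representing the affinity matrix, subject to the same requirements as the similarities parameter of the RDD-based run
- Returns:
- a PowerIterationClusteringModel that contains the clustering result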
-
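As a usage illustration, the setters and the Java-friendly run can be combined roughly as follows. This is a sketch that assumes Spark core and MLlib are on the classpath and that a local master is acceptable; the class name `PicUsageSketch`, the application name, the five-vertex similarity data, and the chosen parameter values are made up for the example.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.clustering.PowerIterationClustering;
import org.apache.spark.mllib.clustering.PowerIterationClusteringModel;

import scala.Tuple3;

public class PicUsageSketch {
    public static void main(String[] args) {
        // Local master and app name are assumptions for a standalone run.
        SparkConf conf = new SparkConf().setAppName("PicUsageSketch").setMaster("local[*]");
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            // Illustrative data: one (i, j, s_ij) tuple per unordered pair, all nonnegative.
            JavaRDD<Tuple3<Long, Long, Double>> similarities = jsc.parallelize(Arrays.asList(
                new Tuple3<>(0L, 1L, 0.9),
                new Tuple3<>(1L, 2L, 0.9),
                new Tuple3<>(2L, 3L, 0.1),
                new Tuple3<>(3L, 4L, 0.9)));

            // The setters return the instance, so the configuration can be chained.
            PowerIterationClustering pic = new PowerIterationClustering()
                .setK(2)
                .setMaxIterations(10)
                .setInitializationMode("degree");

            PowerIterationClusteringModel model = pic.run(similarities);

            // Each assignment pairs a vertex id with the cluster it was placed in.
            for (PowerIterationClustering.Assignment a
                    : model.assignments().toJavaRDD().collect()) {
                System.out.println(a.id() + " -> " + a.cluster());
            }
        }
    }
}
```

Cluster ids carry no meaning beyond distinguishing the groups, so which group receives which id may differ between runs.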