Class PowerIterationClustering
Object
org.apache.spark.mllib.clustering.PowerIterationClustering
- All Implemented Interfaces:
- Serializable
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
 Lin and Cohen.
 From the abstract: PIC finds a very low-dimensional embedding of a dataset using
 truncated power iteration on a normalized pair-wise similarity matrix of the data.
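
The truncated power iteration mentioned in the abstract can be illustrated with a small, dense, single-machine sketch. This is not Spark's distributed implementation: the class name `PicSketch`, its method names, and the L1 normalization of each iterate are assumptions made for illustration. It row-normalizes a nonnegative affinity matrix and applies a fixed number of multiplication steps to a starting vector.

```java
/** Illustrative sketch only; the names here are hypothetical and not part of Spark. */
final class PicSketch {

    /** Row-normalizes a nonnegative affinity matrix A into W, so each nonzero row sums to 1. */
    static double[][] rowNormalize(double[][] a) {
        int n = a.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double degree = 0.0;
            for (int j = 0; j < n; j++) {
                degree += a[i][j];
            }
            for (int j = 0; j < n; j++) {
                // Isolated vertices (degree 0) keep an all-zero row.
                w[i][j] = degree > 0.0 ? a[i][j] / degree : 0.0;
            }
        }
        return w;
    }

    /** Applies v <- W v / ||W v||_1 for at most maxIterations steps (assumed normalization). */
    static double[] powerIterate(double[][] w, double[] v0, int maxIterations) {
        int n = v0.length;
        double[] v = v0.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    next[i] += w[i][j] * v[j];
                }
                norm += Math.abs(next[i]);
            }
            if (norm == 0.0) {
                break; // nothing left to normalize
            }
            for (int i = 0; i < n; i++) {
                next[i] /= norm;
            }
            v = next;
        }
        return v;
    }
}
```

Stopping after a few iterations is the point of the truncation: entries of the vector that belong to the same tightly connected group tend to become nearly equal well before the whole vector converges, so the one-dimensional values can then be grouped into k clusters.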
 
param: k Number of clusters.
param: maxIterations Maximum number of iterations of the PIC algorithm.
param: initMode Set the initialization mode. This can be either "random" to use a random vector as vertex properties, or "degree" to use normalized sum similarities. Default: random.
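
To make the two initialization modes concrete, the following single-machine sketch shows one plausible reading of them. It is not Spark's code: the class `PicInitSketch`, its method names, and the `seed` parameter are hypothetical. Dividing each row sum by the total for "degree", and drawing Gaussian values scaled to unit L1 norm for "random", are both assumptions about what "normalized sum similarities" and "a random vector" mean here.

```java
/** Illustrative sketch only; the names here are hypothetical and not part of Spark. */
final class PicInitSketch {

    /** "degree" mode (assumed reading): each vertex starts at its row sum divided by the total. */
    static double[] degreeInit(double[][] a) {
        int n = a.length;
        double[] v = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                v[i] += a[i][j];
            }
            total += v[i];
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("affinity matrix has no nonzero similarity");
        }
        for (int i = 0; i < n; i++) {
            v[i] /= total;
        }
        return v;
    }

    /** "random" mode (assumed reading): Gaussian values scaled so their absolute values sum to 1. */
    static double[] randomInit(int n, long seed) {
        java.util.Random rng = new java.util.Random(seed);
        double[] v = new double[n];
        double norm = 0.0;
        for (int i = 0; i < n; i++) {
            v[i] = rng.nextGaussian();
            norm += Math.abs(v[i]);
        }
        for (int i = 0; i < n; i++) {
            v[i] /= norm;
        }
        return v;
    }
}
```

The degree-based start is deterministic for a given input, while the random start depends on the seed, so repeated runs in "random" mode can produce different assignments.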
- 
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
- static class: Cluster assignment.
- static class
- 
Constructor Summary
Constructors (Constructor, Description)
- PowerIterationClustering(): Constructs a PIC instance with default parameters: {k: 2, maxIterations: 100, initMode: "random"}.
- 
Method Summary
Methods (Modifier and Type, Method, Description)
- static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
- static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
- static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
- run: A Java-friendly version of PowerIterationClustering.run.
- run: Run the PIC algorithm on Graph.
- run: Run the PIC algorithm.
- setInitializationMode(String mode): Set the initialization mode.
- setK(int k): Set the number of clusters.
- setMaxIterations(int maxIterations): Set maximum number of iterations of the power iteration loop.
- 
Constructor Details
PowerIterationClustering
public PowerIterationClustering()
Constructs a PIC instance with default parameters: {k: 2, maxIterations: 100, initMode: "random"}.
 
- 
- 
Method Details
org$apache$spark$internal$Logging$$log_
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
- 
org$apache$spark$internal$Logging$$log__$eq
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
- 
LogStringContext
public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
- 
setK
Set the number of clusters.
- Parameters:
- k - the number of clusters
- Returns:
- this PowerIterationClustering instance, so that setter calls can be chained
 
- 
setMaxIterations
Set the maximum number of iterations of the power iteration loop.
- Parameters:
- maxIterations - the maximum number of iterations
- Returns:
- this PowerIterationClustering instance, so that setter calls can be chained
 
- 
setInitializationMode
Set the initialization mode. This can be either "random" to use a random vector as vertex properties, or "degree" to use normalized sum similarities. Default: random.
- Parameters:
- mode - the initialization mode, either "random" or "degree"
- Returns:
- this PowerIterationClustering instance, so that setter calls can be chained
 
- 
run
Run the PIC algorithm on Graph.
- Parameters:
- graph - an affinity matrix represented as a graph, which is the matrix A in the PIC paper. The similarity s_ij, represented as the edge between vertices (i, j), must be nonnegative. The matrix is symmetric, hence s_ij = s_ji. For any (i, j) with nonzero similarity, there should be either (i, j, s_ij) or (j, i, s_ji) in the input. Tuples with i = j are ignored, because the self-similarity s_ii is assumed to be 0.0.
- Returns:
- a PowerIterationClusteringModel that contains the clustering result
 
- 
run
Run the PIC algorithm.
- Parameters:
- similarities - an RDD of (i, j, s_ij) tuples representing the affinity matrix, which is the matrix A in the PIC paper. The similarity s_ij must be nonnegative. The matrix is symmetric, hence s_ij = s_ji. For any (i, j) with nonzero similarity, there should be either (i, j, s_ij) or (j, i, s_ji) in the input. Tuples with i = j are ignored, because the self-similarity s_ii is assumed to be 0.0.
- Returns:
- a PowerIterationClusteringModel that contains the clustering result
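
The input rules for the similarity tuples can be illustrated with a small in-memory sketch that expands a list of tuples into a dense symmetric matrix. It is not how Spark processes the RDD: the class `AffinitySketch`, its nested `Entry` type, and `toSymmetricMatrix` are hypothetical names, and rejecting negative values with an exception is an assumption about how a violation might be handled.

```java
/** Illustrative sketch only; the names here are hypothetical and not part of Spark. */
final class AffinitySketch {

    /** One (i, j, s_ij) tuple; a stand-in for the tuples passed to run. */
    static final class Entry {
        final int i;
        final int j;
        final double s;

        Entry(int i, int j, double s) {
            this.i = i;
            this.j = j;
            this.s = s;
        }
    }

    /** Expands tuples into an n x n symmetric matrix, applying the documented input rules. */
    static double[][] toSymmetricMatrix(int n, java.util.List<Entry> entries) {
        double[][] a = new double[n][n];
        for (Entry e : entries) {
            if (e.s < 0.0) {
                // Assumed handling: the documentation only states that s_ij must be nonnegative.
                throw new IllegalArgumentException("similarity must be nonnegative");
            }
            if (e.i == e.j) {
                continue; // tuples with i = j are ignored
            }
            // Only one of (i, j) or (j, i) needs to be supplied; mirror it.
            a[e.i][e.j] = e.s;
            a[e.j][e.i] = e.s;
        }
        return a;
    }
}
```

Because the matrix is symmetric, supplying each pair once in either orientation is enough to describe the whole affinity matrix.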
 
- 
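run
A Java-friendly version of PowerIterationClustering.run.
- Parameters:
- similarities - the (i, j, s_ij) similarity tuples, as described for PowerIterationClustering.run
- Returns:
- a PowerIterationClusteringModel that contains the clustering result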
 
 
-