Package org.apache.spark.ml.clustering
Class PowerIterationClustering
Object
org.apache.spark.ml.clustering.PowerIterationClustering
- All Implemented Interfaces:
Serializable, PowerIterationClusteringParams, Params, HasMaxIter, HasWeightCol, DefaultParamsWritable, Identifiable, MLWritable
public class PowerIterationClustering
extends Object
implements PowerIterationClusteringParams, DefaultParamsWritable
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
 Lin and Cohen. From
 the abstract: PIC finds a very low-dimensional embedding of a dataset using truncated power
 iteration on a normalized pair-wise similarity matrix of the data.
 
 This class is not yet an Estimator/Transformer; use the assignClusters method to run the
 PowerIterationClustering algorithm.
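The truncated power iteration described above can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not Spark's implementation. The class and method names (PicPowerIterationSketch, rowNormalize, powerIterate) are hypothetical. The affinity matrix is a small dense array rather than a distributed graph, and a fixed number of steps stands in for the stopping rule (compare maxIter). Following the PIC paper, the affinity matrix A is row-normalized to W = D^{-1}A, where D is the diagonal degree matrix, and the vector is rescaled by its L1 norm after each multiplication.

```java
/**
 * Illustrative single-machine sketch of PIC's truncated power iteration.
 * Hypothetical names; dense arrays stand in for Spark's distributed graph.
 */
class PicPowerIterationSketch {

    /** Row-normalizes a nonnegative affinity matrix: W = D^-1 A with D_ii = sum_j A_ij. */
    static double[][] rowNormalize(double[][] a) {
        int n = a.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double degree = 0.0;
            for (int j = 0; j < n; j++) {
                degree += a[i][j];
            }
            for (int j = 0; j < n; j++) {
                // Isolated vertices (degree 0) keep an all-zero row.
                w[i][j] = degree > 0.0 ? a[i][j] / degree : 0.0;
            }
        }
        return w;
    }

    /** Runs exactly maxIter steps of v <- W v / ||W v||_1 (power iteration stopped early). */
    static double[] powerIterate(double[][] w, double[] v0, int maxIter) {
        int n = v0.length;
        double[] v = v0.clone();
        for (int t = 0; t < maxIter; t++) {
            double[] next = new double[n];
            double l1 = 0.0;
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    next[i] += w[i][j] * v[j];
                }
                l1 += Math.abs(next[i]);
            }
            if (l1 == 0.0) {
                return next; // degenerate input: nothing left to normalize
            }
            for (int i = 0; i < n; i++) {
                next[i] /= l1;
            }
            v = next;
        }
        return v;
    }

    public static void main(String[] args) {
        // Two triangles {0,1,2} and {3,4,5} joined by one weak edge (2,3).
        double[][] a = new double[6][6];
        int[][] strong = {{0, 1}, {0, 2}, {1, 2}, {3, 4}, {3, 5}, {4, 5}};
        for (int[] e : strong) {
            a[e[0]][e[1]] = 1.0;
            a[e[1]][e[0]] = 1.0;
        }
        a[2][3] = 0.1;
        a[3][2] = 0.1;
        // A fixed, non-constant start vector stands in for the "random" initialization.
        double[] v0 = {1, 2, 3, 4, 5, 6};
        double[] v = powerIterate(rowNormalize(a), v0, 10);
        System.out.println(java.util.Arrays.toString(v));
    }
}
```

With the two-triangle example in main, ten steps leave the three entries of each triangle nearly equal while a clear gap separates the two triangles. That one-dimensional embedding is what a final clustering step (k-means in the paper) would split into k groups. Running to full convergence would instead flatten every entry to the same value, which is why the iteration is truncated.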
 
- 
Constructor Summary
- PowerIterationClustering()
- 
Method Summary
- Dataset<Row> assignClusters(Dataset<?> dataset): Runs the PIC algorithm and returns a cluster assignment for each input vertex.
- PowerIterationClustering copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params.
- Param<String> dstCol(): Name of the input column for destination vertex IDs.
- Param<String> initMode(): Param for the initialization algorithm.
- final IntParam k(): The number of clusters to create (k).
- static PowerIterationClustering load(String path)
- final IntParam maxIter(): Param for maximum number of iterations (>= 0).
- Param<?>[] params(): Returns all params sorted by their names.
- static MLReader<T> read()
- PowerIterationClustering setDstCol(String value)
- PowerIterationClustering setInitMode(String value)
- PowerIterationClustering setK(int value)
- PowerIterationClustering setMaxIter(int value)
- PowerIterationClustering setSrcCol(String value)
- PowerIterationClustering setWeightCol(String value)
- Param<String> srcCol(): Param for the name of the input column for source vertex IDs.
- String uid(): An immutable unique ID for the object and its derivatives.
- Param<String> weightCol(): Param for weight column name.

Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable: write
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter: getMaxIter
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol: getWeightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable: toString
Methods inherited from interface org.apache.spark.ml.util.MLWritable: save
Methods inherited from interface org.apache.spark.ml.param.Params: clear, copyValues, defaultCopy, estimateMatadataSize, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface org.apache.spark.ml.clustering.PowerIterationClusteringParams: getDstCol, getInitMode, getK, getSrcCol
- 
Constructor Details
- PowerIterationClustering
public PowerIterationClustering()
 
- 
Method Details
load
- 
read
- 
k
Description copied from interface: PowerIterationClusteringParams
The number of clusters to create (k). Must be > 1. Default: 2.
- Specified by: k in interface PowerIterationClusteringParams
- Returns: (undocumented)
 
- 
initMode
Description copied from interface: PowerIterationClusteringParams
Param for the initialization algorithm. This can be either "random" to use a random vector as vertex properties, or "degree" to use a normalized sum of similarities with other vertices. Default: random.
- Specified by: initMode in interface PowerIterationClusteringParams
- Returns: (undocumented)
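The "degree" option can be made concrete with a short sketch. It assumes a dense similarity matrix and that the vertex degrees are divided by their total, so the starting vector sums to 1. PicDegreeInitSketch and degreeInit are hypothetical names, not Spark API. The "random" option would fill the starting vector with random values instead and is not shown.

```java
/** Hypothetical sketch of "degree" initialization: v0_i = d_i / sum_k d_k, d_i = sum_{j != i} s_ij. */
class PicDegreeInitSketch {

    static double[] degreeInit(double[][] s) {
        int n = s.length;
        double[] v0 = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) { // diagonal entries are treated as 0.0, matching assignClusters
                    v0[i] += s[i][j];
                }
            }
            total += v0[i];
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("Graph has no positive similarities");
        }
        for (int i = 0; i < n; i++) {
            v0[i] /= total; // normalize so the entries sum to 1
        }
        return v0;
    }

    public static void main(String[] args) {
        double[][] s = {
            {0.0, 1.0, 1.0},
            {1.0, 0.0, 0.0},
            {1.0, 0.0, 0.0},
        };
        System.out.println(java.util.Arrays.toString(degreeInit(s))); // prints [0.5, 0.25, 0.25]
    }
}
```

A well-connected vertex starts with a larger value, so the iteration begins from a vector that already reflects the graph structure rather than from noise.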
 
- 
srcCol
Description copied from interface: PowerIterationClusteringParams
Param for the name of the input column for source vertex IDs. Default: "src".
- Specified by: srcCol in interface PowerIterationClusteringParams
- Returns: (undocumented)
 
- 
dstCol
Description copied from interface: PowerIterationClusteringParams
Name of the input column for destination vertex IDs. Default: "dst".
- Specified by: dstCol in interface PowerIterationClusteringParams
- Returns: (undocumented)
 
- 
weightCol
Description copied from interface: HasWeightCol
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
- Specified by: weightCol in interface HasWeightCol
- Returns: (undocumented)
 
- 
maxIter
Description copied from interface: HasMaxIter
Param for maximum number of iterations (>= 0).
- Specified by: maxIter in interface HasMaxIter
- Returns: (undocumented)
 
- 
params
Description copied from interface: Params
Returns all params sorted by their names. The default implementation uses Java reflection to list all public methods that have no arguments and return Param.
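The reflection-based default described for params can be illustrated outside Spark. In this sketch, Param and ToyParams are hypothetical stand-ins, not Spark's classes. The params method keeps only public methods (via getMethods()) that take no arguments and whose return type is Param, invokes them, and sorts the results by param name.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch of reflection-based param discovery; not Spark's Params code. */
class ParamsReflectionSketch {

    /** Hypothetical stand-in for org.apache.spark.ml.param.Param. */
    static class Param {
        final String name;

        Param(String name) {
            this.name = name;
        }
    }

    /** Hypothetical params holder, loosely modeled on PowerIterationClustering. */
    public static class ToyParams {
        private final Param k = new Param("k");
        private final Param maxIter = new Param("maxIter");
        private final Param initMode = new Param("initMode");

        public Param k() { return k; }
        public Param maxIter() { return maxIter; }
        public Param initMode() { return initMode; }

        // Excluded: takes an argument.
        public Param named(String name) { return new Param(name); }

        // Excluded: does not return a Param.
        public String uid() { return "toy_0001"; }
    }

    /** Collects public no-argument methods returning Param and sorts the results by name. */
    static List<Param> params(Object owner) {
        List<Param> result = new ArrayList<>();
        for (Method m : owner.getClass().getMethods()) { // getMethods() lists public methods only
            if (m.getParameterCount() == 0 && Param.class.isAssignableFrom(m.getReturnType())) {
                try {
                    result.add((Param) m.invoke(owner));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot read param from " + m.getName(), e);
                }
            }
        }
        result.sort(Comparator.comparing((Param p) -> p.name));
        return result;
    }

    public static void main(String[] args) {
        for (Param p : params(new ToyParams())) {
            System.out.println(p.name); // initMode, k, maxIter (one per line)
        }
    }
}
```

Reflection keeps the list in sync with the declared accessors automatically, at the cost of relying on a naming and signature convention rather than an explicit registry.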
- 
uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
- Specified by: uid in interface Identifiable
- Returns: (undocumented)
 
- 
setK
- 
setInitMode
- 
setMaxIter
- 
setSrcCol
- 
setDstCol
- 
setWeightCol
- 
assignClusters
Runs the PIC algorithm and returns a cluster assignment for each input vertex.
- Parameters:
- dataset - A dataset with columns src, dst, and weight representing the affinity matrix, which is the matrix A in the PIC paper. Suppose the src column value is i, the dst column value is j, and the weight column value is the similarity $s_{ij}$, which must be nonnegative. This is a symmetric matrix, hence $s_{ij} = s_{ji}$. For any (i, j) with nonzero similarity, the input should contain either (i, j, $s_{ij}$) or (j, i, $s_{ji}$). Rows with i = j are ignored, because $s_{ii}$ is assumed to be 0.0.
- Returns:
- A dataset that contains a column of vertex IDs and the corresponding cluster for each ID. Its schema is:
  - id: Long
  - cluster: Int
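The input contract of assignClusters can be made concrete with a small in-memory sketch. It is not Spark code: PicAffinitySketch, Edge, and buildAffinity are hypothetical names, and a null weight models an unset weightCol (all weights 1.0). The sketch follows the rules stated above: rows with i = j are dropped, negative similarities are rejected, and each pair is stored in both directions so that s_ij = s_ji.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: turning (src, dst, weight) rows into a symmetric sparse affinity map. */
class PicAffinitySketch {

    /** One input row; weight == null models "weightCol not set", i.e. weight 1.0. */
    static final class Edge {
        final long src;
        final long dst;
        final Double weight;

        Edge(long src, long dst, Double weight) {
            this.src = src;
            this.dst = dst;
            this.weight = weight;
        }
    }

    /** Returns vertex -> (neighbor -> similarity), with both directions filled in. */
    static Map<Long, Map<Long, Double>> buildAffinity(List<Edge> rows) {
        Map<Long, Map<Long, Double>> affinity = new HashMap<>();
        for (Edge e : rows) {
            if (e.src == e.dst) {
                continue; // rows with i = j are ignored because s_ii is assumed to be 0.0
            }
            double s = (e.weight == null) ? 1.0 : e.weight;
            if (s < 0.0) {
                throw new IllegalArgumentException("Similarity must be nonnegative, got " + s);
            }
            // One direction per pair is enough: store both to obtain s_ij = s_ji.
            // If a pair appears twice, this sketch simply keeps the last value.
            affinity.computeIfAbsent(e.src, k -> new HashMap<>()).put(e.dst, s);
            affinity.computeIfAbsent(e.dst, k -> new HashMap<>()).put(e.src, s);
        }
        return affinity;
    }

    public static void main(String[] args) {
        List<Edge> rows = new ArrayList<>();
        rows.add(new Edge(1L, 2L, 0.5));
        rows.add(new Edge(2L, 3L, null)); // no weight: treated as 1.0
        rows.add(new Edge(4L, 4L, 9.0));  // self-loop: ignored
        System.out.println(buildAffinity(rows));
    }
}
```

Storing both directions is what lets callers supply each pair only once. How Spark treats an input that contains both (i, j) and (j, i) is not covered here, so the last-value-wins behavior above is purely a choice of the sketch.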
 
- 
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
 