Package org.apache.spark.ml.clustering
Class PowerIterationClustering
Object
org.apache.spark.ml.clustering.PowerIterationClustering
- All Implemented Interfaces:
Serializable, PowerIterationClusteringParams, Params, HasMaxIter, HasWeightCol, DefaultParamsWritable, Identifiable, MLWritable
public class PowerIterationClustering
extends Object
implements PowerIterationClusteringParams, DefaultParamsWritable
Power Iteration Clustering (PIC) is a scalable graph clustering algorithm developed by
Lin and Cohen. From the abstract: PIC finds a very low-dimensional embedding of a dataset
using truncated power iteration on a normalized pair-wise similarity matrix of the data.
This class is not yet an Estimator/Transformer; use the assignClusters method to run the
PowerIterationClustering algorithm.
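To make "truncated power iteration on a normalized pair-wise similarity matrix" concrete, the following is a minimal single-machine sketch in plain Java. It is not Spark's implementation: the class PowerIterationSketch and its methods rowNormalize and iterate are hypothetical names, the matrix is dense, and the L1 normalization and fixed step count are simplifying assumptions.

```java
// Illustrative sketch only: dense, single-machine, and NOT Spark's implementation.
// PowerIterationSketch, rowNormalize, and iterate are hypothetical names.
final class PowerIterationSketch {

    /** Row-normalizes a nonnegative affinity matrix A into W, so each nonzero row sums to 1. */
    static double[][] rowNormalize(double[][] a) {
        int n = a.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < n; j++) {
                rowSum += a[i][j];
            }
            for (int j = 0; j < n; j++) {
                // An isolated vertex (all-zero row) keeps a zero row.
                w[i][j] = rowSum == 0.0 ? 0.0 : a[i][j] / rowSum;
            }
        }
        return w;
    }

    /** Applies v <- W v / ||W v||_1 for at most maxIter steps, starting from v0. */
    static double[] iterate(double[][] w, double[] v0, int maxIter) {
        double[] v = v0.clone();
        for (int t = 0; t < maxIter; t++) {
            double[] next = new double[v.length];
            double norm = 0.0;
            for (int i = 0; i < v.length; i++) {
                for (int j = 0; j < v.length; j++) {
                    next[i] += w[i][j] * v[j];
                }
                norm += Math.abs(next[i]);
            }
            if (norm == 0.0) {
                break; // nothing left to normalize
            }
            for (int i = 0; i < v.length; i++) {
                next[i] /= norm;
            }
            v = next;
        }
        return v;
    }

    public static void main(String[] args) {
        // Two disconnected triangles: vertices 0-2 and vertices 3-5.
        double[][] a = {
            {0, 1, 1, 0, 0, 0},
            {1, 0, 1, 0, 0, 0},
            {1, 1, 0, 0, 0, 0},
            {0, 0, 0, 0, 1, 1},
            {0, 0, 0, 1, 0, 1},
            {0, 0, 0, 1, 1, 0}
        };
        double[] v0 = {0.05, 0.10, 0.15, 0.20, 0.20, 0.30};
        double[] v = iterate(rowNormalize(a), v0, 20);
        System.out.println(java.util.Arrays.toString(v));
    }
}
```

In this toy input, entries that belong to the same triangle move toward a common value, so clustering the resulting one-dimensional embedding separates the two groups. Stopping after a bounded number of steps is what makes the iteration "truncated".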
-
Constructor Summary
Constructors
- PowerIterationClustering()
-
Method Summary
Modifier and Type, Method, Description
- assignClusters(Dataset<?> dataset): Runs the PIC algorithm and returns a cluster assignment for each input vertex.
- copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params.
- dstCol(): Name of the input column for destination vertex IDs.
- initMode(): Param for the initialization algorithm.
- final IntParam k(): The number of clusters to create (k).
- static PowerIterationClustering load(String path)
- final IntParam maxIter(): Param for maximum number of iterations (>= 0).
- Param<?>[] params(): Returns all params sorted by their names.
- static MLReader<T> read()
- setDstCol(String value)
- setInitMode(String value)
- setK(int value)
- setMaxIter(int value)
- setSrcCol(String value)
- setWeightCol(String value)
- srcCol(): Param for the name of the input column for source vertex IDs.
- uid(): An immutable unique ID for the object and its derivatives.
- weightCol(): Param for weight column name.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable
write
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString
Methods inherited from interface org.apache.spark.ml.util.MLWritable
save
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface org.apache.spark.ml.clustering.PowerIterationClusteringParams
getDstCol, getInitMode, getK, getSrcCol
-
Constructor Details
-
PowerIterationClustering
public PowerIterationClustering()
-
-
Method Details
-
load
-
read
-
k
Description copied from interface: PowerIterationClusteringParams
The number of clusters to create (k). Must be > 1. Default: 2.
- Specified by:
k in interface PowerIterationClusteringParams
- Returns:
- (undocumented)
-
initMode
Description copied from interface: PowerIterationClusteringParams
Param for the initialization algorithm. This can be either "random" to use a random vector as vertex properties, or "degree" to use a normalized sum of similarities with other vertices. Default: random.
- Specified by:
initMode in interface PowerIterationClusteringParams
- Returns:
- (undocumented)
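The "degree" option can be illustrated with a short sketch. Assuming a dense symmetric affinity matrix, the starting vector gives each vertex its sum of similarities to other vertices, divided by the total over all vertices. DegreeInitSketch and degreeInit are hypothetical names used only for illustration; Spark computes this on a distributed graph, and this is not its code.

```java
// Illustrative sketch only; DegreeInitSketch and degreeInit are hypothetical names.
final class DegreeInitSketch {

    /** Returns v0 where v0[i] = (sum over j != i of a[i][j]) / (sum of all such row sums). */
    static double[] degreeInit(double[][] a) {
        int n = a.length;
        double[] v0 = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    v0[i] += a[i][j]; // self-similarity is skipped
                }
            }
            total += v0[i];
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("affinity matrix has no nonzero off-diagonal entries");
        }
        for (int i = 0; i < n; i++) {
            v0[i] /= total;
        }
        return v0;
    }

    public static void main(String[] args) {
        double[][] a = {
            {0, 1, 3},
            {1, 0, 0},
            {3, 0, 0}
        };
        System.out.println(java.util.Arrays.toString(degreeInit(a)));
    }
}
```

The entries of the resulting vector sum to 1, and better-connected vertices start with larger values.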
-
srcCol
Description copied from interface: PowerIterationClusteringParams
Param for the name of the input column for source vertex IDs. Default: "src"
- Specified by:
srcCol in interface PowerIterationClusteringParams
- Returns:
- (undocumented)
-
dstCol
Description copied from interface: PowerIterationClusteringParams
Name of the input column for destination vertex IDs. Default: "dst"
- Specified by:
dstCol in interface PowerIterationClusteringParams
- Returns:
- (undocumented)
-
weightCol
Description copied from interface: HasWeightCol
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
- Specified by:
weightCol in interface HasWeightCol
- Returns:
- (undocumented)
-
maxIter
Description copied from interface: HasMaxIter
Param for maximum number of iterations (>= 0).
- Specified by:
maxIter in interface HasMaxIter
- Returns:
- (undocumented)
-
params
Description copied from interface: Params
Returns all params sorted by their names. The default implementation uses Java reflection to list all public methods that have no arguments and return Param.
-
uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
- Specified by:
uid in interface Identifiable
- Returns:
- (undocumented)
-
setK
-
setInitMode
-
setMaxIter
-
setSrcCol
-
setDstCol
-
setWeightCol
-
assignClusters
Runs the PIC algorithm and returns a cluster assignment for each input vertex.
- Parameters:
dataset - A dataset with columns src, dst, weight representing the affinity matrix, which is the matrix A in the PIC paper. Suppose the src column value is i, the dst column value is j, and the weight column value is the similarity s_ij, which must be nonnegative. The matrix is symmetric, hence s_ij = s_ji. For any (i, j) with nonzero similarity, there should be either (i, j, s_ij) or (j, i, s_ji) in the input. Rows with i = j are ignored, because s_ij is assumed to be 0.0 when i = j.
- Returns:
- A dataset that contains columns of vertex id and the corresponding cluster for the id. Its schema is:
  - id: Long
  - cluster: Int
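The input conventions described for the dataset parameter can be sketched without Spark. The example below builds a dense symmetric matrix from (src, dst, weight) rows, rejects negative similarities, skips rows with src equal to dst, and mirrors each row so that only one direction needs to be supplied. AffinityInputSketch, Edge, and toSymmetricMatrix are hypothetical names, and vertex ids are assumed to lie in the range 0 to n - 1; this illustrates the documented contract, not how Spark stores the data.

```java
// Illustrative sketch only; all names here are hypothetical and not part of Spark.
final class AffinityInputSketch {

    /** One input row: source vertex id, destination vertex id, similarity. */
    static final class Edge {
        final long src;
        final long dst;
        final double weight;

        Edge(long src, long dst, double weight) {
            this.src = src;
            this.dst = dst;
            this.weight = weight;
        }
    }

    /** Builds a dense symmetric affinity matrix; assumes ids are in [0, n). */
    static double[][] toSymmetricMatrix(Edge[] edges, int n) {
        double[][] a = new double[n][n];
        for (Edge e : edges) {
            if (e.weight < 0.0) {
                throw new IllegalArgumentException("similarity must be nonnegative");
            }
            if (e.src == e.dst) {
                continue; // rows with i = j are ignored
            }
            int i = (int) e.src;
            int j = (int) e.dst;
            a[i][j] = e.weight;
            a[j][i] = e.weight; // supplying (i, j) also defines (j, i)
        }
        return a;
    }

    public static void main(String[] args) {
        Edge[] edges = {
            new Edge(0, 1, 0.5),
            new Edge(1, 2, 2.0),
            new Edge(2, 2, 9.0) // self-loop, skipped
        };
        double[][] a = toSymmetricMatrix(edges, 3);
        System.out.println(java.util.Arrays.deepToString(a));
    }
}
```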
-
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
-