Package org.apache.spark.ml.clustering
Class PowerIterationClustering
Object
org.apache.spark.ml.clustering.PowerIterationClustering
All Implemented Interfaces:
Serializable, PowerIterationClusteringParams, Params, HasMaxIter, HasWeightCol, DefaultParamsWritable, Identifiable, MLWritable
public class PowerIterationClustering
extends Object
implements PowerIterationClusteringParams, DefaultParamsWritable
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen. From the abstract: PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data.
This class is not yet an Estimator/Transformer; use the assignClusters method to run the PowerIterationClustering algorithm.
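To make the abstract's description concrete, here is a minimal single-machine sketch in plain Java, assuming a small dense affinity matrix A. It row-normalizes A into W = D^-1 A (D being the diagonal matrix of row sums) and runs a fixed number of steps of v <- Wv / ||Wv||_1. The class `PicSketch` and its methods are hypothetical; this is not Spark's distributed implementation, and it stops at the 1-D embedding without the step that turns the embedding into k cluster assignments.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating truncated power iteration; not part of Spark. */
final class PicSketch {

    /** Row-normalizes a nonnegative symmetric affinity matrix: W = D^-1 A. */
    static double[][] normalize(double[][] a) {
        int n = a.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double degree = 0.0;
            for (int j = 0; j < n; j++) {
                degree += a[i][j];
            }
            for (int j = 0; j < n; j++) {
                // Isolated vertices (degree 0) keep an all-zero row.
                w[i][j] = degree > 0 ? a[i][j] / degree : 0.0;
            }
        }
        return w;
    }

    /** Runs maxIter steps of v <- W v / ||W v||_1 and returns the 1-D embedding. */
    static double[] powerIterate(double[][] w, double[] v0, int maxIter) {
        double[] v = Arrays.copyOf(v0, v0.length);
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[v.length];
            double norm = 0.0;
            for (int i = 0; i < w.length; i++) {
                for (int j = 0; j < w.length; j++) {
                    next[i] += w[i][j] * v[j];
                }
                norm += Math.abs(next[i]);
            }
            if (norm == 0.0) {
                return next; // degenerate input: nothing left to normalize
            }
            for (int i = 0; i < next.length; i++) {
                next[i] /= norm;
            }
            v = next;
        }
        return v;
    }
}
```

For example, on a graph made of two disconnected triangles, the entries of v quickly become equal within each triangle but differ between the triangles. On a connected graph, iterating long enough typically pushes v toward a constant vector (W is row-stochastic, so it maps constant vectors to themselves), erasing the cluster structure; that is the motivation for truncating the iteration.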
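-
Constructor Summary
PowerIterationClustering()
-
Method Summary
Modifier and Type | Method | Description
Dataset<Row> | assignClusters(Dataset<?> dataset) | Runs the PIC algorithm and returns a cluster assignment for each input vertex.
PowerIterationClustering | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params.
Param<String> | dstCol() | Name of the input column for destination vertex IDs.
Param<String> | initMode() | Param for the initialization algorithm.
final IntParam | k() | The number of clusters to create (k).
static PowerIterationClustering | load(String path) |
final IntParam | maxIter() | Param for maximum number of iterations (>= 0).
Param<?>[] | params() | Returns all params sorted by their names.
static MLReader<T> | read() |
PowerIterationClustering | setDstCol(String value) |
PowerIterationClustering | setInitMode(String value) |
PowerIterationClustering | setK(int value) |
PowerIterationClustering | setMaxIter(int value) |
PowerIterationClustering | setSrcCol(String value) |
PowerIterationClustering | setWeightCol(String value) |
Param<String> | srcCol() | Param for the name of the input column for source vertex IDs.
String | uid() | An immutable unique ID for the object and its derivatives.
final Param<String> | weightCol() | Param for weight column name.
Methods inherited from class java.lang.Object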
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable
write
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString
Methods inherited from interface org.apache.spark.ml.util.MLWritable
save
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface org.apache.spark.ml.clustering.PowerIterationClusteringParams
getDstCol, getInitMode, getK, getSrcCol
-
Constructor Details
-
PowerIterationClustering
public PowerIterationClustering()
-
Method Details
-
load
public static PowerIterationClustering load(String path)
-
read
public static MLReader<T> read()
-
k
Description copied from interface: PowerIterationClusteringParams
The number of clusters to create (k). Must be > 1. Default: 2.
Specified by:
k in interface PowerIterationClusteringParams
Returns:
(undocumented)
-
initMode
Description copied from interface: PowerIterationClusteringParams
Param for the initialization algorithm. This can be either "random" to use a random vector as vertex properties, or "degree" to use a normalized sum of similarities with other vertices. Default: random.
Specified by:
initMode in interface PowerIterationClusteringParams
Returns:
(undocumented)
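The two initMode options can be illustrated with a small, stdlib-only Java sketch. The class `PicInit`, the `seed` argument, and the L1 normalization of the random vector are assumptions of this example, not part of the documented API; only the "degree" formula (each vertex's sum of similarities divided by the total over all vertices) follows directly from the description above.

```java
import java.util.Random;

/** Hypothetical helpers sketching the two documented initMode options; not Spark code. */
final class PicInit {

    /** "degree": v0[i] = d_i / sum_j d_j, where d_i is the sum of row i of A. */
    static double[] degreeInit(double[][] a) {
        int n = a.length;
        double[] v = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                v[i] += a[i][j];
            }
            total += v[i];
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("affinity matrix has no positive entries");
        }
        for (int i = 0; i < n; i++) {
            v[i] /= total;
        }
        return v;
    }

    /** "random": uniform random entries, L1-normalized (the normalization is an assumption here). */
    static double[] randomInit(int n, long seed) {
        Random rng = new Random(seed);
        double[] v = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            v[i] = rng.nextDouble();
            total += v[i];
        }
        for (int i = 0; i < n; i++) {
            v[i] /= total;
        }
        return v;
    }
}
```

The degree start already reflects graph structure (well-connected vertices start larger), whereas the random start carries no structural information and relies entirely on the power iteration.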
-
srcCol
Description copied from interface: PowerIterationClusteringParams
Param for the name of the input column for source vertex IDs. Default: "src".
Specified by:
srcCol in interface PowerIterationClusteringParams
Returns:
(undocumented)
-
dstCol
Description copied from interface: PowerIterationClusteringParams
Name of the input column for destination vertex IDs. Default: "dst".
Specified by:
dstCol in interface PowerIterationClusteringParams
Returns:
(undocumented)
-
weightCol
Description copied from interface: HasWeightCol
Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
Specified by:
weightCol in interface HasWeightCol
Returns:
(undocumented)
-
maxIter
Description copied from interface: HasMaxIter
Param for maximum number of iterations (>= 0).
Specified by:
maxIter in interface HasMaxIter
Returns:
(undocumented)
-
params
Description copied from interface: Params
Returns all params sorted by their names. The default implementation uses Java reflection to list all public methods that have no arguments and return Param.
-
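The reflection-based discovery described for params() can be sketched in plain Java. `SketchParam` and `SketchParams` are hypothetical stand-ins for Spark's Param and Params, used only to show the technique: collect every public, no-argument instance method whose return type is the param type, invoke it, and sort the results by name.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a Spark Param: just a named holder. */
class SketchParam {
    final String name;

    SketchParam(String name) {
        this.name = name;
    }
}

/** Hypothetical params owner whose public no-arg SketchParam getters are found by reflection. */
class SketchParams {
    private final SketchParam k = new SketchParam("k");
    private final SketchParam maxIter = new SketchParam("maxIter");
    private final SketchParam initMode = new SketchParam("initMode");

    public SketchParam k() { return k; }
    public SketchParam maxIter() { return maxIter; }
    public SketchParam initMode() { return initMode; }

    /** Not a param accessor: it takes an argument, so params() skips it. */
    public SketchParam withName(String name) { return new SketchParam(name); }

    /** Lists all public, non-static, no-arg methods returning SketchParam, sorted by param name. */
    List<SketchParam> params() {
        List<SketchParam> result = new ArrayList<>();
        for (Method m : getClass().getMethods()) {
            if (m.getParameterCount() == 0
                    && !Modifier.isStatic(m.getModifiers())
                    && SketchParam.class.isAssignableFrom(m.getReturnType())) {
                try {
                    result.add((SketchParam) m.invoke(this));
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new IllegalStateException("could not read param " + m.getName(), e);
                }
            }
        }
        // getMethods() returns methods in no particular order, hence the explicit sort.
        result.sort(Comparator.comparing((SketchParam p) -> p.name));
        return result;
    }
}
```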
uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
Specified by:
uid in interface Identifiable
Returns:
(undocumented)
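A minimal sketch of an immutable, class-prefixed unique ID, assuming the format "prefix_" followed by 12 random hex characters taken from a UUID. To our understanding Spark's `Identifiable.randomUID` builds its IDs in a similar way, but the class `UidSketch` below is hypothetical and is not that code. The ID is computed once in the constructor and never changes afterwards.

```java
import java.util.UUID;

/** Hypothetical sketch: a class-prefixed random UID that is fixed at construction time. */
final class UidSketch {
    private final String uid;

    UidSketch(String prefix) {
        String random = UUID.randomUUID().toString();
        // Keep the last 12 hex characters of the UUID as the random suffix.
        this.uid = prefix + "_" + random.substring(random.length() - 12);
    }

    /** Immutable: the same value is returned for the lifetime of the object. */
    String uid() {
        return uid;
    }
}
```

Because copy() keeps the same UID, a "derivative" of an object (such as a copy with extra params) can still be traced back to it through this value.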
-
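setK
public PowerIterationClustering setK(int value)
-
setInitMode
public PowerIterationClustering setInitMode(String value)
-
setMaxIter
public PowerIterationClustering setMaxIter(int value)
-
setSrcCol
public PowerIterationClustering setSrcCol(String value)
-
setDstCol
public PowerIterationClustering setDstCol(String value)
-
setWeightCol
public PowerIterationClustering setWeightCol(String value)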
-
assignClusters
public Dataset<Row> assignClusters(Dataset<?> dataset)
Runs the PIC algorithm and returns a cluster assignment for each input vertex.
Parameters:
dataset - A dataset with columns src, dst, weight representing the affinity matrix, which is the matrix A in the PIC paper. Suppose the src column value is i, the dst column value is j, and the weight column value is the similarity s_ij, which must be nonnegative. This is a symmetric matrix and hence s_ij = s_ji. For any (i, j) with nonzero similarity, there should be either (i, j, s_ij) or (j, i, s_ji) in the input. Rows with i = j are ignored, because we assume s_ii = 0.0.
Returns:
A dataset that contains columns of vertex id and the corresponding cluster for the id. Its schema is:
- id: Long
- cluster: Int
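The input contract above can be illustrated with a small plain-Java sketch that turns (src, dst, weight) rows into a sparse symmetric affinity map. Everything here is hypothetical: `AffinitySketch` and `Edge` are illustrative names, a null weight models an unset weight column (treated as 1.0, as the weightCol description states), and rejecting a pair supplied in both directions is this sketch's own choice, since the documentation only says one of (i, j) or (j, i) should appear.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of how (src, dst, weight) rows describe a symmetric affinity matrix. */
final class AffinitySketch {

    /** One input row; a null weight models an unset weight column (treated as 1.0). */
    static final class Edge {
        final long src;
        final long dst;
        final Double weight;

        Edge(long src, long dst, Double weight) {
            this.src = src;
            this.dst = dst;
            this.weight = weight;
        }
    }

    /** Builds A as a sparse map of maps, filling both A[i][j] and A[j][i] from one row. */
    static Map<Long, Map<Long, Double>> build(Iterable<Edge> rows) {
        Map<Long, Map<Long, Double>> a = new HashMap<>();
        for (Edge e : rows) {
            if (e.src == e.dst) {
                continue; // s_ii is assumed to be 0.0, so self-loops are ignored
            }
            double s = e.weight == null ? 1.0 : e.weight;
            if (s < 0) {
                throw new IllegalArgumentException("similarity must be nonnegative: " + s);
            }
            putOnce(a, e.src, e.dst, s);
            putOnce(a, e.dst, e.src, s);
        }
        return a;
    }

    /** Stores A[i][j] = s, rejecting a pair that was already supplied (assumption of this sketch). */
    private static void putOnce(Map<Long, Map<Long, Double>> a, long i, long j, double s) {
        Double previous = a.computeIfAbsent(i, key -> new HashMap<>()).put(j, s);
        if (previous != null) {
            throw new IllegalArgumentException("pair (" + i + ", " + j + ") supplied more than once");
        }
    }
}
```

Storing each undirected pair in both directions is what lets a single row (i, j, s_ij) stand for both entries of the symmetric matrix.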
-
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
-
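The copy contract (same UID, current params overlaid with extra ones) can be sketched in plain Java. `CopySketch` is hypothetical and stores params in a simple string-keyed map rather than Spark's ParamMap; it only illustrates the semantics described for copy, not how defaultCopy() is implemented.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of copy semantics: same UID, extra params take precedence; not Spark code. */
final class CopySketch {
    final String uid;
    final Map<String, Object> params;

    CopySketch(String uid, Map<String, ?> params) {
        this.uid = uid;
        this.params = Map.copyOf(params); // immutable snapshot of the param values
    }

    /** Returns a new instance with the same UID; entries in extra override existing values. */
    CopySketch copy(Map<String, ?> extra) {
        Map<String, Object> merged = new HashMap<>(params);
        merged.putAll(extra);
        return new CopySketch(uid, merged);
    }
}
```

The original instance is left untouched, so a copy with, say, a different k can be used without affecting the configuration it was derived from.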