PowerIterationClustering (Spark 3.4.1 JavaDoc)

Object
- org.apache.spark.ml.clustering.PowerIterationClustering

All Implemented Interfaces:

java.io.Serializable, PowerIterationClusteringParams, Params, HasMaxIter, HasWeightCol, DefaultParamsWritable, Identifiable, MLWritable
```
public class PowerIterationClustering
extends Object
implements PowerIterationClusteringParams, DefaultParamsWritable
```
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen. From the abstract: PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data.
This class is not yet an Estimator/Transformer; use the assignClusters method to run the PowerIterationClustering algorithm.
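The truncated power iteration mentioned in the abstract can be illustrated with a minimal, Spark-free sketch. It assumes a small dense affinity matrix, normalizes it row-wise (W = D^-1 A), and repeatedly applies v <- W v / ||W v||_1. The class and method names below are hypothetical and are not part of the Spark API; the final k-means step that turns the embedding into cluster labels is omitted, and the real implementation runs distributed over a graph rather than on arrays.

```java
import java.util.Arrays;

final class PowerIterationSketch {

    /** Row-normalizes the affinity matrix A into W = D^-1 A, where D holds the row sums. */
    static double[][] rowNormalize(double[][] a) {
        int n = a.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double degree = 0.0;
            for (int j = 0; j < n; j++) degree += a[i][j];
            for (int j = 0; j < n; j++) w[i][j] = degree == 0.0 ? 0.0 : a[i][j] / degree;
        }
        return w;
    }

    /** Runs up to maxIter steps of v <- W v / ||W v||_1 and returns the final vector. */
    static double[] powerIterate(double[][] a, double[] v0, int maxIter) {
        double[][] w = rowNormalize(a);
        double[] v = v0.clone();
        int n = v.length;
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[n];
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) next[i] += w[i][j] * v[j];
                norm += Math.abs(next[i]);
            }
            if (norm == 0.0) break; // nothing left to propagate
            for (int i = 0; i < n; i++) next[i] /= norm;
            v = next;
        }
        return v;
    }

    /** Affinity matrix of two disconnected triangles: vertices 0-2 and vertices 3-5. */
    static double[][] twoTriangles() {
        double[][] a = new double[6][6];
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {3, 4}, {3, 5}, {4, 5}};
        for (int[] e : edges) {
            a[e[0]][e[1]] = 1.0; // the matrix is symmetric,
            a[e[1]][e[0]] = 1.0; // so every edge is mirrored
        }
        return a;
    }

    public static void main(String[] args) {
        double[] v0 = {0.1, 0.2, 0.3, 0.7, 0.8, 0.9};
        System.out.println(Arrays.toString(powerIterate(twoTriangles(), v0, 30)));
    }
}
```

In this toy graph the entries of the resulting vector become nearly constant within each triangle while staying clearly separated between the two triangles, which is the one-dimensional embedding a subsequent k-means step would split into clusters.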

See Also:

Spectral clustering (Wikipedia), Serialized Form

Constructor Summary

Constructors
Constructor and Description
`PowerIterationClustering()`

Method Summary

Modifier and Type	Method and Description
`Dataset<Row>`	`assignClusters(Dataset<?> dataset)` Runs the PIC algorithm and returns a cluster assignment for each input vertex.
`PowerIterationClustering`	`copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params.
`Param<String>`	`dstCol()` Name of the input column for destination vertex IDs.
`Param<String>`	`initMode()` Param for the initialization algorithm.
`IntParam`	`k()` The number of clusters to create (k).
`static PowerIterationClustering`	`load(String path)`
`IntParam`	`maxIter()` Param for maximum number of iterations (>= 0).
`Param<?>[]`	`params()` Returns all params sorted by their names.
`static MLReader<PowerIterationClustering>`	`read()`
`PowerIterationClustering`	`setDstCol(String value)`
`PowerIterationClustering`	`setInitMode(String value)`
`PowerIterationClustering`	`setK(int value)`
`PowerIterationClustering`	`setMaxIter(int value)`
`PowerIterationClustering`	`setSrcCol(String value)`
`PowerIterationClustering`	`setWeightCol(String value)`
`Param<String>`	`srcCol()` Param for the name of the input column for source vertex IDs.
`String`	`uid()` An immutable unique ID for the object and its derivatives.
`Param<String>`	`weightCol()` Param for weight column name.

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.ml.clustering.PowerIterationClusteringParams
getDstCol, getInitMode, getK, getSrcCol

Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter

Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol

Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString

Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable
write

Methods inherited from interface org.apache.spark.ml.util.MLWritable
save

- Constructor Detail
  - PowerIterationClustering
```
public PowerIterationClustering()
```
- Method Detail
  - load
```
public static PowerIterationClustering load(String path)
```
  - read
```
public static MLReader<PowerIterationClustering> read()
```
  - k
```
public final IntParam k()
```
    Description copied from interface: PowerIterationClusteringParams
    
    The number of clusters to create (k). Must be > 1. Default: 2.
    
    Specified by:
    
    k in interface PowerIterationClusteringParams
    
    Returns:
    
    (undocumented)
  - initMode
```
public final Param<String> initMode()
```
    Description copied from interface: PowerIterationClusteringParams
    
    Param for the initialization algorithm. This can be either "random" to use a random vector as vertex properties, or "degree" to use a normalized sum of similarities with other vertices. Default: random.
    
    Specified by:
    
    initMode in interface PowerIterationClusteringParams
    
    Returns:
    
    (undocumented)
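    The two initialization modes can be pictured with the following sketch on a small dense affinity matrix. It is an illustration under assumptions rather than Spark's code: the class and method names are hypothetical, "degree" is read as each vertex's row sum divided by the total of all row sums, and "random" is read as a Gaussian vector scaled so its absolute values sum to 1.

```java
import java.util.Arrays;
import java.util.Random;

final class InitModeSketch {

    /** "degree": each vertex starts with its share of the total similarity mass. */
    static double[] degreeInit(double[][] a) {
        int n = a.length;
        double[] v = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) v[i] += a[i][j]; // sum of similarities of vertex i
            total += v[i];
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("affinity matrix has no positive entries");
        }
        for (int i = 0; i < n; i++) v[i] /= total;
        return v;
    }

    /** "random": a random vector, scaled here (an assumption) to unit L1 norm. */
    static double[] randomInit(int n, long seed) {
        Random rng = new Random(seed);
        double[] v = new double[n];
        double norm = 0.0;
        for (int i = 0; i < n; i++) {
            v[i] = rng.nextGaussian();
            norm += Math.abs(v[i]);
        }
        for (int i = 0; i < n; i++) v[i] /= norm;
        return v;
    }

    public static void main(String[] args) {
        double[][] a = {{0, 1, 2}, {1, 0, 0}, {2, 0, 0}};
        System.out.println(Arrays.toString(degreeInit(a)));
        System.out.println(Arrays.toString(randomInit(3, 42L)));
    }
}
```

    With the real class, the mode is selected through setInitMode("degree") or setInitMode("random").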
  - srcCol
```
public Param<String> srcCol()
```
    Description copied from interface: PowerIterationClusteringParams
    
    Param for the name of the input column for source vertex IDs. Default: "src"
    
    Specified by:
    
    srcCol in interface PowerIterationClusteringParams
    
    Returns:
    
    (undocumented)
  - dstCol
```
public Param<String> dstCol()
```
    Description copied from interface: PowerIterationClusteringParams
    
    Name of the input column for destination vertex IDs. Default: "dst"
    
    Specified by:
    
    dstCol in interface PowerIterationClusteringParams
    
    Returns:
    
    (undocumented)
  - weightCol
```
public final Param<String> weightCol()
```
    Description copied from interface: HasWeightCol
    
    Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
    
    Specified by:
    
    weightCol in interface HasWeightCol
    
    Returns:
    
    (undocumented)
  - maxIter
```
public final IntParam maxIter()
```
    Description copied from interface: HasMaxIter
    
    Param for maximum number of iterations (>= 0).
    
    Specified by:
    
    maxIter in interface HasMaxIter
    
    Returns:
    
    (undocumented)
  - params
```
public Param<?>[] params()
```
    Description copied from interface: Params
    
    Returns all params sorted by their names. The default implementation uses Java reflection to list all public methods that have no arguments and return Param.
    
    Specified by:
    
    params in interface Params
    
    Returns:
    
    (undocumented)
  - uid
```
public String uid()
```
    Description copied from interface: Identifiable
    
    An immutable unique ID for the object and its derivatives.
    
    Specified by:
    
    uid in interface Identifiable
    
    Returns:
    
    (undocumented)
  - setK
```
public PowerIterationClustering setK(int value)
```
  - setInitMode
```
public PowerIterationClustering setInitMode(String value)
```
  - setMaxIter
```
public PowerIterationClustering setMaxIter(int value)
```
  - setSrcCol
```
public PowerIterationClustering setSrcCol(String value)
```
  - setDstCol
```
public PowerIterationClustering setDstCol(String value)
```
  - setWeightCol
```
public PowerIterationClustering setWeightCol(String value)
```
  - assignClusters
```
public Dataset<Row> assignClusters(Dataset<?> dataset)
```
    Runs the PIC algorithm and returns a cluster assignment for each input vertex.
    
    Parameters:
    
    dataset - A dataset with columns src, dst, weight representing the affinity matrix, which is the matrix A in the PIC paper. Suppose the src column value is i, the dst column value is j, and the weight column value is the similarity s_ij, which must be nonnegative. This is a symmetric matrix and hence s_ij = s_ji. For any (i, j) with nonzero similarity, there should be either (i, j, s_ij) or (j, i, s_ji) in the input. Rows with i = j are ignored, because we assume s_ij = 0.0.
    
    Returns:
    
    A dataset that contains a column of vertex IDs and a column with the cluster assigned to each ID. Its schema is:
    - id: Long
    - cluster: Int
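    The input conventions described for the dataset parameter can be sketched as follows: one row per unordered vertex pair, mirrored into a symmetric matrix, with i = j rows skipped and negative weights rejected. This is a hypothetical helper for illustration, not Spark's implementation; it assumes vertex IDs are 0..n-1, that each unordered pair appears at most once, and that a negative weight is reported as an IllegalArgumentException.

```java
import java.util.Arrays;

final class AffinitySketch {

    /**
     * Builds a dense symmetric affinity matrix from parallel (src, dst, weight) arrays.
     * Assumes vertex IDs lie in 0..n-1 and each unordered pair is listed at most once.
     */
    static double[][] toAffinityMatrix(int n, long[] src, long[] dst, double[] weight) {
        double[][] a = new double[n][n];
        for (int e = 0; e < src.length; e++) {
            int i = Math.toIntExact(src[e]);
            int j = Math.toIntExact(dst[e]);
            if (weight[e] < 0.0) {
                throw new IllegalArgumentException("similarity must be nonnegative, got " + weight[e]);
            }
            if (i == j) continue;  // rows with i = j are ignored
            a[i][j] = weight[e];   // s_ij
            a[j][i] = weight[e];   // s_ji = s_ij by symmetry
        }
        return a;
    }

    public static void main(String[] args) {
        long[] src = {0L, 1L, 2L};
        long[] dst = {1L, 1L, 0L};
        double[] weight = {0.5, 9.0, 0.25};
        for (double[] row : toAffinityMatrix(3, src, dst, weight)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

    In the example input, the row (1, 1, 9.0) contributes nothing, while (0, 1, 0.5) and (2, 0, 0.25) each fill two mirrored entries.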
  - copy
```
public PowerIterationClustering copy(ParamMap extra)
```
    Description copied from interface: Params
    
    Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
    
    Specified by:
    
    copy in interface Params
    
    Parameters:
    
    extra - (undocumented)
    
    Returns:
    
    (undocumented)
