org.apache.spark.ml.clustering.PowerIterationClustering

All Implemented Interfaces: Serializable, PowerIterationClusteringParams, Params, HasMaxIter, HasWeightCol, DefaultParamsWritable, Identifiable, MLWritable

public class PowerIterationClustering extends Object implements PowerIterationClusteringParams, DefaultParamsWritable

Power Iteration Clustering (PIC) is a scalable graph clustering algorithm developed by Lin and Cohen. From the abstract: PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data.

This class is not yet an Estimator/Transformer; use the assignClusters method to run the PowerIterationClustering algorithm.
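
The embedding described in the abstract can be illustrated with a minimal plain-Java sketch. It is not Spark's implementation: the class name `PowerIterationSketch`, the dense-matrix representation, and the fixed iteration count are assumptions made for illustration. The sketch row-normalizes a symmetric affinity matrix and repeatedly applies it to a vector, rescaling by the L1 norm after each step.

```java
final class PowerIterationSketch {

    /** Row-normalizes an affinity matrix A into W, so each non-empty row of W sums to 1. */
    static double[][] rowNormalize(double[][] a) {
        int n = a.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double degree = 0.0;
            for (double s : a[i]) {
                degree += s;
            }
            for (int j = 0; j < n; j++) {
                // An isolated vertex (degree 0) keeps an all-zero row.
                w[i][j] = degree == 0.0 ? 0.0 : a[i][j] / degree;
            }
        }
        return w;
    }

    /** Applies v <- W v / ||W v||_1 at most maxIter times (truncated power iteration). */
    static double[] iterate(double[][] w, double[] v0, int maxIter) {
        double[] v = v0.clone();
        for (int t = 0; t < maxIter; t++) {
            double[] next = new double[v.length];
            double norm = 0.0;
            for (int i = 0; i < w.length; i++) {
                for (int j = 0; j < v.length; j++) {
                    next[i] += w[i][j] * v[j];
                }
                norm += Math.abs(next[i]);
            }
            if (norm == 0.0) {
                break; // nothing left to rescale; keep the previous vector
            }
            for (int i = 0; i < next.length; i++) {
                next[i] /= norm;
            }
            v = next;
        }
        return v;
    }
}
```

On an affinity matrix made of two disconnected triangles, the entries of the resulting vector become nearly constant within each triangle, provided the start vector sums to different values over the two groups. A one-dimensional k-means over that vector would then separate the groups.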

Constructor Summary

| Constructor | Description |
| --- | --- |
| PowerIterationClustering() | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| Dataset<Row> | assignClusters(Dataset<?> dataset) | Runs the PIC algorithm and returns a cluster assignment for each input vertex. |
| PowerIterationClustering | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params. |
| Param<String> | dstCol() | Name of the input column for destination vertex IDs. |
| final Param<String> | initMode() | Param for the initialization algorithm. |
| final IntParam | k() | The number of clusters to create (k). |
| static PowerIterationClustering | load(String path) | |
| final IntParam | maxIter() | Param for maximum number of iterations (>= 0). |
| Param<?>[] | params() | Returns all params sorted by their names. |
| static MLReader<PowerIterationClustering> | read() | |
| PowerIterationClustering | setDstCol(String value) | |
| PowerIterationClustering | setInitMode(String value) | |
| PowerIterationClustering | setK(int value) | |
| PowerIterationClustering | setMaxIter(int value) | |
| PowerIterationClustering | setSrcCol(String value) | |
| PowerIterationClustering | setWeightCol(String value) | |
| Param<String> | srcCol() | Param for the name of the input column for source vertex IDs. |
| String | uid() | An immutable unique ID for the object and its derivatives. |
| final Param<String> | weightCol() | Param for weight column name. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.ml.util.DefaultParamsWritable
write

Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter

Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol

Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString

Methods inherited from interface org.apache.spark.ml.util.MLWritable
save

Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface org.apache.spark.ml.clustering.PowerIterationClusteringParams
getDstCol, getInitMode, getK, getSrcCol

Constructor Details
- PowerIterationClustering
  
  public PowerIterationClustering()
Method Details
- load
  
  public static PowerIterationClustering load(String path)
- read
  
  public static MLReader<PowerIterationClustering> read()
- k
  
  public final IntParam k()
  
  Description copied from interface: PowerIterationClusteringParams
  
  The number of clusters to create (k). Must be > 1. Default: 2.
  
  Specified by:
  
  k in interface PowerIterationClusteringParams
  
  Returns:
  
  (undocumented)
- initMode
  
  public final Param<String> initMode()
  
  Description copied from interface: PowerIterationClusteringParams
  
  Param for the initialization algorithm. This can be either "random" to use a random vector as vertex properties, or "degree" to use a normalized sum of similarities with other vertices. Default: random.
  
  Specified by:
  
  initMode in interface PowerIterationClusteringParams
  
  Returns:
  
  (undocumented)
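
The two initMode options for the starting vector can be sketched in plain Java as follows. This is an illustration only, not Spark's code: the class name `InitModeSketch`, the dense matrix input, the Gaussian draw, the explicit seed, and the L1 rescaling are all assumptions.

```java
import java.util.Random;

final class InitModeSketch {

    /** "degree": each vertex starts at its summed similarity to other vertices, divided by the total. */
    static double[] degreeInit(double[][] a) {
        int n = a.length;
        double[] v = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    v[i] += a[i][j]; // self-similarity is skipped
                }
            }
            total += v[i];
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("affinity matrix has no nonzero off-diagonal entries");
        }
        for (int i = 0; i < n; i++) {
            v[i] /= total;
        }
        return v;
    }

    /** "random": each vertex starts at a random value, rescaled so the absolute values sum to 1. */
    static double[] randomInit(int n, long seed) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        Random rng = new Random(seed);
        double[] v = new double[n];
        double norm = 0.0;
        for (int i = 0; i < n; i++) {
            v[i] = rng.nextGaussian();
            norm += Math.abs(v[i]);
        }
        for (int i = 0; i < n; i++) {
            v[i] /= norm;
        }
        return v;
    }
}
```

For three vertices with pairwise similarities 1, 2, and 3, the degrees are 3, 4, and 5, so `degreeInit` yields 3/12, 4/12, and 5/12.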
- srcCol
  
  public Param<String> srcCol()
  
  Description copied from interface: PowerIterationClusteringParams
  
  Param for the name of the input column for source vertex IDs. Default: "src"
  
  Specified by:
  
  srcCol in interface PowerIterationClusteringParams
  
  Returns:
  
  (undocumented)
- dstCol
  
  public Param<String> dstCol()
  
  Description copied from interface: PowerIterationClusteringParams
  
  Name of the input column for destination vertex IDs. Default: "dst"
  
  Specified by:
  
  dstCol in interface PowerIterationClusteringParams
  
  Returns:
  
  (undocumented)
- weightCol
  
  public final Param<String> weightCol()
  
  Description copied from interface: HasWeightCol
  
  Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
  
  Specified by:
  
  weightCol in interface HasWeightCol
  
  Returns:
  
  (undocumented)
- maxIter
  
  public final IntParam maxIter()
  
  Description copied from interface: HasMaxIter
  
  Param for maximum number of iterations (>= 0).
  
  Specified by:
  
  maxIter in interface HasMaxIter
  
  Returns:
  
  (undocumented)
- params
  
  public Param<?>[] params()
  
  Description copied from interface: Params
  
  Returns all params sorted by their names. The default implementation uses Java reflection to list all public methods that have no arguments and return Param.
  
  Specified by:
  
  params in interface Params
  
  Returns:
  
  (undocumented)
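
The reflection-based lookup described for params can be sketched with standard Java reflection. The `Param` and `Demo` classes below are hypothetical stand-ins, not Spark's types; the sketch only shows the technique of collecting public no-argument methods that return a param and sorting the results by name.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ParamsReflectionSketch {

    /** Hypothetical stand-in for a parameter type; only the name matters here. */
    public static class Param {
        final String name;

        Param(String name) {
            this.name = name;
        }
    }

    /** Hypothetical owner exposing three params plus two methods that must be skipped. */
    public static class Demo {
        public Param maxIter() { return new Param("maxIter"); }
        public Param k() { return new Param("k"); }
        public Param initMode() { return new Param("initMode"); }
        public String uid() { return "demo"; }                 // skipped: wrong return type
        public Param named(String n) { return new Param(n); }  // skipped: takes an argument
    }

    /** Lists every param returned by a public no-argument method, sorted by name. */
    static List<Param> params(Object owner) {
        List<Param> found = new ArrayList<>();
        // getMethods() returns public methods only, including inherited ones.
        for (Method m : owner.getClass().getMethods()) {
            if (m.getParameterCount() == 0 && Param.class.isAssignableFrom(m.getReturnType())) {
                try {
                    found.add((Param) m.invoke(owner));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("could not read param " + m.getName(), e);
                }
            }
        }
        found.sort(Comparator.comparing((Param p) -> p.name));
        return found;
    }
}
```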
- uid
  
  public String uid()
  
  Description copied from interface: Identifiable
  
  An immutable unique ID for the object and its derivatives.
  
  Specified by:
  
  uid in interface Identifiable
  
  Returns:
  
  (undocumented)
- setK
  
  public PowerIterationClustering setK(int value)
- setInitMode
  
  public PowerIterationClustering setInitMode(String value)
- setMaxIter
  
  public PowerIterationClustering setMaxIter(int value)
- setSrcCol
  
  public PowerIterationClustering setSrcCol(String value)
- setDstCol
  
  public PowerIterationClustering setDstCol(String value)
- setWeightCol
  
  public PowerIterationClustering setWeightCol(String value)
- assignClusters
  
  public Dataset<Row> assignClusters(Dataset<?> dataset)
  
  Runs the PIC algorithm and returns a cluster assignment for each input vertex.
  
  Parameters:
  
  dataset - A dataset with columns src, dst, weight representing the affinity matrix, which is the matrix A in the PIC paper. Suppose the src column value is i, the dst column value is j, and the weight column value is the similarity s_ij, which must be nonnegative. This is a symmetric matrix and hence s_ij = s_ji. For any (i, j) with nonzero similarity, there should be either (i, j, s_ij) or (j, i, s_ji) in the input. Rows with i = j are ignored, because we assume s_ij = 0.0.
  
  Returns:
  
  A dataset that contains columns of vertex id and the corresponding cluster for the id. Its schema is:
  - id: Long
  - cluster: Int
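
The input rules for assignClusters (nonnegative weights, one row per unordered pair, self-loops ignored) can be illustrated with a plain-Java sketch that expands an edge list into a symmetric adjacency map. The `Edge` class and `toSymmetric` helper are hypothetical names used only for this illustration; Spark performs the equivalent handling internally on the dataset.

```java
import java.util.Map;
import java.util.TreeMap;

final class AffinityInputSketch {

    /** Hypothetical in-memory counterpart of one (src, dst, weight) input row. */
    static final class Edge {
        final long src;
        final long dst;
        final double weight;

        Edge(long src, long dst, double weight) {
            this.src = src;
            this.dst = dst;
            this.weight = weight;
        }
    }

    /** Builds the symmetric affinity lookup a.get(i).get(j) == a.get(j).get(i). */
    static Map<Long, Map<Long, Double>> toSymmetric(Edge... edges) {
        Map<Long, Map<Long, Double>> a = new TreeMap<>();
        for (Edge e : edges) {
            if (e.weight < 0.0) {
                throw new IllegalArgumentException(
                    "similarity must be nonnegative: (" + e.src + ", " + e.dst + ")");
            }
            if (e.src == e.dst) {
                continue; // rows with i = j are ignored
            }
            // One input row supplies both (i, j) and (j, i).
            a.computeIfAbsent(e.src, key -> new TreeMap<>()).put(e.dst, e.weight);
            a.computeIfAbsent(e.dst, key -> new TreeMap<>()).put(e.src, e.weight);
        }
        return a;
    }
}
```

Passing the rows (0, 1, 0.9), (1, 2, 0.5), and (2, 2, 7.0) produces entries for the pairs {0, 1} and {1, 2} in both directions and drops the self-loop on vertex 2.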
- copy
  
  public PowerIterationClustering copy(ParamMap extra)
  
  Description copied from interface: Params
  
  Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
  
  Specified by:
  
  copy in interface Params
  
  Parameters:
  
  extra - (undocumented)
  
  Returns:
  
  (undocumented)
