Object

org.apache.spark.ml.stat

Correlation


object Correlation

API for correlation functions in MLlib, compatible with DataFrames and Datasets.

The functions in this package generalize the functions in org.apache.spark.sql.Dataset#stat to spark.ml's Vector types.

Annotations
@Since( "2.2.0" ) @Experimental()
Source
Correlation.scala
Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def corr(dataset: Dataset[_], column: String): DataFrame

    Compute the Pearson correlation matrix for the input Dataset of Vectors.

    Annotations
    @Since( "2.2.0" )
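As an illustration of the two-argument overload, here is a minimal sketch that builds a small DataFrame of vectors and reads back the Pearson matrix. The object name `PearsonCorrExample`, the column name `features`, the local master setting, and the sample values are all assumptions made for this example; they are not part of the API.

```scala
import org.apache.spark.ml.linalg.{Matrix, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical example object; only Correlation.corr comes from the API above.
object PearsonCorrExample {

  // Builds a tiny DataFrame with one vector column and returns its Pearson matrix.
  def compute(spark: SparkSession): Matrix = {
    import spark.implicits._

    // Three observations of a 3-dimensional vector (made-up data).
    // The second component is always twice the first.
    val df = Seq(
      Vectors.dense(1.0, 2.0, 10.0),
      Vectors.dense(2.0, 4.0, 7.0),
      Vectors.dense(3.0, 6.0, 1.0)
    ).map(Tuple1.apply).toDF("features")

    // The result has a single row and a single column holding the matrix.
    val Row(coeff: Matrix) = Correlation.corr(df, "features").head
    coeff
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: run locally for the example
      .appName("PearsonCorrExample")
      .getOrCreate()
    try {
      val coeff = compute(spark)
      // One row and one column per vector component.
      println(s"${coeff.numRows} x ${coeff.numCols}")
      // Components 0 and 1 are linearly related, so this should be close to 1.0.
      println(coeff(0, 1))
    } finally {
      spark.stop()
    }
  }
}
```

The matrix is square, with one row and one column per component of the input vectors, so `coeff(i, j)` is the coefficient between components `i` and `j`.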
  7. def corr(dataset: Dataset[_], column: String, method: String): DataFrame


    :: Experimental :: Compute the correlation matrix for the input Dataset of Vectors using the specified method. Methods currently supported: pearson (default), spearman.

    dataset

    A dataset or a dataframe

    column

    The name of the column of vectors for which the correlation coefficient needs to be computed. This must be a column of the dataset, and it must contain Vector objects.

    method

    String specifying the method to use for computing correlation. Supported: pearson (default), spearman

    returns

    A dataframe that contains the correlation matrix of the column of vectors. This dataframe contains a single row and a single column of name '$METHODNAME($COLUMN)'.

    Annotations
    @Since( "2.2.0" )
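    Exceptions thrown

    IllegalArgumentException if the column is not a valid column in the dataset, or if the content of this column is not of type Vector.

    Here is how to access the correlation coefficient:

    import org.apache.spark.ml.linalg.{Matrix, Vector}
    import org.apache.spark.ml.stat.Correlation
    import org.apache.spark.sql.{Dataset, Row}
    val data: Dataset[Vector] = ...
    val Row(coeff: Matrix) = Correlation.corr(data, "value").head
    // coeff now contains the Pearson correlation matrix.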
    Note

    Spearman is a rank correlation. Computing it requires creating an RDD[Double] for each column, sorting it to retrieve the ranks, and then joining the columns back into an RDD[Vector], which is fairly costly. Cache the input Dataset before calling corr with method = "spearman" to avoid recomputing the common lineage.
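The caching advice in the note can be sketched as follows. This is only an illustration under stated assumptions: the object name `SpearmanCorrExample`, the column name `features`, and the sample data are hypothetical, and whether caching pays off depends on how expensive the input lineage is to recompute.

```scala
import org.apache.spark.ml.linalg.{Matrix, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical example object; only Correlation.corr comes from the API above.
object SpearmanCorrExample {

  // Returns the name of the result column together with the Spearman matrix.
  def compute(spark: SparkSession): (String, Matrix) = {
    import spark.implicits._

    // Made-up data: the second component is the square of the first,
    // so the relation is monotonic but not linear.
    val df = Seq(
      Vectors.dense(1.0, 1.0),
      Vectors.dense(2.0, 4.0),
      Vectors.dense(3.0, 9.0),
      Vectors.dense(4.0, 16.0)
    ).map(Tuple1.apply).toDF("features")

    // Cache first, as the note suggests, so the per-column sorts
    // do not each recompute the input.
    df.cache()
    try {
      val result = Correlation.corr(df, "features", "spearman")
      val Row(coeff: Matrix) = result.head
      // A single output column, named after the method and the input column.
      (result.columns.head, coeff)
    } finally {
      df.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: run locally for the example
      .appName("SpearmanCorrExample")
      .getOrCreate()
    try {
      val (name, coeff) = compute(spark)
      println(name)
      // The ranks of both components agree, so this should be close to 1.0.
      println(coeff(0, 1))
    } finally {
      spark.stop()
    }
  }
}
```

Passing "pearson" as the third argument instead selects the default method, in which case the ranking step described in the note does not apply.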

  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. final def notify(): Unit

    Definition Classes
    AnyRef
  16. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  17. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  18. def toString(): String

    Definition Classes
    AnyRef → Any
  19. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
