org.apache.spark.ml.stat

Correlation

object Correlation

API for correlation functions in MLlib, compatible with DataFrames and Datasets.

The functions in this package generalize the functions in org.apache.spark.sql.Dataset#stat to spark.ml's Vector types.
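
As a rough illustration of the description above, the following sketch calls both corr overloads on a small DataFrame. The object name CorrelationSketch, the helper correlations, the column name "features", the local master setting, and the sample values are all assumptions made for this example; they do not come from the API itself.

```scala
import org.apache.spark.ml.linalg.{Matrix, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.{Row, SparkSession}

object CorrelationSketch {
  // Hypothetical helper: returns the (Pearson, Spearman) correlation
  // matrices for a small made-up dataset.
  def correlations(spark: SparkSession): (Matrix, Matrix) = {
    import spark.implicits._

    // Each row holds one Vector in the column "features".
    // The second component is twice the first; the third strictly decreases.
    val df = Seq(
      Vectors.dense(1.0, 2.0, 10.0),
      Vectors.dense(2.0, 4.0, 8.0),
      Vectors.dense(3.0, 6.0, 7.0),
      Vectors.dense(4.0, 8.0, 1.0)
    ).map(Tuple1.apply).toDF("features")

    // Two-argument overload: Pearson correlation.
    val Row(pearson: Matrix) = Correlation.corr(df, "features").head
    // Three-argument overload with an explicit method name.
    val Row(spearman: Matrix) = Correlation.corr(df, "features", "spearman").head
    (pearson, spearman)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CorrelationSketch")
      .getOrCreate()

    val (pearson, spearman) = correlations(spark)
    println(s"Pearson:\n$pearson")
    println(s"Spearman:\n$spearman")

    spark.stop()
  }
}
```

Each result is a square Matrix with one row and one column per vector component, so an individual coefficient can be read with, for example, pearson(0, 1).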

Annotations
@Since( "2.2.0" )
Source
Correlation.scala
Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @IntrinsicCandidate()
  6. def corr(dataset: Dataset[_], column: String): DataFrame

    Compute the Pearson correlation matrix for the input Dataset of Vectors.

    Annotations
    @Since( "2.2.0" )
  7. def corr(dataset: Dataset[_], column: String, method: String): DataFrame

    Compute the correlation matrix for the input Dataset of Vectors using the specified method. Methods currently supported: pearson (default), spearman.

    dataset

    A Dataset or a DataFrame.

    column

    The name of the column of vectors for which the correlation matrix is computed. It must be a column of the dataset, and it must contain Vector objects.

    method

    The name of the correlation method to use. Supported values: pearson (default), spearman.

    returns

    A DataFrame that contains the correlation matrix of the column of vectors. It has a single row and a single column named $METHODNAME($COLUMN).

    Annotations
    @Since( "2.2.0" )
    Exceptions thrown

    IllegalArgumentException if the column is not a valid column in the dataset, or if the content of this column is not of type Vector.

    Here is how to access the correlation coefficients:

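    import org.apache.spark.ml.linalg.{Matrix, Vector}
    import org.apache.spark.ml.stat.Correlation
    import org.apache.spark.sql.{Dataset, Row}
    val data: Dataset[Vector] = ...
    val Row(coeff: Matrix) = Correlation.corr(data, "value").head
    // coeff now contains the Pearson correlation matrix.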
    Note

    Spearman is a rank correlation. Computing it requires creating an RDD[Double] for each column, sorting it to obtain the ranks, and then joining the columns back into an RDD[Vector], which is fairly costly. Cache the input Dataset before calling corr with method = "spearman" to avoid recomputing the common lineage.

  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  10. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @IntrinsicCandidate()
  11. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @IntrinsicCandidate()
  12. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  13. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @IntrinsicCandidate()
  15. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @IntrinsicCandidate()
  16. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  17. def toString(): String
    Definition Classes
    AnyRef → Any
  18. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  20. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
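
The note on corr(dataset, column, method) above recommends caching the input before a Spearman computation, and the returns entry describes how the result column is named. The sketch below shows one way to follow both points. The object name SpearmanCachingSketch, the helper cachedSpearman, the column name "features", and the sample values are hypothetical, and releasing the cache immediately afterwards is a design choice for this example rather than a requirement of the API.

```scala
import org.apache.spark.ml.linalg.{Matrix, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.{DataFrame, SparkSession}

object SpearmanCachingSketch {
  // Hypothetical helper: caches the input, computes the Spearman matrix,
  // and returns the result column name together with the matrix.
  def cachedSpearman(df: DataFrame, column: String): (String, Matrix) = {
    df.cache() // avoid recomputing the input lineage during ranking
    try {
      val result = Correlation.corr(df, column, "spearman")
      // The result has a single row and a single column.
      (result.columns.head, result.head.getAs[Matrix](0))
    } finally {
      df.unpersist() // release the cached data once the matrix is collected
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SpearmanCachingSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with one Vector per row.
    val df = Seq(
      Vectors.dense(1.0, 5.0),
      Vectors.dense(2.0, 3.0),
      Vectors.dense(3.0, 4.0)
    ).map(Tuple1.apply).toDF("features")

    val (name, matrix) = cachedSpearman(df, "features")
    println(s"Result column: $name")
    println(matrix)

    spark.stop()
  }
}
```

If the same Dataset feeds further computations, it may be preferable to keep it cached and call unpersist() only after the last use.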

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated
