object Correlation
API for correlation functions in MLlib, compatible with DataFrames and Datasets.
The functions in this package generalize the functions in org.apache.spark.sql.Dataset#stat to spark.ml's Vector types.
- Annotations
- @Since("2.2.0")
- Source
- Correlation.scala
- Alphabetic
- By Inheritance
- Correlation
- AnyRef
- Any
- Hide All
- Show All
- Public
- Protected
Value Members
-   final  def !=(arg0: Any): Boolean- Definition Classes
- AnyRef → Any
 
-   final  def ##: Int- Definition Classes
- AnyRef → Any
 
-   final  def ==(arg0: Any): Boolean- Definition Classes
- AnyRef → Any
 
-   final  def asInstanceOf[T0]: T0- Definition Classes
- Any
 
-    def clone(): AnyRef- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
 
-    def corr(dataset: Dataset[_], column: String): DataFrameCompute the Pearson correlation matrix for the input Dataset of Vectors. Compute the Pearson correlation matrix for the input Dataset of Vectors. - Annotations
- @Since("2.2.0")
 
-    def corr(dataset: Dataset[_], column: String, method: String): DataFrameCompute the correlation matrix for the input Dataset of Vectors using the specified method. Compute the correlation matrix for the input Dataset of Vectors using the specified method. Methods currently supported: pearson(default),spearman.- dataset
- A dataset or a dataframe 
- column
- The name of the column of vectors for which the correlation coefficient needs to be computed. This must be a column of the dataset, and it must contain Vector objects. 
- method
- String specifying the method to use for computing correlation. Supported: - pearson(default),- spearman
- returns
- A dataframe that contains the correlation matrix of the column of vectors. This dataframe contains a single row and a single column of name - $METHODNAME($COLUMN).
 - Annotations
- @Since("2.2.0")
- Exceptions thrown
- if the column is not a valid column in the dataset, or if the content of this column is not of type Vector. Here is how to access the correlation coefficient: - val data: Dataset[Vector] = ... val Row(coeff: Matrix) = Correlation.corr(data, "value").head // coeff now contains the Pearson correlation matrix. 
- Note
- For Spearman, a rank correlation, we need to create an RDD[Double] for each column and sort it in order to retrieve the ranks and then join the columns back into an RDD[Vector], which is fairly costly. Cache the input Dataset before calling corr with - method = "spearman"to avoid recomputing the common lineage.
 
-   final  def eq(arg0: AnyRef): Boolean- Definition Classes
- AnyRef
 
-    def equals(arg0: AnyRef): Boolean- Definition Classes
- AnyRef → Any
 
-   final  def getClass(): Class[_ <: AnyRef]- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
 
-    def hashCode(): Int- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
 
-   final  def isInstanceOf[T0]: Boolean- Definition Classes
- Any
 
-   final  def ne(arg0: AnyRef): Boolean- Definition Classes
- AnyRef
 
-   final  def notify(): Unit- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
 
-   final  def notifyAll(): Unit- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
 
-   final  def synchronized[T0](arg0: => T0): T0- Definition Classes
- AnyRef
 
-    def toString(): String- Definition Classes
- AnyRef → Any
 
-   final  def wait(arg0: Long, arg1: Int): Unit- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
 
-   final  def wait(arg0: Long): Unit- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
 
-   final  def wait(): Unit- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
 
Deprecated Value Members
-    def finalize(): Unit- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
- (Since version 9)