Compute the Pearson correlation matrix for the input Dataset of Vectors.
:: Experimental :: Compute the correlation matrix for the input Dataset of Vectors using the specified method.
A dataset or a dataframe
The name of the column of vectors for which the correlation coefficient needs to be computed. This must be a column of the dataset, and it must contain Vector objects.
String specifying the method to use for computing correlation.
A dataframe that contains the correlation matrix of the column of vectors. This dataframe contains a single row and a single column of name '$METHODNAME($COLUMN)'.
if the column is not a valid column in the dataset, or if the content of this column is not of type Vector. Here is how to access the correlation coefficient:
val data: Dataset[Vector] = ... val Row(coeff: Matrix) = Correlation.corr(data, "value").head // coeff now contains the Pearson correlation matrix.
For Spearman, a rank correlation, we need to create an RDD[Double] for each column
and sort it in order to retrieve the ranks and then join the columns back into an RDD[Vector],
which is fairly costly. Cache the input Dataset before calling corr with
method = "spearman"
to avoid recomputing the common lineage.