public class Correlation
extends Object
The functions in this package generalize the functions in Dataset.stat()
to spark.ml's Vector types.
Constructor and Description |
---|
Correlation() |
Modifier and Type | Method and Description |
---|---|
static Dataset<Row> | corr(Dataset<?> dataset, String column) Compute the Pearson correlation matrix for the input Dataset of Vectors. |
static Dataset<Row> | corr(Dataset<?> dataset, String column, String method) Compute the correlation matrix for the input Dataset of Vectors using the specified method. |
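public static Dataset<Row> corr(Dataset<?> dataset, String column, String method)
Compute the correlation matrix for the input Dataset of Vectors using the specified method.
Methods currently supported: pearson (default), spearman.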
Parameters:
dataset - A Dataset or a DataFrame.
column - The name of the column of vectors for which the correlation coefficient needs
to be computed. This must be a column of the dataset, and it must contain
Vector objects.
method - String specifying the method to use for computing correlation.
Supported: pearson (default), spearman.
Returns:
A DataFrame containing the correlation matrix of the column of vectors, as a single row
with a single column named $METHODNAME($COLUMN).
Throws:
IllegalArgumentException - if the column is not a valid column in the dataset, or if
the content of this column is not of type Vector.
Here is how to access the correlation coefficient:
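import org.apache.spark.ml.linalg.{Matrix, Vector}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.{Dataset, Row}
val data: Dataset[Vector] = ...
val Row(coeff: Matrix) = Correlation.corr(data, "value").head
// coeff now contains the Pearson correlation matrix.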
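Note: For Spearman, a rank correlation, the values of each vector component must be sorted
to obtain their ranks, which is fairly costly. Cache the input Dataset before calling corr
with method = "spearman" to avoid recomputing the common lineage.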
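To make the two supported methods concrete, the following plain-Java sketch computes the same kind of matrix without Spark. Each input vector is treated as one observation, and entry (i, j) of the result is the correlation between components i and j across all observations. Spearman is obtained by ranking each component (one sort per component, with tied values sharing their average rank) and then applying Pearson to the ranks. This is only an illustration of the definitions and makes no claim about how Spark implements them; the class `CorrelationSketch` and its helper methods are hypothetical names introduced for this example.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Spark.
public class CorrelationSketch {

    // Pearson correlation between two equally long samples.
    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = Arrays.stream(x).average().orElse(Double.NaN);
        double meanY = Arrays.stream(y).average().orElse(Double.NaN);
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int k = 0; k < n; k++) {
            double dx = x[k] - meanX;
            double dy = y[k] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    // 1-based ranks; tied values share the average of their positions.
    public static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) order[k] = k;
        // Sorting each component is the expensive step of a rank correlation.
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        double[] rank = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) end++;
            double avg = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) rank[order[k]] = avg;
            start = end + 1;
        }
        return rank;
    }

    // rows[r] is one observation vector; result[i][j] correlates components i and j.
    public static double[][] corr(double[][] rows, String method) {
        int n = rows.length, d = rows[0].length;
        double[][] cols = new double[d][n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < d; c++) cols[c][r] = rows[r][c];
        if ("spearman".equals(method)) {
            for (int c = 0; c < d; c++) cols[c] = ranks(cols[c]);
        } else if (!"pearson".equals(method)) {
            throw new IllegalArgumentException("Unsupported method: " + method);
        }
        double[][] out = new double[d][d];
        for (int i = 0; i < d; i++)
            for (int j = 0; j < d; j++) out[i][j] = pearson(cols[i], cols[j]);
        return out;
    }

    public static void main(String[] args) {
        // Second component is the square of the first: monotone but not linear.
        double[][] rows = { {1, 1}, {2, 4}, {3, 9}, {4, 16} };
        System.out.println(Arrays.deepToString(corr(rows, "pearson")));
        System.out.println(Arrays.deepToString(corr(rows, "spearman")));
    }
}
```

In the sample data the second component is a monotone but non-linear function of the first, so the off-diagonal Spearman entry is 1 while the Pearson entry is slightly below 1.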