Alternative constructor leaving matrix dimensions to be determined automatically.
rows stored as an RDD[Vector]
number of rows. A non-positive value means unknown, in which case the number of rows is determined by the number of records in the rows RDD.
number of columns. A non-positive value means unknown, in which case the number of columns is determined by the size of the first row.
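The sizing rules above can be sketched with a small, hypothetical in-memory class in plain Python. This is not Spark's API; the class and method names are illustrative. A non-positive size is treated as unknown and resolved from the data on first request, then cached. Raising an error on empty input is an assumption of this sketch.

```python
from typing import Sequence


class LocalRowMatrix:
    """Hypothetical in-memory stand-in for RowMatrix's dimension handling.

    A non-positive n_rows or n_cols means "unknown". The value is resolved
    from the data the first time it is requested and then cached.
    """

    def __init__(self, rows: Sequence[Sequence[float]], n_rows: int = 0, n_cols: int = 0):
        self.rows = rows
        self._n_rows = n_rows
        self._n_cols = n_cols

    def num_rows(self) -> int:
        if self._n_rows <= 0:
            # Unknown: count the records (the distributed version counts the RDD).
            self._n_rows = len(self.rows)
            if self._n_rows == 0:
                raise ValueError("cannot infer the number of rows from empty input")
        return self._n_rows

    def num_cols(self) -> int:
        if self._n_cols <= 0:
            # Unknown: use the size of the first row.
            if len(self.rows) == 0:
                raise ValueError("cannot infer the number of columns from empty input")
            self._n_cols = len(self.rows[0])
        return self._n_cols
```

Caching the inferred value means the potentially expensive count runs at most once per instance.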
Computes column-wise summary statistics.
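To make "column-wise summary statistics" concrete, here is a minimal local sketch in plain Python. It computes several per-column quantities of the kind Spark returns in a MultivariateStatisticalSummary: mean, variance, min, max, and nonzero count. The function name is hypothetical. The variance here is the unbiased sample variance (divide by n - 1), which is an assumption of the sketch.

```python
from typing import Dict, List, Sequence


def column_stats(rows: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Per-column mean, sample variance, min, max and nonzero count."""
    n = len(rows)
    cols = list(zip(*rows))  # transpose: one tuple per column
    mean = [sum(c) / n for c in cols]
    # Unbiased sample variance (divide by n - 1); defined as 0.0 for a single row.
    variance = [
        sum((x - m) ** 2 for x in c) / (n - 1) if n > 1 else 0.0
        for c, m in zip(cols, mean)
    ]
    return {
        "mean": mean,
        "variance": variance,
        "min": [min(c) for c in cols],
        "max": [max(c) for c in cols],
        "num_nonzeros": [float(sum(1 for x in c if x != 0.0)) for c in cols],
    }
```

This sketch makes several passes over the data for readability. A distributed implementation would instead accumulate all statistics in a single pass and merge the per-partition partial results.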
Computes the covariance matrix, treating each row as an observation.
a local dense matrix of size n x n, where n is the number of columns
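A minimal local sketch of the covariance computation in plain Python, with one observation per row and one variable per column. The function name is hypothetical. The sketch divides by m - 1, giving the sample covariance. That normalization is an assumption here, so check Spark's source if the exact scaling matters.

```python
from typing import List, Sequence


def covariance(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """n x n sample covariance of the columns, one observation per row."""
    m = len(rows)
    if m < 2:
        raise ValueError("need at least two rows (observations)")
    n = len(rows[0])
    mean = [sum(r[j] for r in rows) / m for j in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows)
            cov[i][j] = cov[j][i] = s / (m - 1)  # symmetric, normalized by m - 1
    return cov
```

Only the upper triangle is computed explicitly; symmetry fills in the rest.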
Computes the Gramian matrix A^T A.
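Because A^T A equals the sum of the outer products a_i a_i^T over the rows a_i of A, a row-oriented matrix can accumulate the Gramian one row at a time. The following is a plain-Python sketch of that identity. The function name is hypothetical, and this is not Spark's implementation.

```python
from typing import List, Sequence


def gramian(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Compute A^T A as a sum of outer products, one per row of A."""
    n = len(rows[0])
    g = [[0.0] * n for _ in range(n)]
    for r in rows:
        # Each row contributes r r^T independently, so partial sums computed
        # on different partitions could simply be added together.
        for i in range(n):
            for j in range(n):
                g[i][j] += r[i] * r[j]
    return g
```

The result is n x n regardless of the number of rows, which is why it fits in local memory when n is modest.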
Computes the top k principal components. Rows correspond to observations and columns correspond to variables. The principal components are stored as a local matrix of size n-by-k. Each column corresponds to one principal component, and the columns are in descending order of component variance. The row data do not need to be "centered" first; it is not necessary for the mean of each column to be 0.
number of top principal components.
a matrix of size n-by-k, whose columns are principal components
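The principal components are the leading eigenvectors of the covariance matrix. That is also why the input need not be centered: the covariance step subtracts the column means itself.

The following plain-Python sketch illustrates this. It substitutes power iteration with deflation for whatever dense eigensolver Spark actually uses. The function names, the fixed iteration count, and the start vector are assumptions of the sketch. It also assumes k does not exceed the rank of the covariance matrix.

```python
import math
from typing import List, Sequence


def _covariance(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    m, n = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / m for j in range(n)]
    # Centering happens here, so callers need not center their data.
    return [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (m - 1) for j in range(n)]
        for i in range(n)
    ]


def principal_components(rows: Sequence[Sequence[float]], k: int, iters: int = 1000) -> List[List[float]]:
    """Return an n x k matrix whose columns are the top-k principal components."""
    a = _covariance(rows)
    n = len(a)
    components: List[List[float]] = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary, non-symmetric start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration: v converges to the top eigenvector
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # the remaining matrix is zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (the variance along v).
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
        # Deflate: remove this component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
        components.append(v)
    # Arrange as n x k, one column per component, in descending order of variance.
    return [[components[c][i] for c in range(k)] for i in range(n)]
```

Each component is determined only up to sign, so compare directions rather than exact signs.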
Computes the singular value decomposition of this matrix. Denote this matrix by A (m x n). This computes matrices U, S, V such that A ~= U * S * V', where S contains the leading k singular values and U and V contain the corresponding singular vectors.
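At most the k largest non-zero singular values and their associated vectors are returned. If there are k such values, then the dimensions of the return will be:
- U is a RowMatrix of size m x k that satisfies U' * U = eye(k),
- s is a Vector of size k, holding the singular values in descending order,
- V is a Matrix of size n x k that satisfies V' * V = eye(k).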
We assume n is smaller than m. The singular values and the right singular vectors are derived from the eigenvalues and eigenvectors of the Gramian matrix A' * A. U, the matrix storing the left singular vectors, is computed via matrix multiplication as U = A * (V * S^{-1}), if requested by the user. The method actually used is chosen automatically based on its estimated cost.
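The Gramian route described above can be sketched locally in plain Python. The steps are:
1. Form G = A' * A.
2. Take its top k eigenpairs.
3. Set each sigma to the square root of its eigenvalue and use the eigenvectors as the columns of V.
4. Compute U = A * (V * S^{-1}).

The sketch uses power iteration with deflation in place of ARPACK or LAPACK. The function name and iteration count are hypothetical, and it assumes every kept singular value is strictly positive.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def svd_via_gramian(a: Sequence[Sequence[float]], k: int, iters: int = 2000) -> Tuple[Matrix, List[float], Matrix]:
    """Return (U, s, V) with U: m x k, s: k values in descending order, V: n x k."""
    m, n = len(a), len(a[0])
    # Gramian G = A' * A (n x n); its eigenvalues are the squared singular values.
    g = [[sum(a[r][i] * a[r][j] for r in range(m)) for j in range(n)] for i in range(n)]
    s: List[float] = []
    v_cols: List[List[float]] = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary, non-symmetric start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration on the (deflated) Gramian
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(g[i][j] * v[j] for j in range(n)) for i in range(n))
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]  # deflate
        s.append(math.sqrt(max(lam, 0.0)))
        v_cols.append(v)
    # U = A * (V * S^{-1}): column c of U is (A v_c) / sigma_c. Assumes sigma_c > 0.
    u = [[sum(a[r][j] * v_cols[c][j] for j in range(n)) / s[c] for c in range(k)] for r in range(m)]
    v_mat = [[v_cols[c][i] for c in range(k)] for i in range(n)]
    return u, s, v_mat
```

Forming A' * A squares the condition number of A. That is one reason very small singular values are unreliable on this route and are cut off with a relative threshold.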
Several internal parameters are set to default values. The reciprocal condition number rCond is set to 1e-9. All singular values smaller than rCond * sigma(0) are treated as zeros, where sigma(0) is the largest singular value. The maximum number of Arnoldi update iterations for ARPACK is set to 300 or k * 3, whichever is larger. The numerical tolerance for ARPACK's eigen-decomposition is set to 1e-10.
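The two defaults that are stated as formulas above can be written out directly. Below is a plain-Python sketch of the iteration cap and of the rCond cut-off applied to singular values sorted in descending order. The function names are hypothetical. The rule itself (keep at most k values, and treat anything below rCond * sigma(0) as zero) follows the text.

```python
from typing import List


def arpack_max_iterations(k: int) -> int:
    # Stated default: 300 or k * 3, whichever is larger.
    return max(300, 3 * k)


def keep_nonzero(sigmas: List[float], k: int, r_cond: float = 1e-9) -> List[float]:
    """Keep at most k leading singular values that are >= r_cond * sigma(0).

    sigmas must be sorted in descending order.
    """
    if not sigmas:
        return []
    threshold = r_cond * sigmas[0]
    kept: List[float] = []
    for sigma in sigmas[:k]:
        if sigma < threshold:
            break  # this and all later (smaller) values are treated as zero
        kept.append(sigma)
    return kept
```

Because the values are sorted, the scan can stop at the first value below the threshold.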
number of leading singular values to keep (0 < k <= n). It might return fewer than k if there are numerically zero singular values or if not enough Ritz values have converged before the maximum number of Arnoldi update iterations is reached (for example, when A is ill-conditioned).
whether to compute U
the reciprocal condition number. All singular values smaller than rCond * sigma(0) are treated as zero, where sigma(0) is the largest singular value.
SingularValueDecomposition(U, s, V). U = null if computeU = false.
The conditions that decide which method to use internally and the default parameters are subject to change.
Multiplies this matrix by a local matrix on the right.
a local matrix whose number of rows must match the number of columns of this matrix
an org.apache.spark.mllib.linalg.distributed.RowMatrix representing the product, which preserves partitioning
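Each row of A * B depends only on the matching row of A and on the small local matrix B. The product can therefore be computed as an independent per-row map, which is consistent with the result keeping the input's partitioning. Below is a plain-Python sketch of that row-wise view; the function name is hypothetical.

```python
from typing import List, Sequence


def multiply_rows(rows: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Compute A * B row by row: output row i is (row i of A) times B."""
    if rows and len(rows[0]) != len(b):
        raise ValueError("the number of rows of B must match the number of columns of A")
    k = len(b[0])
    # Each output row is computed independently from one input row.
    return [[sum(r[j] * b[j][c] for j in range(len(b))) for c in range(k)] for r in rows]
```

The output has as many rows as the input, in the same order, with k columns each.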
Gets or computes the number of columns.
Gets or computes the number of rows.
:: Experimental :: Represents a row-oriented distributed Matrix with no meaningful row indices.