Class IndexedRowMatrix

Object
org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
All Implemented Interfaces:
Serializable, DistributedMatrix

public class IndexedRowMatrix extends Object implements DistributedMatrix
Represents a row-oriented DistributedMatrix with indexed rows.

Parameters:
rows - indexed rows of this matrix
nRows - number of rows. A non-positive value means unknown, and then the number of rows will be determined by the max row index plus one.
nCols - number of columns. A non-positive value means unknown, and then the number of columns will be determined by the size of the first row.
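
The dimension rules for nRows and nCols can be illustrated with a small plain-Java sketch. It does not use Spark; the class and method names (DimensionInferenceSketch, resolveNumRows, resolveNumCols) are hypothetical, and the array arguments stand in for the indices and vectors of the indexed rows.

```java
// Illustrative only: plain Java, no Spark dependency. All names are hypothetical.
final class DimensionInferenceSketch {

    // Returns nRows when it is positive; otherwise the max row index plus one.
    static long resolveNumRows(long nRows, long[] rowIndices) {
        if (nRows > 0) {
            return nRows;
        }
        long maxIndex = -1L; // assumption of this sketch: no rows resolves to 0
        for (long index : rowIndices) {
            maxIndex = Math.max(maxIndex, index);
        }
        return maxIndex + 1;
    }

    // Returns nCols when it is positive; otherwise the size of the first row.
    // This sketch expects at least one row when nCols is non-positive.
    static long resolveNumCols(int nCols, double[][] rowVectors) {
        if (nCols > 0) {
            return nCols;
        }
        return rowVectors[0].length;
    }

    public static void main(String[] args) {
        long[] indices = {0L, 3L, 7L};
        double[][] vectors = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
        System.out.println(resolveNumRows(0L, indices)); // prints 8
        System.out.println(resolveNumCols(-1, vectors)); // prints 2
    }
}
```

Note that three rows with indices 0, 3 and 7 resolve to 8 rows, not 3: under the rule above, indices that never appear still count toward the row dimension.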

  • Constructor Details

    • IndexedRowMatrix

      public IndexedRowMatrix(RDD<IndexedRow> rows, long nRows, int nCols)
    • IndexedRowMatrix

      public IndexedRowMatrix(RDD<IndexedRow> rows)
      Alternative constructor leaving matrix dimensions to be determined automatically.
  • Method Details

    • columnSimilarities

      public CoordinateMatrix columnSimilarities()
      Compute all cosine similarities between columns of this matrix using the brute-force approach of computing normalized dot products.

      Returns:
      An n x n sparse upper-triangular matrix of cosine similarities between columns of this matrix.
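
      As a rough illustration of "normalized dot products", the following plain-Java sketch computes cosine similarities between the columns of a small dense array. It is not Spark's implementation: the class name ColumnSimilaritySketch is hypothetical, the result is a dense array rather than a CoordinateMatrix, and the choice to fill only entries with i < j (and to report 0 for an all-zero column) is an assumption of the sketch.

```java
// Illustrative only: plain Java, no Spark dependency. All names are hypothetical.
final class ColumnSimilaritySketch {

    // sims[i][j] for i < j holds the cosine similarity of columns i and j.
    // Entries on or below the diagonal are left at 0 (assumption of this sketch).
    static double[][] columnSimilarities(double[][] a) {
        int n = a[0].length;
        double[] norms = new double[n];
        double[][] dots = new double[n][n];
        // One pass over the rows accumulates squared norms and pairwise dot products.
        for (double[] row : a) {
            for (int i = 0; i < n; i++) {
                norms[i] += row[i] * row[i];
                for (int j = i + 1; j < n; j++) {
                    dots[i][j] += row[i] * row[j];
                }
            }
        }
        for (int i = 0; i < n; i++) {
            norms[i] = Math.sqrt(norms[i]);
        }
        double[][] sims = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double denom = norms[i] * norms[j];
                // An all-zero column has no direction; this sketch reports 0 for it.
                sims[i][j] = denom == 0.0 ? 0.0 : dots[i][j] / denom;
            }
        }
        return sims;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 0.0, 1.0}, {0.0, 1.0, 1.0}};
        double[][] sims = columnSimilarities(a);
        System.out.println(sims[0][1]); // columns 0 and 1 are orthogonal
        System.out.println(sims[0][2]);
    }
}
```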
    • computeGramianMatrix

      public Matrix computeGramianMatrix()
      Computes the Gramian matrix A^T A.

      Returns:
      the Gramian matrix A^T A, as a local Matrix
      Note:
      This cannot be computed on matrices with more than 65535 columns.
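
      For reference, the quantity A^T A can be sketched in plain Java as a sum of outer products of the rows. This is only an illustration of the definition, not the distributed implementation; the class name GramianSketch is hypothetical. The result has one row and one column per column of A, independent of how many rows A has.

```java
import java.util.Arrays;

// Illustrative only: plain Java, no Spark dependency. All names are hypothetical.
final class GramianSketch {

    // Computes G = A^T A, where g[i][j] is the dot product of columns i and j of A.
    static double[][] gramian(double[][] a) {
        int n = a[0].length;
        double[][] g = new double[n][n];
        // Each row contributes its outer product with itself.
        for (double[] row : a) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    g[i][j] += row[i] * row[j];
                }
            }
        }
        return g;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 2.0}, {3.0, 4.0}};
        // prints [[10.0, 14.0], [14.0, 20.0]]
        System.out.println(Arrays.deepToString(gramian(a)));
    }
}
```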
    • computeSVD

      public SingularValueDecomposition<IndexedRowMatrix,Matrix> computeSVD(int k, boolean computeU, double rCond)
      Computes the singular value decomposition of this IndexedRowMatrix. Denote this matrix by A (m x n). This method computes matrices U, S, V such that A = U * S * V'.

      The cost and implementation of this method are identical to those in RowMatrix, with the addition of indices.

      At most k largest non-zero singular values and associated vectors are returned. If there are k such values, then the dimensions of the return will be:

      U is an IndexedRowMatrix of size m x k that satisfies U'U = eye(k), s is a Vector of size k, holding the singular values in descending order, and V is a local Matrix of size n x k that satisfies V'V = eye(k).

      Parameters:
      k - number of singular values to keep. Fewer than k may be returned if there are numerically zero singular values. See rCond.
      computeU - whether to compute U
      rCond - the reciprocal condition number. All singular values smaller than rCond * sigma(0) are treated as zero, where sigma(0) is the largest singular value.
      Returns:
      SingularValueDecomposition(U, s, V)
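
      The interaction between k and rCond can be sketched as a simple cutoff over singular values sorted in descending order. The plain-Java code below follows the parameter descriptions above and is not taken from Spark's source; the class and method names (RCondThresholdSketch, numRetained) are hypothetical.

```java
// Illustrative only: plain Java, no Spark dependency. All names are hypothetical.
final class RCondThresholdSketch {

    // sigmas must be sorted in descending order, so sigmas[0] is the largest value.
    // Returns how many leading values survive the rCond cutoff, capped at k.
    static int numRetained(double[] sigmas, int k, double rCond) {
        if (sigmas.length == 0) {
            return 0;
        }
        double threshold = rCond * sigmas[0];
        int limit = Math.min(k, sigmas.length);
        int kept = 0;
        // Values smaller than the threshold are treated as zero; so are exact zeros.
        while (kept < limit && sigmas[kept] > 0.0 && sigmas[kept] >= threshold) {
            kept++;
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] sigmas = {10.0, 5.0, 1e-12};
        System.out.println(numRetained(sigmas, 3, 1e-9)); // prints 2
        System.out.println(numRetained(sigmas, 1, 1e-9)); // prints 1
    }
}
```

      In the first call the threshold is 1e-9 * 10.0 = 1e-8, so the third value is treated as zero and only two values are kept even though k is 3.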
    • multiply

      public IndexedRowMatrix multiply(Matrix B)
      Multiply this matrix by a local matrix on the right.

      Parameters:
      B - a local matrix whose number of rows must match the number of columns of this matrix
      Returns:
      an IndexedRowMatrix representing the product, which preserves partitioning
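
      Conceptually, a right-multiplication by a local matrix can be applied to each row independently, with the row index carried over unchanged. The plain-Java sketch below shows that idea under the assumption that rows are held in a map from index to dense values; the class name RightMultiplySketch and this representation are hypothetical and unrelated to how Spark stores rows.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: plain Java, no Spark dependency. All names are hypothetical.
final class RightMultiplySketch {

    // Multiplies every indexed row (length n) by the n x p matrix b, keeping each index.
    static Map<Long, double[]> multiply(Map<Long, double[]> rows, double[][] b) {
        int n = b.length;
        int p = b[0].length;
        Map<Long, double[]> result = new LinkedHashMap<>();
        for (Map.Entry<Long, double[]> entry : rows.entrySet()) {
            double[] row = entry.getValue();
            if (row.length != n) {
                throw new IllegalArgumentException(
                    "row length " + row.length + " does not match rows of B: " + n);
            }
            double[] product = new double[p];
            for (int k = 0; k < n; k++) {
                for (int j = 0; j < p; j++) {
                    product[j] += row[k] * b[k][j];
                }
            }
            result.put(entry.getKey(), product); // the row index is preserved
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, double[]> rows = new LinkedHashMap<>();
        rows.put(5L, new double[]{1.0, 2.0});
        double[][] b = {{1.0, 0.0, 2.0}, {0.0, 1.0, 3.0}};
        // prints [1.0, 2.0, 8.0]
        System.out.println(Arrays.toString(multiply(rows, b).get(5L)));
    }
}
```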
    • numCols

      public long numCols()
      Description copied from interface: DistributedMatrix
      Gets or computes the number of columns.
      Specified by:
      numCols in interface DistributedMatrix
    • numRows

      public long numRows()
      Description copied from interface: DistributedMatrix
      Gets or computes the number of rows.
      Specified by:
      numRows in interface DistributedMatrix
    • rows

      public RDD<IndexedRow> rows()
    • toBlockMatrix

      public BlockMatrix toBlockMatrix()
      Converts to BlockMatrix. Creates blocks with size 1024 x 1024.
      Returns:
      a BlockMatrix
    • toBlockMatrix

      public BlockMatrix toBlockMatrix(int rowsPerBlock, int colsPerBlock)
      Converts to BlockMatrix. Blocks may be sparse or dense depending on the sparsity of the rows.
      Parameters:
      rowsPerBlock - The number of rows of each block. The blocks at the bottom edge may have a smaller value. Must be an integer value greater than 0.
      colsPerBlock - The number of columns of each block. The blocks at the right edge may have a smaller value. Must be an integer value greater than 0.
      Returns:
      a BlockMatrix
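
      The remark that edge blocks may be smaller amounts to ceiling-division arithmetic, sketched below in plain Java. The class and method names (BlockGridSketch, numBlocks, lastBlockSize) are hypothetical, and the code only computes grid sizes; it does not build any blocks.

```java
// Illustrative only: plain Java, no Spark dependency. All names are hypothetical.
final class BlockGridSketch {

    // Number of blocks needed to cover `dim` rows (or columns) with `perBlock` each.
    static long numBlocks(long dim, int perBlock) {
        if (perBlock <= 0) {
            throw new IllegalArgumentException("perBlock must be greater than 0");
        }
        return (dim + perBlock - 1) / perBlock; // ceiling division
    }

    // Size of the last block along a dimension; it may be smaller than perBlock.
    static int lastBlockSize(long dim, int perBlock) {
        if (dim <= 0 || perBlock <= 0) {
            throw new IllegalArgumentException("dim and perBlock must be greater than 0");
        }
        long remainder = dim % perBlock;
        return remainder == 0 ? perBlock : (int) remainder;
    }

    public static void main(String[] args) {
        // A 2500-row matrix with 1024 rows per block.
        System.out.println(numBlocks(2500L, 1024));     // prints 3
        System.out.println(lastBlockSize(2500L, 1024)); // prints 452
    }
}
```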
    • toCoordinateMatrix

      public CoordinateMatrix toCoordinateMatrix()
      Converts this matrix to a CoordinateMatrix.
      Returns:
      a CoordinateMatrix
    • toRowMatrix

      public RowMatrix toRowMatrix()
      Drops row indices and converts this matrix to a RowMatrix.
      Returns:
      a RowMatrix