Class BlockMatrix

java.lang.Object
  org.apache.spark.mllib.linalg.distributed.BlockMatrix
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, DistributedMatrix

public class BlockMatrix extends Object implements DistributedMatrix, org.apache.spark.internal.Logging
Represents a distributed matrix in blocks of local matrices.

Parameters:
blocks - The RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix. If multiple blocks with the same index exist, the results for operations like add and multiply will be unpredictable.
rowsPerBlock - Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
colsPerBlock - Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
nRows - Number of rows of this matrix. If the supplied value is less than or equal to zero, the number of rows will be calculated when numRows is invoked.
nCols - Number of columns of this matrix. If the supplied value is less than or equal to zero, the number of columns will be calculated when numCols is invoked.
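To make the block layout concrete, here is a hypothetical Java sketch that uses plain arithmetic rather than the Spark API (`BlockLayout` and its methods are invented names). It shows how a global row index maps to a block row and an offset within that block, and why the last block row can hold fewer than rowsPerBlock rows.

```java
/** Hypothetical helper showing BlockMatrix-style block indexing; not part of Spark. */
final class BlockLayout {
    // Block row holding global row i: blocks are laid out in order, rowsPerBlock rows each.
    static long blockRowIndex(long i, int rowsPerBlock) {
        return i / rowsPerBlock;
    }

    // Offset of global row i inside its block.
    static int localRow(long i, int rowsPerBlock) {
        return (int) (i % rowsPerBlock);
    }

    // Number of block rows needed to cover nRows rows (ceiling division).
    static int numBlocks(long nRows, int rowsPerBlock) {
        return (int) ((nRows + rowsPerBlock - 1) / rowsPerBlock);
    }

    // Height of one block row: full blocks have rowsPerBlock rows, the last one may be shorter.
    static int blockHeight(long blockRow, long nRows, int rowsPerBlock) {
        long start = blockRow * rowsPerBlock;
        return (int) Math.min(rowsPerBlock, nRows - start);
    }

    public static void main(String[] args) {
        long nRows = 10;
        int rowsPerBlock = 3;
        // 10 rows in blocks of 3 need 4 block rows, with heights 3, 3, 3 and 1.
        for (int b = 0; b < numBlocks(nRows, rowsPerBlock); b++) {
            System.out.println("block row " + b + ": " + blockHeight(b, nRows, rowsPerBlock) + " rows");
        }
        // Global row 7 sits in block row 2 at local offset 1.
        System.out.println(blockRowIndex(7, rowsPerBlock) + " / " + localRow(7, rowsPerBlock));
    }
}
```

The same arithmetic applies to columns, with colsPerBlock in place of rowsPerBlock.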

  • Constructor Details

    • BlockMatrix

      public BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks, int rowsPerBlock, int colsPerBlock, long nRows, long nCols)
    • BlockMatrix

      public BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks, int rowsPerBlock, int colsPerBlock)
      Alternate constructor for BlockMatrix that omits the number of rows and columns; they are calculated when numRows or numCols is invoked.

      Parameters:
      blocks - The RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix. If multiple blocks with the same index exist, the results for operations like add and multiply will be unpredictable.
      rowsPerBlock - Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
      colsPerBlock - Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
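When the dimensions are not supplied, they have to be derived from the blocks themselves. One natural way, sketched below under the assumption that each block is described only by its block index and its local size, is to take the furthest row and column reached by any block. The class `DimensionEstimate` and its nested `BlockInfo` are hypothetical names, not Spark types.

```java
import java.util.List;

/** Hypothetical sketch: infer matrix dimensions from block indices and block sizes. */
final class DimensionEstimate {
    // Minimal description of one block: its (row, col) block index and its local size.
    static final class BlockInfo {
        final int blockRow, blockCol, rows, cols;

        BlockInfo(int blockRow, int blockCol, int rows, int cols) {
            this.blockRow = blockRow;
            this.blockCol = blockCol;
            this.rows = rows;
            this.cols = cols;
        }
    }

    // Returns {numRows, numCols}: the furthest row and column covered by any block.
    static long[] estimate(List<BlockInfo> blocks, int rowsPerBlock, int colsPerBlock) {
        long maxRows = 0, maxCols = 0;
        for (BlockInfo b : blocks) {
            maxRows = Math.max(maxRows, (long) b.blockRow * rowsPerBlock + b.rows);
            maxCols = Math.max(maxCols, (long) b.blockCol * colsPerBlock + b.cols);
        }
        return new long[] {maxRows, maxCols};
    }

    public static void main(String[] args) {
        // 2 x 2 blocks; block row 1 is absent (treated as all zeros here),
        // and the bottom block row is only 1 row tall.
        List<BlockInfo> blocks = List.of(
            new BlockInfo(0, 0, 2, 2),
            new BlockInfo(0, 1, 2, 2),
            new BlockInfo(2, 1, 1, 2));
        long[] dims = estimate(blocks, 2, 2);
        System.out.println(dims[0] + " x " + dims[1]);
    }
}
```

A consequence of this approach: if trailing all-zero blocks are omitted from the RDD, an estimate like this undercounts, which is one reason to pass nRows and nCols explicitly.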
  • Method Details

    • add

      public BlockMatrix add(BlockMatrix other)
      Adds the given block matrix other to this block matrix: this + other. The matrices must have the same size and matching rowsPerBlock and colsPerBlock values. If either of the blocks being added is a SparseMatrix, the resulting sub-matrix will also be a SparseMatrix, even if it is being added to a DenseMatrix. If two dense matrices are added, the output will also be a DenseMatrix.
      Parameters:
      other - the BlockMatrix to add to this one
      Returns:
      a new BlockMatrix containing this + other
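The block-wise nature of add can be illustrated locally. This hypothetical Java sketch (not Spark code; `BlockAdd` is an invented name) keys dense `double[][]` blocks by their (blockRow, blockCol) index and sums blocks that share an index. It assumes a block present in only one operand means the other operand's block is all zeros, ignores the sparse/dense distinction, and does not check block sizes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical local model of block-wise addition; blocks keyed by (blockRow, blockCol). */
final class BlockAdd {
    // Element-wise sum of two equally sized dense blocks.
    static double[][] addBlocks(double[][] a, double[][] b) {
        double[][] c = new double[a.length][a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                c[i][j] = a[i][j] + b[i][j];
        return c;
    }

    // Blocks present in only one operand are copied through (the missing block counts as zero);
    // blocks present in both are summed.
    static Map<List<Integer>, double[][]> add(Map<List<Integer>, double[][]> x,
                                              Map<List<Integer>, double[][]> y) {
        Map<List<Integer>, double[][]> result = new HashMap<>(x);
        for (Map.Entry<List<Integer>, double[][]> e : y.entrySet()) {
            result.merge(e.getKey(), e.getValue(), BlockAdd::addBlocks);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<List<Integer>, double[][]> a = new HashMap<>();
        a.put(List.of(0, 0), new double[][] {{1, 2}, {3, 4}});
        Map<List<Integer>, double[][]> b = new HashMap<>();
        b.put(List.of(0, 0), new double[][] {{10, 20}, {30, 40}});
        b.put(List.of(0, 1), new double[][] {{5}, {6}});
        Map<List<Integer>, double[][]> c = add(a, b);
        System.out.println(Arrays.deepToString(c.get(List.of(0, 0))));
    }
}
```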
    • blocks

      public RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks()
    • cache

      public BlockMatrix cache()
      Caches the underlying RDD.
    • colsPerBlock

      public int colsPerBlock()
    • multiply

      public BlockMatrix multiply(BlockMatrix other)
      Multiplies this BlockMatrix by other, another BlockMatrix, with this matrix as the left operand: this * other. The colsPerBlock of this matrix must equal the rowsPerBlock of other. Any SparseMatrix blocks in other are converted to DenseMatrix before multiplying, so the output BlockMatrix consists only of DenseMatrix blocks. This may cause performance issues until support for multiplying two sparse matrices is added.

      Parameters:
      other - the right operand (matrix B in A * B = C)
      Returns:
      the product this * other as a new BlockMatrix
      Note:
      The behavior of multiply changed in 1.6.0. It used to throw an error when there were blocks with duplicate indices; now, blocks with duplicate indices are added together.
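Block multiplication follows the usual rule C(i, k) = sum over j of A(i, j) * B(j, k), applied to whole blocks. The hypothetical Java sketch below (`BlockMultiply` is an invented name; everything is local and dense, with no Spark shuffle) also shows why duplicate block indices end up added together: a duplicate block simply contributes one more partial product to the same output block.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical local model of block matrix multiplication C = A * B. */
final class BlockMultiply {
    static final class Block {
        final int row, col;     // block indices
        final double[][] data;  // dense local matrix

        Block(int row, int col, double[][] data) {
            this.row = row;
            this.col = col;
            this.data = data;
        }
    }

    // Dense product of an (m x n) block and an (n x p) block.
    static double[][] localProduct(double[][] a, double[][] b) {
        int m = a.length, n = b.length, p = b[0].length;
        double[][] c = new double[m][p];
        for (int i = 0; i < m; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < p; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // Element-wise sum, used to accumulate partial products into an output block.
    static double[][] sum(double[][] x, double[][] y) {
        double[][] z = new double[x.length][x[0].length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x[0].length; j++)
                z[i][j] = x[i][j] + y[i][j];
        return z;
    }

    // C(i, k) = sum over j of A(i, j) * B(j, k); every matching pair adds one partial product.
    static Map<List<Integer>, double[][]> multiply(List<Block> a, List<Block> b) {
        Map<List<Integer>, double[][]> c = new HashMap<>();
        for (Block x : a)
            for (Block y : b)
                if (x.col == y.row)
                    c.merge(List.of(x.row, y.col), localProduct(x.data, y.data), BlockMultiply::sum);
        return c;
    }

    public static void main(String[] args) {
        // A = [[1, 3], [2, 4]] as two 2 x 1 blocks; B = [[5], [6]] as two 1 x 1 blocks.
        List<Block> a = List.of(
            new Block(0, 0, new double[][] {{1}, {2}}),
            new Block(0, 1, new double[][] {{3}, {4}}));
        List<Block> b = List.of(
            new Block(0, 0, new double[][] {{5}}),
            new Block(1, 0, new double[][] {{6}}));
        double[][] c = multiply(a, b).get(List.of(0, 0));
        System.out.println(c[0][0] + ", " + c[1][0]);
    }
}
```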
    • multiply

      public BlockMatrix multiply(BlockMatrix other, int numMidDimSplits)
      Multiplies this BlockMatrix by other, another BlockMatrix, with this matrix as the left operand: this * other. The colsPerBlock of this matrix must equal the rowsPerBlock of other. Any SparseMatrix blocks in other are converted to DenseMatrix before multiplying, so the output BlockMatrix consists only of DenseMatrix blocks. This may cause performance issues until support for multiplying two sparse matrices is added. Blocks with duplicate indices are added together.

      Parameters:
      other - Matrix B in A * B = C
      numMidDimSplits - Number of splits to cut on the middle dimension when doing multiplication. For example, when multiplying a matrix A of m x n blocks with a matrix B of n x k blocks, this parameter configures the parallelism to use when grouping the matrices. The parallelism increases from m x k to m x k x numMidDimSplits, which in some cases also reduces the total amount of shuffled data.
      Returns:
      the product this * other as a new BlockMatrix
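The effect of numMidDimSplits can be illustrated with arithmetic alone: with m x k output blocks and the shared dimension cut into numMidDimSplits groups, there are m x k x numMidDimSplits independent partial sums, which are added at the end. The hypothetical Java sketch below treats each block as a single number, and its round-robin assignment of middle index j to split j % numMidDimSplits is an assumption made for illustration, not necessarily how Spark assigns the work.

```java
/** Hypothetical sketch of splitting the middle dimension of C = A * B. */
final class MidDimSplit {
    // Number of independent (i, k, split) groups for m x k output blocks and s splits.
    static long groups(long m, long k, int numMidDimSplits) {
        return m * k * numMidDimSplits;
    }

    // C = A * B over a grid of scalar "blocks": each split sums only the middle indices j
    // assigned to it (assumed round-robin, j % s), then the per-split partial sums are added.
    static double[][] multiply(double[][] a, double[][] b, int s) {
        int m = a.length, n = b.length, k = b[0].length;
        double[][][] partial = new double[s][m][k];
        for (int i = 0; i < m; i++)
            for (int c = 0; c < k; c++)
                for (int j = 0; j < n; j++)
                    partial[j % s][i][c] += a[i][j] * b[j][c];
        double[][] result = new double[m][k];
        for (int split = 0; split < s; split++)
            for (int i = 0; i < m; i++)
                for (int c = 0; c < k; c++)
                    result[i][c] += partial[split][i][c];
        return result;
    }

    public static void main(String[] args) {
        // 4 x 3 output blocks with 2 middle-dimension splits give 24 groups.
        System.out.println(groups(4, 3, 2));
        double[][] a = {{1, 2, 3}};
        double[][] b = {{4}, {5}, {6}};
        // Splitting changes how the work is grouped, not the result.
        System.out.println(multiply(a, b, 1)[0][0] + " " + multiply(a, b, 2)[0][0]);
    }
}
```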
    • numColBlocks

      public int numColBlocks()
    • numCols

      public long numCols()
      Description copied from interface: DistributedMatrix
      Gets or computes the number of columns.
      Specified by:
      numCols in interface DistributedMatrix
    • numRowBlocks

      public int numRowBlocks()
    • numRows

      public long numRows()
      Description copied from interface: DistributedMatrix
      Gets or computes the number of rows.
      Specified by:
      numRows in interface DistributedMatrix
    • persist

      public BlockMatrix persist(StorageLevel storageLevel)
      Persists the underlying RDD with the specified storage level.
    • rowsPerBlock

      public int rowsPerBlock()
    • subtract

      public BlockMatrix subtract(BlockMatrix other)
      Subtracts the given block matrix other from this block matrix: this - other. The matrices must have the same size and matching rowsPerBlock and colsPerBlock values. If either of the blocks being subtracted is a SparseMatrix, the resulting sub-matrix will also be a SparseMatrix, even if it is being subtracted from a DenseMatrix. If two dense matrices are subtracted, the output will also be a DenseMatrix.
      Parameters:
      other - the BlockMatrix to subtract from this one
      Returns:
      a new BlockMatrix containing this - other
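Subtraction differs from addition in one detail worth spelling out: a block present only in other must appear negated in the result. This hypothetical Java sketch (`BlockSubtract` is an invented name, not a Spark class) assumes a missing block is all zeros, so a block only in this matrix passes through unchanged and a block only in other becomes 0 - other.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical local model of block-wise subtraction; blocks keyed by (blockRow, blockCol). */
final class BlockSubtract {
    // Element-wise difference of two equally sized dense blocks.
    static double[][] minus(double[][] a, double[][] b) {
        double[][] c = new double[a.length][a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                c[i][j] = a[i][j] - b[i][j];
        return c;
    }

    static Map<List<Integer>, double[][]> subtract(Map<List<Integer>, double[][]> x,
                                                   Map<List<Integer>, double[][]> y) {
        // Blocks only in x: x - 0 = x, so start from a copy of x's entries.
        Map<List<Integer>, double[][]> result = new HashMap<>(x);
        for (Map.Entry<List<Integer>, double[][]> e : y.entrySet()) {
            double[][] right = e.getValue();
            double[][] left = x.get(e.getKey());
            // Blocks only in y: subtract from an all-zero block of the same size, giving -y.
            if (left == null) left = new double[right.length][right[0].length];
            result.put(e.getKey(), minus(left, right));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<List<Integer>, double[][]> x = new HashMap<>();
        x.put(List.of(0, 0), new double[][] {{5}});
        Map<List<Integer>, double[][]> y = new HashMap<>();
        y.put(List.of(0, 0), new double[][] {{2}});
        y.put(List.of(1, 0), new double[][] {{3}});
        Map<List<Integer>, double[][]> r = subtract(x, y);
        System.out.println(r.get(List.of(0, 0))[0][0] + ", " + r.get(List.of(1, 0))[0][0]);
    }
}
```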
    • toCoordinateMatrix

      public CoordinateMatrix toCoordinateMatrix()
      Converts to CoordinateMatrix.
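Converting to coordinate form amounts to offsetting each block's local coordinates by the block's position in the grid. The hypothetical Java sketch below (`BlockToEntries` and its nested `Entry` are invented names) expands one dense block into global (row, column, value) triples; skipping explicit zeros is a choice of this sketch, not a statement about what Spark emits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: expand one dense block into global (row, col, value) entries. */
final class BlockToEntries {
    static final class Entry {
        final long i, j;
        final double value;

        Entry(long i, long j, double value) {
            this.i = i;
            this.j = j;
            this.value = value;
        }
    }

    // Global position = block index * block size + local position.
    static List<Entry> entries(int blockRow, int blockCol, int rowsPerBlock, int colsPerBlock,
                               double[][] block) {
        long rowStart = (long) blockRow * rowsPerBlock;
        long colStart = (long) blockCol * colsPerBlock;
        List<Entry> out = new ArrayList<>();
        for (int r = 0; r < block.length; r++)
            for (int c = 0; c < block[r].length; c++)
                if (block[r][c] != 0.0)  // this sketch skips explicit zeros
                    out.add(new Entry(rowStart + r, colStart + c, block[r][c]));
        return out;
    }

    public static void main(String[] args) {
        // Block (1, 2) with rowsPerBlock = 3 and colsPerBlock = 4 starts at global (3, 8).
        for (Entry e : entries(1, 2, 3, 4, new double[][] {{0, 7}, {8, 0}})) {
            System.out.println("(" + e.i + ", " + e.j + ") = " + e.value);
        }
    }
}
```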
    • toIndexedRowMatrix

      public IndexedRowMatrix toIndexedRowMatrix()
      Converts to IndexedRowMatrix. The number of columns must be within the integer range.
    • toLocalMatrix

      public Matrix toLocalMatrix()
      Collects the distributed matrix on the driver as a DenseMatrix.
      Returns:
      a local Matrix (a DenseMatrix) holding every entry of this matrix
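Collecting to a local matrix means copying every block into one array. Spark's DenseMatrix stores its values in column-major order by default, so the hypothetical Java sketch below (`CollectToLocal` is an invented name; blocks are given as parallel arrays of indices and dense data) writes entry (i, j) to position i + j * numRows. Blocks that are absent leave zeros, and the size check reflects the limit of a single Java array rather than a documented Spark rule.

```java
/** Hypothetical sketch: copy dense blocks into one column-major values array. */
final class CollectToLocal {
    static double[] assemble(int numRows, int numCols, int rowsPerBlock, int colsPerBlock,
                             int[][] blockIndex, double[][][] blocks) {
        // A single Java array holds at most Integer.MAX_VALUE elements.
        long size = (long) numRows * numCols;
        if (size > Integer.MAX_VALUE)
            throw new IllegalArgumentException("too large to collect: " + size + " entries");
        double[] values = new double[(int) size];  // absent blocks stay zero
        for (int b = 0; b < blocks.length; b++) {
            int rowStart = blockIndex[b][0] * rowsPerBlock;
            int colStart = blockIndex[b][1] * colsPerBlock;
            double[][] block = blocks[b];
            for (int r = 0; r < block.length; r++)
                for (int c = 0; c < block[r].length; c++)
                    // Column-major: entry (i, j) lives at i + j * numRows.
                    values[(rowStart + r) + (colStart + c) * numRows] = block[r][c];
        }
        return values;
    }

    public static void main(String[] args) {
        // 3 x 3 matrix in 2 x 2 blocks: block (0, 0) is full, block (1, 1) is 1 x 1.
        double[] v = assemble(3, 3, 2, 2,
            new int[][] {{0, 0}, {1, 1}},
            new double[][][] {{{1, 2}, {3, 4}}, {{9}}});
        System.out.println(java.util.Arrays.toString(v));
    }
}
```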
    • transpose

      public BlockMatrix transpose()
      Transposes this BlockMatrix. Returns a new BlockMatrix instance that shares the same underlying data. This is a lazy operation.
      Returns:
      the transposed BlockMatrix
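At the block level, transposing swaps each block's index and transposes its contents; rowsPerBlock and colsPerBlock also swap roles. The hypothetical Java sketch below (`BlockTranspose` is an invented name) copies each block for simplicity, whereas the operation described above shares the underlying data.

```java
/** Hypothetical sketch of how transposition rearranges a block matrix's blocks. */
final class BlockTranspose {
    static final class Block {
        final int row, col;     // block indices
        final double[][] data;  // dense local matrix

        Block(int row, int col, double[][] data) {
            this.row = row;
            this.col = col;
            this.data = data;
        }
    }

    // Copying transpose of a dense local block.
    static double[][] transposeLocal(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++)
                t[j][i] = m[i][j];
        return t;
    }

    // Block (i, j) of A becomes block (j, i) of A^T, with its contents transposed.
    static Block transpose(Block b) {
        return new Block(b.col, b.row, transposeLocal(b.data));
    }

    public static void main(String[] args) {
        Block t = transpose(new Block(0, 2, new double[][] {{1, 2, 3}}));
        System.out.println("(" + t.row + ", " + t.col + "): " + t.data.length + " x " + t.data[0].length);
    }
}
```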
    • validate

      public void validate()
      Validates the block matrix info against the matrix data (blocks) and throws an exception if any error is found.
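The kinds of consistency checks a validation step can perform are sketched below, using only block metadata. This is a hypothetical Java illustration (`BlockValidation` and `BlockInfo` are invented names), and the specific rules are assumptions drawn from the layout described on this page: no two blocks share an index, no block extends past the declared size, interior blocks are exactly rowsPerBlock x colsPerBlock, and blocks in the final row or column may be smaller but not empty.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of block-layout validation over block metadata only. */
final class BlockValidation {
    static final class BlockInfo {
        final int blockRow, blockCol, rows, cols;

        BlockInfo(int blockRow, int blockCol, int rows, int cols) {
            this.blockRow = blockRow;
            this.blockCol = blockCol;
            this.rows = rows;
            this.cols = cols;
        }
    }

    // Throws IllegalArgumentException if the blocks are inconsistent with the declared layout.
    static void validate(List<BlockInfo> blocks, int rowsPerBlock, int colsPerBlock,
                         long nRows, long nCols) {
        int numRowBlocks = (int) ((nRows + rowsPerBlock - 1) / rowsPerBlock);
        int numColBlocks = (int) ((nCols + colsPerBlock - 1) / colsPerBlock);
        Set<List<Integer>> seen = new HashSet<>();
        for (BlockInfo b : blocks) {
            // 1. No two blocks may share an index.
            if (!seen.add(List.of(b.blockRow, b.blockCol)))
                throw new IllegalArgumentException("duplicate block (" + b.blockRow + ", " + b.blockCol + ")");
            // 2. The block must fit within the declared dimensions.
            if ((long) b.blockRow * rowsPerBlock + b.rows > nRows
                    || (long) b.blockCol * colsPerBlock + b.cols > nCols)
                throw new IllegalArgumentException("block exceeds the declared matrix size");
            // 3. Interior blocks are full-size; edge blocks may be smaller but not empty.
            checkSize(b.rows, b.blockRow, numRowBlocks, rowsPerBlock);
            checkSize(b.cols, b.blockCol, numColBlocks, colsPerBlock);
        }
    }

    static void checkSize(int size, int index, int numBlocks, int perBlock) {
        boolean ok = index < numBlocks - 1 ? size == perBlock : size > 0 && size <= perBlock;
        if (!ok) throw new IllegalArgumentException("block at index " + index + " has size " + size);
    }

    public static void main(String[] args) {
        // A valid 5 x 4 layout in 2 x 2 blocks, with a 1-row block at the bottom edge.
        validate(List.of(new BlockInfo(0, 0, 2, 2), new BlockInfo(2, 1, 1, 2)), 2, 2, 5, 4);
        System.out.println("valid");
    }
}
```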