org.apache.spark.mllib.linalg.distributed
Class BlockMatrix

Object
  extended by org.apache.spark.mllib.linalg.distributed.BlockMatrix
All Implemented Interfaces:
java.io.Serializable, Logging, DistributedMatrix

public class BlockMatrix
extends Object
implements DistributedMatrix, Logging

:: Experimental ::

Represents a distributed matrix in blocks of local matrices.

Parameters:
blocks - The RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix. If multiple blocks with the same index exist, the results of operations such as add and multiply are unpredictable.
rowsPerBlock - Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
colsPerBlock - Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
nRows - Number of rows of this matrix. If the supplied value is less than or equal to zero, the number of rows is calculated when numRows is invoked.
nCols - Number of columns of this matrix. If the supplied value is less than or equal to zero, the number of columns is calculated when numCols is invoked.
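To make the block layout concrete, here is a minimal plain-Java sketch (no Spark dependency) of how a global entry (row, col) maps to a block index and to an offset inside that block. It assumes the usual layout in which block (i, j) starts at row i * rowsPerBlock and column j * colsPerBlock. The class and method names are hypothetical and are not part of the Spark API.

```java
// Hypothetical helper, not part of Spark: illustrates the block layout only.
final class BlockLocator {
    private BlockLocator() {}

    // Returns {blockRowIndex, blockColIndex, localRow, localCol} for global entry (row, col),
    // assuming block (i, j) covers rows [i * rowsPerBlock, (i + 1) * rowsPerBlock)
    // and columns [j * colsPerBlock, (j + 1) * colsPerBlock).
    static long[] locate(long row, long col, int rowsPerBlock, int colsPerBlock) {
        if (row < 0 || col < 0 || rowsPerBlock <= 0 || colsPerBlock <= 0) {
            throw new IllegalArgumentException("indices must be >= 0 and block sizes > 0");
        }
        return new long[] {
            row / rowsPerBlock, // blockRowIndex
            col / colsPerBlock, // blockColIndex
            row % rowsPerBlock, // row offset inside the block
            col % colsPerBlock  // column offset inside the block
        };
    }

    // Number of rows actually held by block row `blockRow` of an nRows-row matrix.
    // Only the final block row may hold fewer than rowsPerBlock rows.
    static int rowsInBlock(long blockRow, long nRows, int rowsPerBlock) {
        long start = blockRow * rowsPerBlock;
        if (blockRow < 0 || start >= nRows) {
            throw new IllegalArgumentException("block row outside the matrix");
        }
        return (int) Math.min(rowsPerBlock, nRows - start);
    }
}
```

For example, with rowsPerBlock = 3 a 10-row matrix has block rows 0 to 3. rowsInBlock returns 3 for the first three block rows and 1 for the last one, which is the "final rows" case that the rowsPerBlock description allows.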

See Also:
Serialized Form

Constructor Summary
BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks, int rowsPerBlock, int colsPerBlock)
          Alternate constructor that omits the number of rows and columns; both are computed when numRows or numCols is invoked.
BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks, int rowsPerBlock, int colsPerBlock, long nRows, long nCols)
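          Primary constructor; if nRows or nCols is less than or equal to zero, that dimension is computed on demand.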
 
Method Summary
 BlockMatrix add(BlockMatrix other)
          Adds two block matrices together.
 RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks()
          Returns the RDD of ((blockRowIndex, blockColIndex), sub-matrix) blocks.
 BlockMatrix cache()
          Caches the underlying RDD.
 int colsPerBlock()
          Number of columns that make up each block.
 BlockMatrix multiply(BlockMatrix other)
          Multiplies this BlockMatrix by other, another BlockMatrix, with this matrix on the left (this * other).
 int numColBlocks()
          Number of block columns.
 long numCols()
          Gets or computes the number of columns.
 int numRowBlocks()
          Number of block rows.
 long numRows()
          Gets or computes the number of rows.
 BlockMatrix persist(StorageLevel storageLevel)
          Persists the underlying RDD with the specified storage level.
 int rowsPerBlock()
          Number of rows that make up each block.
 CoordinateMatrix toCoordinateMatrix()
          Converts to CoordinateMatrix.
 IndexedRowMatrix toIndexedRowMatrix()
          Converts to IndexedRowMatrix.
 Matrix toLocalMatrix()
          Collects the distributed matrix on the driver as a `DenseMatrix`.
 BlockMatrix transpose()
          Transposes this BlockMatrix.
 void validate()
          Validates the block matrix info against the matrix data (blocks) and throws an exception if any error is found.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

BlockMatrix

public BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks,
                   int rowsPerBlock,
                   int colsPerBlock,
                   long nRows,
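                   long nCols)
Primary constructor. See the class description for the meaning of each parameter. If nRows or nCols is less than or equal to zero, the corresponding dimension is computed when numRows or numCols is invoked.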

BlockMatrix

public BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks,
                   int rowsPerBlock,
                   int colsPerBlock)
Alternate constructor that omits the number of rows and columns; both are computed when numRows or numCols is invoked.

Parameters:
blocks - The RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix. If multiple blocks with the same index exist, the results for operations like add and multiply will be unpredictable.
rowsPerBlock - Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
colsPerBlock - Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
Method Detail

blocks

public RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks()
Returns the RDD of ((blockRowIndex, blockColIndex), sub-matrix) blocks that form this matrix.

rowsPerBlock

public int rowsPerBlock()
Returns the number of rows that make up each block.

colsPerBlock

public int colsPerBlock()
Returns the number of columns that make up each block.

numRows

public long numRows()
Description copied from interface: DistributedMatrix
Gets or computes the number of rows.

Specified by:
numRows in interface DistributedMatrix

numCols

public long numCols()
Description copied from interface: DistributedMatrix
Gets or computes the number of columns.

Specified by:
numCols in interface DistributedMatrix
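When nRows or nCols was not supplied (that is, it is less than or equal to zero), the dimension has to be derived from the blocks themselves. One plausible approach, sketched below in plain Java under that assumption, is to take the furthest extent reached by any block: blockRowIndex * rowsPerBlock plus the block's own row count, and the same for columns. The names and the int[] block descriptor are hypothetical, and the sketch does not reproduce Spark's implementation.

```java
// Hypothetical sketch, not Spark's code: infer matrix dimensions from block metadata.
final class DimensionEstimate {
    private DimensionEstimate() {}

    // Each descriptor is {blockRowIndex, blockColIndex, blockNumRows, blockNumCols}.
    // Returns {numRows, numCols} as the largest row/column extent covered by any block.
    static long[] estimate(int[][] blocks, int rowsPerBlock, int colsPerBlock) {
        long rows = 0;
        long cols = 0;
        for (int[] b : blocks) {
            rows = Math.max(rows, (long) b[0] * rowsPerBlock + b[2]);
            cols = Math.max(cols, (long) b[1] * colsPerBlock + b[3]);
        }
        return new long[] {rows, cols};
    }

    // Mirrors the documented contract: a supplied nRows > 0 is used as-is,
    // otherwise the row count is computed from the blocks.
    static long numRows(long nRows, int[][] blocks, int rowsPerBlock, int colsPerBlock) {
        return nRows > 0 ? nRows : estimate(blocks, rowsPerBlock, colsPerBlock)[0];
    }
}
```

With this approach, trailing all-zero blocks that were never stored would make the estimate too small. That is one reason to pass nRows and nCols explicitly when they are known.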

numRowBlocks

public int numRowBlocks()
Returns the number of block rows (blocks along the row dimension) of this matrix.

numColBlocks

public int numColBlocks()
Returns the number of block columns (blocks along the column dimension) of this matrix.
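numRowBlocks and numColBlocks count the blocks along each dimension. Assume they equal the ceiling of numRows / rowsPerBlock and numCols / colsPerBlock, which is consistent with only the final blocks being allowed to be smaller. Under that assumption, the arithmetic can be done in integers as shown below. The class name is hypothetical.

```java
// Hypothetical helper, not part of Spark.
final class BlockCounts {
    private BlockCounts() {}

    // ceil(n / perBlock) using integer arithmetic, for n >= 0 and perBlock > 0.
    static int numBlocks(long n, int perBlock) {
        if (n < 0 || perBlock <= 0) {
            throw new IllegalArgumentException("n must be >= 0 and perBlock > 0");
        }
        long full = n / perBlock;                   // completely filled blocks
        long partial = (n % perBlock == 0) ? 0 : 1; // one smaller trailing block, if any
        return Math.toIntExact(full + partial);     // block counts are ints in this API
    }
}
```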

validate

public void validate()
Validates the block matrix info against the matrix data (blocks) and throws an exception if any error is found.
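The description of validate does not list the individual checks. The plain-Java sketch below shows checks of the kind the class description implies, assuming the dimensions are known:

- every block index lies inside the block grid;
- no index appears twice, because duplicate indices make add and multiply unpredictable;
- each block has rowsPerBlock x colsPerBlock entries, except the smaller blocks along the final rows and columns.

BlockInfo and firstError are hypothetical names. Spark's actual checks and messages may differ.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of block-matrix consistency checks; not Spark's implementation.
final class BlockValidator {
    private BlockValidator() {}

    // Grid position and local dimensions of one block.
    static final class BlockInfo {
        final int blockRow, blockCol, rows, cols;

        BlockInfo(int blockRow, int blockCol, int rows, int cols) {
            this.blockRow = blockRow;
            this.blockCol = blockCol;
            this.rows = rows;
            this.cols = cols;
        }
    }

    // Returns a description of the first problem found, or null if the blocks are consistent.
    static String firstError(List<BlockInfo> blocks, long nRows, long nCols,
                             int rowsPerBlock, int colsPerBlock) {
        long numRowBlocks = (nRows + rowsPerBlock - 1) / rowsPerBlock;
        long numColBlocks = (nCols + colsPerBlock - 1) / colsPerBlock;
        Set<Long> seen = new HashSet<>();
        for (BlockInfo b : blocks) {
            String where = "block (" + b.blockRow + ", " + b.blockCol + ")";
            if (b.blockRow < 0 || b.blockRow >= numRowBlocks
                    || b.blockCol < 0 || b.blockCol >= numColBlocks) {
                return where + " lies outside the block grid";
            }
            // Pack the two int indices into one long key to detect duplicates.
            long key = ((long) b.blockRow << 32) | (b.blockCol & 0xffffffffL);
            if (!seen.add(key)) {
                return where + " appears more than once";
            }
            // Interior blocks are full-sized; the last block row/column holds the remainder.
            long expectedRows = Math.min(rowsPerBlock, nRows - (long) b.blockRow * rowsPerBlock);
            long expectedCols = Math.min(colsPerBlock, nCols - (long) b.blockCol * colsPerBlock);
            if (b.rows != expectedRows || b.cols != expectedCols) {
                return where + " is " + b.rows + "x" + b.cols
                        + " but should be " + expectedRows + "x" + expectedCols;
            }
        }
        return null;
    }

    // Throws when any problem is found, as validate() is documented to do.
    static void validate(List<BlockInfo> blocks, long nRows, long nCols,
                         int rowsPerBlock, int colsPerBlock) {
        String error = firstError(blocks, nRows, nCols, rowsPerBlock, colsPerBlock);
        if (error != null) {
            throw new IllegalStateException(error);
        }
    }
}
```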


cache

public BlockMatrix cache()
Caches the underlying RDD.


persist

public BlockMatrix persist(StorageLevel storageLevel)
Persists the underlying RDD with the specified storage level.


toCoordinateMatrix

public CoordinateMatrix toCoordinateMatrix()
Converts to CoordinateMatrix.
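Converting to a CoordinateMatrix amounts to shifting every local entry of a block by that block's offset. The plain-Java sketch below does not use Spark. It emits (i, j, value) triples for one dense block stored as a row-major double[][]. Skipping zero entries is an assumption made here to keep the output sparse, and Entry and entries are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turns one block into global (i, j, value) entries.
final class BlockToCoordinates {
    private BlockToCoordinates() {}

    // One coordinate-format entry, analogous to an entry of a CoordinateMatrix.
    static final class Entry {
        final long i, j;
        final double value;

        Entry(long i, long j, double value) {
            this.i = i;
            this.j = j;
            this.value = value;
        }
    }

    static List<Entry> entries(long blockRow, long blockCol, double[][] block,
                               int rowsPerBlock, int colsPerBlock) {
        long rowStart = blockRow * rowsPerBlock; // global index of the block's first row
        long colStart = blockCol * colsPerBlock; // global index of the block's first column
        List<Entry> out = new ArrayList<>();
        for (int r = 0; r < block.length; r++) {
            for (int c = 0; c < block[r].length; c++) {
                if (block[r][c] != 0.0) { // zeros are skipped (an assumption of this sketch)
                    out.add(new Entry(rowStart + r, colStart + c, block[r][c]));
                }
            }
        }
        return out;
    }
}
```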


toIndexedRowMatrix

public IndexedRowMatrix toIndexedRowMatrix()
Converts to IndexedRowMatrix. The number of columns must be within the integer range.
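The integer-range restriction follows from each IndexedRow carrying a local vector of length numCols, and local vectors are indexed by int. The plain-Java sketch below uses hypothetical names and has no Spark dependency. It checks that bound, then assembles one global row from the row-major blocks of a single block row, ordered by block column index.

```java
// Hypothetical sketch, not Spark's implementation.
final class RowAssembly {
    private RowAssembly() {}

    // A local row vector is int-indexed, so numCols must not exceed Integer.MAX_VALUE.
    static int checkedColumnCount(long numCols) {
        if (numCols < 0 || numCols > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("numCols does not fit in an int: " + numCols);
        }
        return (int) numCols;
    }

    // blocksInBlockRow[k] is the row-major block at block column k of one block row;
    // returns row `localRow` of that block row as a dense array of length numCols.
    static double[] assembleRow(double[][][] blocksInBlockRow, int localRow,
                                long numCols, int colsPerBlock) {
        double[] row = new double[checkedColumnCount(numCols)];
        for (int k = 0; k < blocksInBlockRow.length; k++) {
            double[] part = blocksInBlockRow[k][localRow];
            // Block column k starts at global column k * colsPerBlock.
            System.arraycopy(part, 0, row, k * colsPerBlock, part.length);
        }
        return row;
    }
}
```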


toLocalMatrix

public Matrix toLocalMatrix()
Collects the distributed matrix on the driver as a `DenseMatrix`. The entire matrix must fit in the driver's memory.
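Collecting to the driver means copying every block into one local array. Spark's DenseMatrix stores its values in column-major order, so the plain-Java sketch below writes entry (i, j) to index j * numRows + i. Blocks are given as row-major double[][] arrays, and the names are hypothetical. Missing blocks are left as zeros, and the total size must fit in a single Java array.

```java
import java.util.List;

// Hypothetical sketch of assembling blocks into one column-major array; not Spark's code.
final class LocalAssembly {
    private LocalAssembly() {}

    static final class Block {
        final int blockRow, blockCol;
        final double[][] values; // row-major local values

        Block(int blockRow, int blockCol, double[][] values) {
            this.blockRow = blockRow;
            this.blockCol = blockCol;
            this.values = values;
        }
    }

    static double[] toColumnMajor(List<Block> blocks, int numRows, int numCols,
                                  int rowsPerBlock, int colsPerBlock) {
        // Math.toIntExact throws ArithmeticException if numRows * numCols does not fit in an int.
        double[] out = new double[Math.toIntExact((long) numRows * numCols)];
        for (Block b : blocks) {
            for (int r = 0; r < b.values.length; r++) {
                for (int c = 0; c < b.values[r].length; c++) {
                    int i = b.blockRow * rowsPerBlock + r; // global row
                    int j = b.blockCol * colsPerBlock + c; // global column
                    out[j * numRows + i] = b.values[r][c]; // column-major position
                }
            }
        }
        return out;
    }
}
```

Take the 2 x 3 matrix [[1, 2, 3], [4, 5, 6]], stored as a 2 x 2 block followed by a 2 x 1 block. Assembling it gives the values {1, 4, 2, 5, 3, 6}.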


transpose

public BlockMatrix transpose()
Transposes this BlockMatrix. Returns a new BlockMatrix instance sharing the same underlying data. This is a lazy operation.

Returns:
a new BlockMatrix that is the transpose of this matrix
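Transposing a block matrix requires only two steps: move block (i, j) to position (j, i) and transpose each block locally, so no entries cross block boundaries. The plain-Java sketch below illustrates that rearrangement using hypothetical names. It is eager and local, so it does not model Spark's lazy evaluation or its shared underlying data. In the result, the roles of rowsPerBlock and colsPerBlock swap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a block-wise transpose; not Spark's implementation.
final class BlockTranspose {
    private BlockTranspose() {}

    static final class Block {
        final int blockRow, blockCol;
        final double[][] values; // row-major local values

        Block(int blockRow, int blockCol, double[][] values) {
            this.blockRow = blockRow;
            this.blockCol = blockCol;
            this.values = values;
        }
    }

    // Local transpose of a rectangular row-major matrix.
    static double[][] transposeLocal(double[][] m) {
        int rows = m.length;
        int cols = rows == 0 ? 0 : m[0].length;
        double[][] t = new double[cols][rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                t[c][r] = m[r][c];
            }
        }
        return t;
    }

    // Block (i, j) holding M becomes block (j, i) holding M^T.
    static List<Block> transpose(List<Block> blocks) {
        List<Block> out = new ArrayList<>();
        for (Block b : blocks) {
            out.add(new Block(b.blockCol, b.blockRow, transposeLocal(b.values)));
        }
        return out;
    }
}
```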

add

public BlockMatrix add(BlockMatrix other)
Adds two block matrices together. The matrices must have the same size and matching rowsPerBlock and colsPerBlock values. If either of the blocks being added is a SparseMatrix, the resulting sub-matrix is also a SparseMatrix, even if it is added to a DenseMatrix. If two dense matrices are added, the output is also a DenseMatrix.

Parameters:
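other - the BlockMatrix to add; it must have the same dimensions, rowsPerBlock, and colsPerBlock as this matrix
Returns:
a new BlockMatrix holding the element-wise sum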
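The plain-Java sketch below illustrates block-wise addition over dense row-major blocks keyed by (blockRowIndex, blockColIndex). It checks the documented requirement that both matrices have the same size and matching rowsPerBlock and colsPerBlock. As an assumption of the sketch, a block present in only one operand is treated as being added to zeros. Shape, key, and add are hypothetical names, and the SparseMatrix/DenseMatrix result-type rules are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of block-wise addition; not Spark's implementation.
final class BlockAdd {
    private BlockAdd() {}

    // Matrix-level metadata that must match for addition.
    static final class Shape {
        final long nRows, nCols;
        final int rowsPerBlock, colsPerBlock;

        Shape(long nRows, long nCols, int rowsPerBlock, int colsPerBlock) {
            this.nRows = nRows;
            this.nCols = nCols;
            this.rowsPerBlock = rowsPerBlock;
            this.colsPerBlock = colsPerBlock;
        }
    }

    // Packs a (blockRowIndex, blockColIndex) pair into one map key.
    static long key(int blockRow, int blockCol) {
        return ((long) blockRow << 32) | (blockCol & 0xffffffffL);
    }

    static Map<Long, double[][]> add(Shape sa, Map<Long, double[][]> a,
                                     Shape sb, Map<Long, double[][]> b) {
        if (sa.nRows != sb.nRows || sa.nCols != sb.nCols
                || sa.rowsPerBlock != sb.rowsPerBlock || sa.colsPerBlock != sb.colsPerBlock) {
            throw new IllegalArgumentException("matrices must have the same size and block sizes");
        }
        Map<Long, double[][]> out = new HashMap<>();
        // Copy the blocks of a first so that neither input is mutated.
        for (Map.Entry<Long, double[][]> e : a.entrySet()) {
            out.put(e.getKey(), copy(e.getValue()));
        }
        for (Map.Entry<Long, double[][]> e : b.entrySet()) {
            double[][] sum = out.get(e.getKey());
            double[][] other = e.getValue();
            if (sum == null) { // block only in b: the matching block of a is all zeros
                out.put(e.getKey(), copy(other));
                continue;
            }
            if (!sameDims(sum, other)) {
                throw new IllegalArgumentException("block dimensions differ for key " + e.getKey());
            }
            for (int r = 0; r < sum.length; r++) {
                for (int c = 0; c < sum[r].length; c++) {
                    sum[r][c] += other[r][c];
                }
            }
        }
        return out;
    }

    private static boolean sameDims(double[][] x, double[][] y) {
        if (x.length != y.length) {
            return false;
        }
        for (int r = 0; r < x.length; r++) {
            if (x[r].length != y[r].length) {
                return false;
            }
        }
        return true;
    }

    private static double[][] copy(double[][] m) {
        double[][] c = new double[m.length][];
        for (int r = 0; r < m.length; r++) {
            c[r] = m[r].clone();
        }
        return c;
    }
}
```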

multiply

public BlockMatrix multiply(BlockMatrix other)
Multiplies this BlockMatrix by other, another BlockMatrix, with this matrix on the left (this * other). The colsPerBlock of this matrix must equal the rowsPerBlock of other. Any SparseMatrix blocks in other are converted to DenseMatrix, and the output BlockMatrix consists only of DenseMatrix blocks. This may cause performance issues until support for multiplying two sparse matrices is added.

Parameters:
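other - the right-hand BlockMatrix; its rowsPerBlock must equal the colsPerBlock of this matrix
Returns:
a new BlockMatrix holding the product this * other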
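Block multiplication follows the usual formula C(i, k) = sum over j of A(i, j) * B(j, k), where each product is an ordinary local matrix multiplication. This is why the colsPerBlock of the left matrix must equal the rowsPerBlock of the right one. The plain-Java sketch below works on complete, dense grids of row-major blocks, where grid[i][j] is block (i, j). The names are hypothetical, and Spark's distributed shuffling of blocks is not modeled.

```java
// Hypothetical sketch of dense block-matrix multiplication; not Spark's implementation.
final class BlockMultiply {
    private BlockMultiply() {}

    // Ordinary dense product of row-major matrices x (n x m) and y (m x p).
    static double[][] multiplyLocal(double[][] x, double[][] y) {
        int n = x.length;
        int m = y.length;
        int p = m == 0 ? 0 : y[0].length;
        double[][] z = new double[n][p];
        for (int i = 0; i < n; i++) {
            if (x[i].length != m) {
                throw new IllegalArgumentException("inner dimensions differ");
            }
            for (int k = 0; k < m; k++) {
                for (int j = 0; j < p; j++) {
                    z[i][j] += x[i][k] * y[k][j];
                }
            }
        }
        return z;
    }

    // a[i][j] is block (i, j) of A and b[j][k] is block (j, k) of B;
    // returns the block grid of C = A * B.
    static double[][][][] multiply(double[][][][] a, int colsPerBlockA,
                                   double[][][][] b, int rowsPerBlockB) {
        if (colsPerBlockA != rowsPerBlockB) {
            throw new IllegalArgumentException("colsPerBlock of A must equal rowsPerBlock of B");
        }
        int blockRows = a.length;
        int inner = b.length;
        int blockCols = inner == 0 ? 0 : b[0].length;
        double[][][][] c = new double[blockRows][blockCols][][];
        for (int i = 0; i < blockRows; i++) {
            if (a[i].length != inner) {
                throw new IllegalArgumentException("block grids do not conform");
            }
            for (int k = 0; k < blockCols; k++) {
                double[][] acc = null;
                for (int j = 0; j < inner; j++) {
                    double[][] prod = multiplyLocal(a[i][j], b[j][k]);
                    if (acc == null) {
                        acc = prod;
                    } else {
                        addInPlace(acc, prod); // accumulate sum over j
                    }
                }
                c[i][k] = acc;
            }
        }
        return c;
    }

    private static void addInPlace(double[][] acc, double[][] x) {
        for (int r = 0; r < acc.length; r++) {
            for (int col = 0; col < acc[r].length; col++) {
                acc[r][col] += x[r][col];
            }
        }
    }
}
```

As a check, split [[1, 2], [3, 4]] and [[5, 6], [7, 8]] into 1 x 1 blocks. The resulting grid holds 19, 22, 43, and 50, the entries of their ordinary product.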