Class BlockMatrix
- All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, DistributedMatrix
param: blocks The RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix. If multiple blocks with the same index exist, the results for operations like add and multiply will be unpredictable.
param: rowsPerBlock Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
param: colsPerBlock Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
param: nRows Number of rows of this matrix. If the supplied value is less than or equal to zero, the number of rows will be calculated when numRows is invoked.
param: nCols Number of columns of this matrix. If the supplied value is less than or equal to zero, the number of columns will be calculated when numCols is invoked.
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
Constructor and Description
BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object, Object>, Matrix>> blocks, int rowsPerBlock, int colsPerBlock)
    Alternate constructor for BlockMatrix without the input of the number of rows and columns.
BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object, Object>, Matrix>> blocks, int rowsPerBlock, int colsPerBlock, long nRows, long nCols)
-
Method Summary
Modifier and Type, Method, and Description
BlockMatrix add(BlockMatrix other)
    Adds the given block matrix other to this block matrix: this + other.
RDD<scala.Tuple2<scala.Tuple2<Object, Object>, Matrix>> blocks()
BlockMatrix cache()
    Caches the underlying RDD.
int colsPerBlock()
BlockMatrix multiply(BlockMatrix other)
BlockMatrix multiply(BlockMatrix other, int numMidDimSplits)
int numColBlocks()
long numCols()
    Gets or computes the number of columns.
int numRowBlocks()
long numRows()
    Gets or computes the number of rows.
BlockMatrix persist(StorageLevel storageLevel)
    Persists the underlying RDD with the specified storage level.
int rowsPerBlock()
BlockMatrix subtract(BlockMatrix other)
    Subtracts the given block matrix other from this block matrix: this - other.
CoordinateMatrix toCoordinateMatrix()
    Converts to CoordinateMatrix.
IndexedRowMatrix toIndexedRowMatrix()
    Converts to IndexedRowMatrix.
Matrix toLocalMatrix()
    Collects the distributed matrix on the driver as a DenseMatrix.
BlockMatrix transpose()
    Transposes this BlockMatrix.
void validate()
    Validates the block matrix info against the matrix data (blocks) and throws an exception if any error is found.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
-
Constructor Details
-
BlockMatrix
-
BlockMatrix
public BlockMatrix(RDD<scala.Tuple2<scala.Tuple2<Object, Object>, Matrix>> blocks, int rowsPerBlock, int colsPerBlock)
Alternate constructor for BlockMatrix without the input of the number of rows and columns.
- Parameters:
blocks - The RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix. If multiple blocks with the same index exist, the results for operations like add and multiply will be unpredictable.
rowsPerBlock - Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
colsPerBlock - Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
-
-
Method Details
-
add
Adds the given block matrix other to this block matrix: this + other. The matrices must have the same size and matching rowsPerBlock and colsPerBlock values. If either of the blocks being added is a SparseMatrix, the resulting sub-matrix will also be a SparseMatrix, even if it is being added to a DenseMatrix. If two dense matrices are added, the output will also be a DenseMatrix.
- Parameters:
other - The block matrix to add to this one.
- Returns:
The sum this + other as a BlockMatrix.
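The block-wise nature of add can be illustrated with a small local sketch. It is not Spark's implementation: `BlockAddSketch` and its methods are hypothetical, blocks are modeled as plain `double[][]` arrays in a `Map` keyed by a "blockRow,blockCol" string, and the sketch assumes a block present on only one side is carried over unchanged. It also assumes each index appears at most once per matrix, which is why duplicate indices make results unpredictable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical local model of block-wise addition; not Spark code.
final class BlockAddSketch {
    /** Key for a block position; a stand-in for (blockRowIndex, blockColIndex). */
    static String key(int blockRow, int blockCol) {
        return blockRow + "," + blockCol;
    }

    /** Element-wise sum of two equally sized local blocks. */
    static double[][] addBlocks(double[][] a, double[][] b) {
        double[][] out = new double[a.length][];
        for (int i = 0; i < a.length; i++) {
            out[i] = new double[a[i].length];
            for (int j = 0; j < a[i].length; j++) {
                out[i][j] = a[i][j] + b[i][j];
            }
        }
        return out;
    }

    /** Adds two block maps; blocks sharing a key are summed, others are carried over. */
    static Map<String, double[][]> add(Map<String, double[][]> left,
                                       Map<String, double[][]> right) {
        Map<String, double[][]> result = new HashMap<>(left);
        right.forEach((k, block) -> result.merge(k, block, BlockAddSketch::addBlocks));
        return result;
    }
}
```

Because blocks are paired by index, both operands need the same rowsPerBlock and colsPerBlock so that blocks at the same position have the same shape.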
-
blocks
-
cache
Caches the underlying RDD.
-
colsPerBlock
public int colsPerBlock()
-
multiply
Left-multiplies this BlockMatrix with other, another BlockMatrix (that is, computes this * other). The colsPerBlock of this matrix must equal the rowsPerBlock of other. If other contains SparseMatrix blocks, they will have to be converted to DenseMatrix. The output BlockMatrix will consist only of DenseMatrix blocks. This may cause some performance issues until support for multiplying two sparse matrices is added.
- Parameters:
other - The block matrix to multiply with (the right-hand operand).
- Returns:
The product this * other as a BlockMatrix.
- Note:
The behavior of multiply changed in 1.6.0. multiply used to throw an error when there were blocks with duplicate indices. Now, blocks with duplicate indices are added to each other.
-
multiply
Left-multiplies this BlockMatrix with other, another BlockMatrix (that is, computes this * other). The colsPerBlock of this matrix must equal the rowsPerBlock of other. If other contains SparseMatrix blocks, they will have to be converted to DenseMatrix. The output BlockMatrix will consist only of DenseMatrix blocks. This may cause some performance issues until support for multiplying two sparse matrices is added. Blocks with duplicate indices will be added to each other.
- Parameters:
other - Matrix B in A * B = C.
numMidDimSplits - Number of splits to cut on the middle dimension when doing multiplication. For example, when multiplying a Matrix A of size m x n with Matrix B of size n x k, this parameter configures the parallelism to use when grouping the matrices. The parallelism will increase from m x k to m x k x numMidDimSplits, which in some cases also reduces total shuffled data.
- Returns:
The product C = A * B as a BlockMatrix.
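The idea behind numMidDimSplits can be illustrated locally: the product A * B equals the sum of partial products taken over disjoint chunks of the middle dimension n, so each chunk can be computed as an independent task. The sketch below is a plain-Java analogy on `double[][]` arrays, not Spark's distributed implementation; `MidDimSplitSketch` and its methods are hypothetical names, and the chunks run sequentially here.

```java
// Hypothetical local analogy for splitting the middle dimension; not Spark code.
final class MidDimSplitSketch {
    /** Product of A (m x n) and B (n x k) restricted to middle indices [from, to). */
    static double[][] partialProduct(double[][] a, double[][] b, int from, int to) {
        int m = a.length;
        int k = b[0].length;
        double[][] c = new double[m][k];
        for (int i = 0; i < m; i++) {
            for (int p = from; p < to; p++) {
                for (int j = 0; j < k; j++) {
                    c[i][j] += a[i][p] * b[p][j];
                }
            }
        }
        return c;
    }

    /** Splits the middle dimension into chunks and sums the partial products. */
    static double[][] multiply(double[][] a, double[][] b, int numMidDimSplits) {
        int m = a.length;
        int n = b.length;
        int k = b[0].length;
        double[][] c = new double[m][k];
        int chunk = (n + numMidDimSplits - 1) / numMidDimSplits; // ceil(n / splits)
        for (int from = 0; from < n; from += chunk) {
            // Each chunk is an independent unit of work; here they run in sequence.
            double[][] part = partialProduct(a, b, from, Math.min(n, from + chunk));
            for (int i = 0; i < m; i++) {
                for (int j = 0; j < k; j++) {
                    c[i][j] += part[i][j];
                }
            }
        }
        return c;
    }
}
```

More splits create more, smaller tasks at the cost of an extra summation step, which is the trade-off the parameter exposes.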
-
numColBlocks
public int numColBlocks()
-
numCols
public long numCols()
Description copied from interface: DistributedMatrix
Gets or computes the number of columns.
- Specified by:
numCols in interface DistributedMatrix
-
numRowBlocks
public int numRowBlocks()
-
numRows
public long numRows()
Description copied from interface: DistributedMatrix
Gets or computes the number of rows.
- Specified by:
numRows in interface DistributedMatrix
-
persist
Persists the underlying RDD with the specified storage level.
-
rowsPerBlock
public int rowsPerBlock()
-
subtract
Subtracts the given block matrix other from this block matrix: this - other. The matrices must have the same size and matching rowsPerBlock and colsPerBlock values. If either of the blocks being subtracted is a SparseMatrix, the resulting sub-matrix will also be a SparseMatrix, even if it is being subtracted from a DenseMatrix. If two dense matrices are subtracted, the output will also be a DenseMatrix.
- Parameters:
other - The block matrix to subtract from this one.
- Returns:
The difference this - other as a BlockMatrix.
-
toCoordinateMatrix
Converts to CoordinateMatrix.
-
toIndexedRowMatrix
Converts to IndexedRowMatrix. The number of columns must be within the integer range.
-
toLocalMatrix
Collects the distributed matrix on the driver as a DenseMatrix.
- Returns:
The collected local matrix.
-
transpose
Transposes this BlockMatrix. Returns a new BlockMatrix instance sharing the same underlying data. This is a lazy operation.
- Returns:
The transposed BlockMatrix.
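Conceptually, transposing a block matrix means swapping each block's (blockRowIndex, blockColIndex) and transposing the block itself. The following sketch shows that on a local `Map` of `double[][]` blocks keyed by a "blockRow,blockCol" string. It is an assumption-laden illustration with hypothetical names (`BlockTransposeSketch`); unlike the method described above, it is eager and copies the data rather than sharing it lazily.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical local model of a block-wise transpose; not Spark code.
final class BlockTransposeSketch {
    /** Transposes a single non-empty rectangular block. */
    static double[][] transposeBlock(double[][] block) {
        int rows = block.length;
        int cols = block[0].length;
        double[][] t = new double[cols][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                t[j][i] = block[i][j];
            }
        }
        return t;
    }

    /** Swaps the two block indices in each key and transposes each block. */
    static Map<String, double[][]> transpose(Map<String, double[][]> blocks) {
        Map<String, double[][]> out = new HashMap<>();
        blocks.forEach((k, block) -> {
            String[] rc = k.split(",");
            out.put(rc[1] + "," + rc[0], transposeBlock(block));
        });
        return out;
    }
}
```

In the transposed matrix the roles of rowsPerBlock and colsPerBlock are swapped as well, since every block changes from r x c to c x r.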
-
validate
public void validate()
Validates the block matrix info against the matrix data (blocks) and throws an exception if any error is found.
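As a rough idea of what checking "matrix info against the matrix data" can involve, the sketch below verifies that one block's index and shape agree with the declared nRows, nCols, rowsPerBlock, and colsPerBlock. This is a hypothetical illustration, not the checks Spark actually performs: `BlockValidateSketch`, `checkBlock`, and the use of IllegalArgumentException are all assumptions made for the example.

```java
// Hypothetical consistency check for a single block; not Spark's validate().
final class BlockValidateSketch {
    /** Throws if the block at (blockRow, blockCol) does not fit the declared layout. */
    static void checkBlock(int blockRow, int blockCol, int rows, int cols,
                           long nRows, long nCols, int rowsPerBlock, int colsPerBlock) {
        long rowStart = (long) blockRow * rowsPerBlock;
        long colStart = (long) blockCol * colsPerBlock;
        if (blockRow < 0 || blockCol < 0 || rowStart >= nRows || colStart >= nCols) {
            throw new IllegalArgumentException(
                "Block index out of range: (" + blockRow + ", " + blockCol + ")");
        }
        // Only blocks on the final block row or column may be smaller than the nominal size.
        long expectedRows = Math.min(rowsPerBlock, nRows - rowStart);
        long expectedCols = Math.min(colsPerBlock, nCols - colStart);
        if (rows != expectedRows || cols != expectedCols) {
            throw new IllegalArgumentException(
                "Block (" + blockRow + ", " + blockCol + ") is " + rows + " x " + cols
                    + " but expected " + expectedRows + " x " + expectedCols);
        }
    }
}
```

For a 10 x 10 matrix with 4 x 4 blocks, a 2 x 2 block at index (2, 2) passes, while a 4 x 4 block at index (2, 0) is rejected because only 2 rows remain in the final block row.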
-