CoordinateMatrix

class pyspark.mllib.linalg.distributed.CoordinateMatrix(entries, numRows=0, numCols=0)
Represents a matrix in coordinate format.
 Parameters:
 entries : pyspark.RDD
   An RDD of MatrixEntry inputs or (int, int, float) tuples.
 numRows : int, optional
   Number of rows in the matrix. A nonpositive value means unknown, in which case the number of rows is determined by the max row index plus one.
 numCols : int, optional
   Number of columns in the matrix. A nonpositive value means unknown, in which case the number of columns is determined by the max column index plus one.
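The sizing rule for numRows and numCols can be illustrated without Spark. The following is a plain-Python sketch, not the library's implementation; `infer_shape` is a hypothetical helper introduced only for this illustration, and entries are modeled as (row, col, value) tuples.

```python
# Plain-Python illustration (no Spark required) of the documented rule:
# a nonpositive dimension means "unknown" and is inferred as max index + 1.
def infer_shape(entries, num_rows=0, num_cols=0):
    """Hypothetical helper; entries is an iterable of (row, col, value)."""
    entries = list(entries)
    if num_rows <= 0:
        num_rows = max(i for i, _, _ in entries) + 1
    if num_cols <= 0:
        num_cols = max(j for _, j, _ in entries) + 1
    return num_rows, num_cols


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
inferred = infer_shape(entries)        # both dimensions taken from the indices
explicit = infer_shape(entries, 7, 6)  # both dimensions supplied by the caller
print(inferred, explicit)              # (3, 2) (7, 6)
```

The two results match the numRows and numCols doctests on this page: 3 by 2 when the dimensions are inferred, 7 by 6 when they are passed explicitly.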
Methods
numCols()  Get or compute the number of columns.
numRows()  Get or compute the number of rows.
toBlockMatrix([rowsPerBlock, colsPerBlock])  Convert this matrix to a BlockMatrix.
toIndexedRowMatrix()  Convert this matrix to an IndexedRowMatrix.
toRowMatrix()  Convert this matrix to a RowMatrix.
transpose()  Transpose this CoordinateMatrix.
Attributes
entries  Entries of the CoordinateMatrix stored as an RDD of MatrixEntries.
Methods Documentation

numCols()
Get or compute the number of columns.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])

>>> mat = CoordinateMatrix(entries)
>>> print(mat.numCols())
2

>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numCols())
6

numRows()
Get or compute the number of rows.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])

>>> mat = CoordinateMatrix(entries)
>>> print(mat.numRows())
3

>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numRows())
7

toBlockMatrix(rowsPerBlock=1024, colsPerBlock=1024)
Convert this matrix to a BlockMatrix.
 Parameters:
 rowsPerBlock : int, optional
   Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
 colsPerBlock : int, optional
   Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toBlockMatrix()

>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # BlockMatrix will have 7 rows as well.
>>> print(mat.numRows())
7

>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # BlockMatrix will have 5 columns as well.
>>> print(mat.numCols())
5
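To make the rowsPerBlock and colsPerBlock descriptions concrete, here is a plain-Python sketch of how a matrix of a given size could be cut into a block grid. It assumes that an entry at (i, j) belongs to the block at (i // rowsPerBlock, j // colsPerBlock); that is a natural reading of the parameter descriptions, but this page does not confirm it. `block_layout` and `block_of` are hypothetical helpers, and the small block sizes are chosen only so that the trailing blocks come out short.

```python
import math


def block_layout(num_rows, num_cols, rows_per_block=1024, cols_per_block=1024):
    """Hypothetical helper: size of the block grid and of its trailing blocks."""
    n_block_rows = math.ceil(num_rows / rows_per_block)
    n_block_cols = math.ceil(num_cols / cols_per_block)
    # The final block row/column holds whatever is left over, so it may be
    # smaller than rows_per_block / cols_per_block.
    last_rows = num_rows - (n_block_rows - 1) * rows_per_block
    last_cols = num_cols - (n_block_cols - 1) * cols_per_block
    return n_block_rows, n_block_cols, last_rows, last_cols


def block_of(i, j, rows_per_block=1024, cols_per_block=1024):
    """Assumed assignment of entry (i, j) to a block by integer division."""
    return i // rows_per_block, j // cols_per_block


# A 7 x 5 matrix (as in the doctest above) cut into 3 x 2 blocks.
layout = block_layout(7, 5, rows_per_block=3, cols_per_block=2)
home = block_of(6, 4, rows_per_block=3, cols_per_block=2)
print(layout)  # (3, 3, 1, 1): a 3 x 3 grid whose last blocks are 1 row / 1 column
print(home)    # (2, 2): the entry at (6, 4) falls in the bottom-right block
```

With the default block size of 1024, the same 7 x 5 matrix fits in a single block of 7 rows and 5 columns.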

toIndexedRowMatrix()
Convert this matrix to an IndexedRowMatrix.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toIndexedRowMatrix()

>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # IndexedRowMatrix will have 7 rows as well.
>>> print(mat.numRows())
7

>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # IndexedRowMatrix will have 5 columns as well.
>>> print(mat.numCols())
5
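Conceptually, this conversion regroups coordinate entries by row index. The sketch below shows that regrouping in plain Python, with each row modeled as a dict from column index to value. It is an illustration of the idea only, not Spark's code path, and `to_indexed_rows` is a hypothetical helper.

```python
from collections import defaultdict


def to_indexed_rows(entries):
    """Hypothetical helper: group (row, col, value) tuples by row index."""
    rows = defaultdict(dict)
    for i, j, value in entries:
        rows[i][j] = value
    # Each result item is (row index, {column index: value}), sorted by index.
    return sorted(rows.items(), key=lambda item: item[0])


entries = [(0, 0, 1.2), (6, 4, 2.1)]
indexed_rows = to_indexed_rows(entries)
# Because every row keeps its index, the row count is still max index + 1.
num_rows = indexed_rows[-1][0] + 1
print(indexed_rows)  # [(0, {0: 1.2}), (6, {4: 2.1})]
print(num_rows)      # 7
```

Only two rows hold data, but the retained indices are why the doctest above still reports 7 rows.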

toRowMatrix()
Convert this matrix to a RowMatrix.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toRowMatrix()

>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, but the ensuing RowMatrix
>>> # will only have 2 rows since there are only entries on 2
>>> # unique rows.
>>> print(mat.numRows())
2

>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing RowMatrix
>>> # will have 5 columns as well.
>>> print(mat.numCols())
5
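The gap between 7 effective rows and 2 RowMatrix rows comes down to counting distinct row indices instead of taking the largest one. A short plain-Python check of both quantities, with entries modeled as (row, col, value) tuples and no Spark involved:

```python
entries = [(0, 0, 1.2), (6, 4, 2.1)]

# Effective row count of the coordinate format: largest row index plus one.
effective_rows = max(i for i, _, _ in entries) + 1

# A row-oriented matrix without row indices keeps only rows that have
# at least one entry, so it ends up with one row per distinct index.
populated_rows = len({i for i, _, _ in entries})

print(effective_rows, populated_rows)  # 7 2
```

If the original row positions matter downstream, toIndexedRowMatrix is the conversion that keeps them.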

transpose()
Transpose this CoordinateMatrix.
New in version 2.0.0.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> mat_transposed = mat.transpose()

>>> print(mat_transposed.numRows())
2

>>> print(mat_transposed.numCols())
3
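In coordinate format, transposing amounts to swapping the row and column index of every entry. The plain-Python sketch below shows this on the same data as the doctest; `transpose_entries` is a hypothetical helper for illustration and makes no claim about how Spark implements the method.

```python
def transpose_entries(entries):
    """Hypothetical helper: swap row and column index of each entry."""
    return [(j, i, value) for i, j, value in entries]


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
transposed = transpose_entries(entries)

# Shape inferred the same way as for an unsized matrix: max index plus one.
shape = (max(i for i, _, _ in transposed) + 1,
         max(j for _, j, _ in transposed) + 1)
print(transposed)  # [(0, 0, 1.2), (0, 1, 2.0), (1, 2, 3.7)]
print(shape)       # (2, 3)
```

The original 3 x 2 matrix becomes 2 x 3, matching the numRows and numCols values printed in the doctest.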
Attributes Documentation

entries
Entries of the CoordinateMatrix stored as an RDD of MatrixEntries.
Examples
>>> mat = CoordinateMatrix(sc.parallelize([MatrixEntry(0, 0, 1.2),
...                                        MatrixEntry(6, 4, 2.1)]))
>>> entries = mat.entries
>>> entries.first()
MatrixEntry(0, 0, 1.2)