Package org.apache.spark.sql.vectorized
Class ColumnarBatch
Object
org.apache.spark.sql.vectorized.ColumnarBatch
- All Implemented Interfaces:
AutoCloseable
This class wraps multiple ColumnVectors as a row-wise table. It provides a row view of this
batch so that Spark can access the data row by row. An instance is meant to be reused during
the entire data loading process. A data source may extend this class with customized logic.
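The relationship between column storage and the row view can be illustrated with a small, dependency-free model. The sketch below is not Spark's implementation: `ToyIntColumn`, `ToyBatch`, and `ToyBatchDemo` are hypothetical stand-ins that borrow only the method names listed on this page (`setNumRows`, `numRows`, `numCols`, `column`), and `rowValues` is an invented helper. It shows one batch instance being refilled for successive chunks of input, with `setNumRows` marking how many entries are valid after each fill.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a column vector: a fixed-capacity int column.
final class ToyIntColumn {
    final int[] values;
    ToyIntColumn(int capacity) { values = new int[capacity]; }
}

// Hypothetical stand-in for a columnar batch: column-major storage, row-wise access.
final class ToyBatch {
    private final ToyIntColumn[] columns;
    private int numRows;

    ToyBatch(ToyIntColumn[] columns) { this.columns = columns; }

    void setNumRows(int numRows) { this.numRows = numRows; }
    int numRows() { return numRows; }
    int numCols() { return columns.length; }
    ToyIntColumn column(int ordinal) { return columns[ordinal]; }

    // Invented helper: builds row `rowId` by reading position `rowId` of every column.
    int[] rowValues(int rowId) {
        if (rowId < 0 || rowId >= numRows) {
            throw new IndexOutOfBoundsException("rowId " + rowId);
        }
        int[] row = new int[columns.length];
        for (int c = 0; c < columns.length; c++) {
            row[c] = columns[c].values[rowId];
        }
        return row;
    }
}

final class ToyBatchDemo {
    // Fills the same batch twice, as a reader might for successive chunks of input.
    static List<String> run() {
        ToyIntColumn ids = new ToyIntColumn(4);
        ToyIntColumn scores = new ToyIntColumn(4);
        ToyBatch batch = new ToyBatch(new ToyIntColumn[] {ids, scores});
        List<String> seen = new ArrayList<>();

        int[][] chunks = {{1, 2, 3}, {4, 5}};  // made-up id values, one array per chunk
        for (int[] chunk : chunks) {
            for (int i = 0; i < chunk.length; i++) {
                ids.values[i] = chunk[i];
                scores.values[i] = chunk[i] * 10;
            }
            batch.setNumRows(chunk.length);    // only the first numRows entries are valid
            for (int r = 0; r < batch.numRows(); r++) {
                seen.add(Arrays.toString(batch.rowValues(r)));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that the second chunk leaves a stale value at index 2 of each column; the row count, not the column capacity, decides what is readable.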
-
Constructor Summary
| Constructor | Description |
| --- | --- |
| `ColumnarBatch(ColumnVector[] columns)` | |
| `ColumnarBatch(ColumnVector[] columns, int numRows)` | Create a new batch from existing column vectors. |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| `void` | `close()` | Called to close all the columns in this batch. |
| `ColumnVector` | `column(int ordinal)` | Returns the column at `ordinal`. |
| `org.apache.spark.sql.catalyst.InternalRow` | `getRow(int rowId)` | Returns the row in this batch at `rowId`. |
| `int` | `numCols()` | Returns the number of columns that make up this batch. |
| `int` | `numRows()` | Returns the number of rows for read, including filtered rows. |
| `Iterator<org.apache.spark.sql.catalyst.InternalRow>` | `rowIterator()` | Returns an iterator over the rows in this batch. |
| `void` | `setNumRows(int numRows)` | Sets the number of rows in this batch. |
-
Constructor Details
-
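ColumnarBatch
public ColumnarBatch(ColumnVector[] columns)
-
ColumnarBatch
public ColumnarBatch(ColumnVector[] columns, int numRows)
Create a new batch from existing column vectors.
- Parameters:
columns - The columns of this batch
numRows - The number of rows in this batch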
-
-
Method Details
-
close
public void close()
Called to close all the columns in this batch. It is not valid to access the data after calling this. This must be called at the end to clean up memory allocations.
- Specified by:
close in interface AutoCloseable
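Because the class implements AutoCloseable, one way to guarantee the cleanup described above is a try-with-resources block. The following is a minimal sketch using hypothetical stand-in types (`TrackedColumn`, `ClosingBatch`, `CloseDemo`), not Spark classes; it assumes the code that opens the batch also owns it, which may not hold when a reader hands the batch to another component.

```java
// Hypothetical column that records whether its memory has been released.
final class TrackedColumn implements AutoCloseable {
    private boolean closed;

    boolean isClosed() { return closed; }

    @Override
    public void close() { closed = true; }
}

// Hypothetical batch whose close() releases every column it wraps.
final class ClosingBatch implements AutoCloseable {
    private final TrackedColumn[] columns;

    ClosingBatch(TrackedColumn[] columns) { this.columns = columns; }

    int numCols() { return columns.length; }

    @Override
    public void close() {
        for (TrackedColumn column : columns) {
            column.close();
        }
    }
}

final class CloseDemo {
    // Returns true if every column was closed once the try block exited.
    static boolean allClosedAfterUse() {
        TrackedColumn[] columns = {new TrackedColumn(), new TrackedColumn()};
        try (ClosingBatch batch = new ClosingBatch(columns)) {
            // Read from the batch here; it must not be accessed after this block.
            if (batch.numCols() != 2) {
                throw new IllegalStateException("unexpected column count");
            }
        }
        for (TrackedColumn column : columns) {
            if (!column.isClosed()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allClosedAfterUse());
    }
}
```

The try block closes the batch even if reading throws, which is the main reason to prefer it over a manual `close()` call at the end of a method.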
-
rowIterator
public java.util.Iterator<org.apache.spark.sql.catalyst.InternalRow> rowIterator()
Returns an iterator over the rows in this batch.
-
setNumRows
public void setNumRows(int numRows)
Sets the number of rows in this batch.
-
numCols
public int numCols()
Returns the number of columns that make up this batch.
-
numRows
public int numRows()
Returns the number of rows for read, including filtered rows.
-
column
public ColumnVector column(int ordinal)
Returns the column at `ordinal`.
-
getRow
public org.apache.spark.sql.catalyst.InternalRow getRow(int rowId)
Returns the row in this batch at `rowId`. The returned row is reused across calls.
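The reuse note has a practical consequence: a reference kept from an earlier call will reflect whichever row was requested most recently. The sketch below models that behavior with hypothetical types (`ToyRowView`, `ToyReusingBatch`, `ReusedRowDemo`) rather than Spark's own row class, and shows that taking a snapshot before the next call preserves the values. In Spark itself, `InternalRow.copy()` is the usual way to take such a snapshot, though whether a copy is needed depends on how long the caller holds the row.

```java
// Hypothetical row view: a single object that is re-pointed at a new row index.
final class ToyRowView {
    private final int[][] columns;  // column-major data: columns[ordinal][rowId]
    private int rowId;

    ToyRowView(int[][] columns) { this.columns = columns; }

    void pointAt(int rowId) { this.rowId = rowId; }

    int getInt(int ordinal) { return columns[ordinal][rowId]; }

    // Snapshots the current row into an array that later calls cannot change.
    int[] copy() {
        int[] out = new int[columns.length];
        for (int c = 0; c < columns.length; c++) {
            out[c] = columns[c][rowId];
        }
        return out;
    }
}

// Hypothetical batch whose getRow always hands back the same view object.
final class ToyReusingBatch {
    private final ToyRowView row;

    ToyReusingBatch(int[][] columns) { this.row = new ToyRowView(columns); }

    ToyRowView getRow(int rowId) {
        row.pointAt(rowId);
        return row;
    }
}

final class ReusedRowDemo {
    public static void main(String[] args) {
        ToyReusingBatch batch = new ToyReusingBatch(new int[][] {{10, 20, 30}});
        ToyRowView first = batch.getRow(0);
        int[] snapshot = first.copy();      // taken while the view points at row 0
        ToyRowView last = batch.getRow(2);  // re-points the shared view

        System.out.println(first == last);    // both names refer to the same object
        System.out.println(first.getInt(0));  // now reads row 2, not row 0
        System.out.println(snapshot[0]);      // the snapshot still holds the row 0 value
    }
}
```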