org.apache.spark.sql.vectorized.ColumnarBatch

All Implemented Interfaces: AutoCloseable

@DeveloperApi public class ColumnarBatch extends Object implements AutoCloseable

This class wraps multiple ColumnVectors as a row-wise table. It provides a row view of the batch so that Spark can access the data row by row. An instance is meant to be reused during the entire data loading process. A data source may extend this class with customized logic.
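To make the row-view idea concrete, the following is a minimal, self-contained sketch. It does not use Spark: `SketchBatch`, `SketchRow`, and `RowViewSketch` are hypothetical stand-ins invented for illustration, and the columns are plain `int` arrays rather than ColumnVectors. It only shows how a batch can expose column-oriented storage row by row, and how one batch instance can be reused by resetting its row count.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical, simplified stand-in for a columnar batch. This is NOT Spark's
// ColumnarBatch: columns are plain int arrays instead of ColumnVectors.
final class SketchBatch {
    private final int[][] columns;                  // columns[ordinal][rowId]
    private int numRows;
    private final SketchRow row = new SketchRow();  // one row view, reused for every row

    // Row view: stores only a row index and reads values from the column arrays.
    final class SketchRow {
        private int rowId;

        int getInt(int ordinal) {
            return columns[ordinal][rowId];
        }
    }

    SketchBatch(int[][] columns, int numRows) {
        this.columns = columns;
        this.numRows = numRows;
    }

    int numCols() { return columns.length; }

    int numRows() { return numRows; }

    void setNumRows(int numRows) { this.numRows = numRows; }

    // Repositions the shared row view and returns it (no allocation per row).
    SketchRow getRow(int rowId) {
        if (rowId < 0 || rowId >= numRows) {
            throw new IndexOutOfBoundsException("rowId: " + rowId);
        }
        row.rowId = rowId;
        return row;
    }

    // Iterates over the rows, handing out the same row view each time.
    Iterator<SketchRow> rowIterator() {
        final int limit = numRows;
        return new Iterator<SketchRow>() {
            private int cursor = 0;

            @Override
            public boolean hasNext() { return cursor < limit; }

            @Override
            public SketchRow next() {
                if (cursor >= limit) {
                    throw new NoSuchElementException();
                }
                row.rowId = cursor++;
                return row;
            }
        };
    }
}

final class RowViewSketch {
    // Sums one column by walking the batch row by row.
    static long sumColumn(SketchBatch batch, int ordinal) {
        long total = 0;
        Iterator<SketchBatch.SketchRow> rows = batch.rowIterator();
        while (rows.hasNext()) {
            total += rows.next().getInt(ordinal);
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] columns = { {1, 2, 3}, {10, 20, 30} };
        SketchBatch batch = new SketchBatch(columns, 3);
        System.out.println(sumColumn(batch, 1)); // 60

        // Reuse the same batch for a second, shorter load: overwrite the
        // column data in place and reset the row count.
        columns[1][0] = 5;
        columns[1][1] = 7;
        batch.setNumRows(2);
        System.out.println(sumColumn(batch, 1)); // 12
    }
}
```

The design point in this sketch is that the row object carries no data of its own, so iterating allocates nothing per row. The cost is that a row is only meaningful until the next row is requested.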

Constructor Summary

| Constructor | Description |
| --- | --- |
| ColumnarBatch(ColumnVector[] columns) | |
| ColumnarBatch(ColumnVector[] columns, int numRows) | Create a new batch from existing column vectors. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | close() | Called to close all the columns in this batch. |
| void | closeIfFreeable() | Called to close all the columns if their resources are freeable between batches. |
| ColumnVector | column(int ordinal) | Returns the column at `ordinal`. |
| org.apache.spark.sql.catalyst.InternalRow | getRow(int rowId) | Returns the row in this batch at `rowId`. |
| int | numCols() | Returns the number of columns that make up this batch. |
| int | numRows() | Returns the number of rows for read, including filtered rows. |
| Iterator<org.apache.spark.sql.catalyst.InternalRow> | rowIterator() | Returns an iterator over the rows in this batch. |
| void | setNumRows(int numRows) | Sets the number of rows in this batch. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- ColumnarBatch
  
  public ColumnarBatch(ColumnVector[] columns)
- ColumnarBatch
  
  public ColumnarBatch(ColumnVector[] columns, int numRows)
  
  Create a new batch from existing column vectors.
  
  Parameters:
  
  columns - The columns of this batch
  
  numRows - The number of rows in this batch
Method Details
- close
  
  public void close()
  
  Called to close all the columns in this batch. It is not valid to access the data after calling this. This must be called once the batch is no longer needed, to clean up memory allocations.
  
  Specified by:
  
  close in interface AutoCloseable
- closeIfFreeable
  
  public void closeIfFreeable()
  
  Called to close all the columns if their resources are freeable between batches. This is used to clean up memory allocated during columnar processing.
- rowIterator
  
  public Iterator<org.apache.spark.sql.catalyst.InternalRow> rowIterator()
  
  Returns an iterator over the rows in this batch.
- setNumRows
  
  public void setNumRows(int numRows)
  
  Sets the number of rows in this batch.
- numCols
  
  public int numCols()
  
  Returns the number of columns that make up this batch.
- numRows
  
  public int numRows()
  
  Returns the number of rows for read, including filtered rows.
- column
  
  public ColumnVector column(int ordinal)
  
  Returns the column at `ordinal`.
- getRow
  
  public org.apache.spark.sql.catalyst.InternalRow getRow(int rowId)
  
  Returns the row in this batch at `rowId`. The returned row is reused across calls.
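Two points in the method details above are easy to get wrong: the row returned by `getRow` is reused across calls, and the data must not be accessed after `close()`. The sketch below illustrates both with a self-contained model. It does not use Spark: `ReusedRowBatch`, `MutableRow`, and `RowReuseSketch` are hypothetical names, and the exception thrown after close is a choice made for this sketch, not documented Spark behavior. With the real API, the analogous way to retain a row would be to copy it (for example via `InternalRow.copy()`) before fetching the next one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-column stand-in. This is NOT Spark's ColumnarBatch.
final class ReusedRowBatch implements AutoCloseable {
    // Mutable row object that the batch overwrites on every getRow call.
    static final class MutableRow {
        long value;

        // Returns an independent snapshot of the current contents.
        MutableRow copy() {
            MutableRow snapshot = new MutableRow();
            snapshot.value = value;
            return snapshot;
        }
    }

    private long[] values;                            // column data; null once closed
    private final MutableRow row = new MutableRow();  // shared across all calls

    ReusedRowBatch(long[] values) {
        this.values = values;
    }

    int numRows() {
        checkOpen();
        return values.length;
    }

    MutableRow getRow(int rowId) {
        checkOpen();
        row.value = values[rowId];
        return row;
    }

    // Releases the column data. Throwing on later access is this sketch's choice.
    @Override
    public void close() {
        values = null;
    }

    private void checkOpen() {
        if (values == null) {
            throw new IllegalStateException("batch is closed");
        }
    }
}

final class RowReuseSketch {
    // Collects every row, either as the shared object or as a per-row copy.
    static List<ReusedRowBatch.MutableRow> retain(ReusedRowBatch batch, boolean copy) {
        List<ReusedRowBatch.MutableRow> out = new ArrayList<>();
        for (int i = 0; i < batch.numRows(); i++) {
            ReusedRowBatch.MutableRow r = batch.getRow(i);
            out.add(copy ? r.copy() : r);
        }
        return out;
    }

    public static void main(String[] args) {
        // try-with-resources calls close() even if the body throws.
        try (ReusedRowBatch batch = new ReusedRowBatch(new long[] {1, 2, 3})) {
            List<ReusedRowBatch.MutableRow> aliased = retain(batch, false);
            // All three entries are the same object, holding the last row read.
            System.out.println(aliased.get(0).value); // 3

            List<ReusedRowBatch.MutableRow> copied = retain(batch, true);
            System.out.println(copied.get(0).value);  // 1
        }
    }
}
```

Because ColumnarBatch implements AutoCloseable, the same try-with-resources form applies to it whenever the batch's lifetime fits a single lexical scope.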
