package vectorized
Type Members
- class ArrowColumnVector extends ColumnVector
A column vector backed by Apache Arrow.
- Annotations
- @DeveloperApi()
- abstract class ColumnVector extends AutoCloseable
An interface representing in-memory columnar data in Spark. This interface defines the main APIs to access the data, as well as their batched versions. The batched versions are considered to be faster and preferable whenever possible.
Most of the APIs take the rowId as a parameter. This is the batch-local, 0-based row id for values in this ColumnVector.
Spark only calls the specific get method that matches the data type of this ColumnVector, e.g. if it's int type, Spark is guaranteed to only call #getInt(int) or #getInts(int, int).
ColumnVector supports all the data types including nested types. To handle nested types, ColumnVector can have children and is a tree structure. Please refer to #getStruct(int), #getArray(int) and #getMap(int) for the details about how to implement nested types.
ColumnVector is expected to be reused during the entire data loading process, to avoid allocating memory again and again.
ColumnVector is meant to maximize CPU efficiency, not to minimize storage footprint. Implementations should prefer computing efficiency over storage efficiency when designing the format. Since the ColumnVector instance is expected to be reused while loading data, the storage footprint is negligible.
- Annotations
- @Evolving()
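The access pattern described for ColumnVector (a per-row getter, a batched getter, and one instance refilled for every batch) can be illustrated with a small self-contained sketch. SketchIntVector and its load method are hypothetical names invented for this illustration; the class does not extend or reproduce Spark's ColumnVector and only mirrors the idea under simplified assumptions (int values, a fixed capacity).

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for an int-typed column vector.
// It is NOT Spark's ColumnVector; it only mirrors the described access pattern.
final class SketchIntVector implements AutoCloseable {
    private int[] values;     // flat primitive storage, favouring CPU efficiency
    private boolean[] nulls;  // one null flag per row
    private int numRows;      // rows valid in the current batch

    SketchIntVector(int capacity) {
        values = new int[capacity];
        nulls = new boolean[capacity];
    }

    // Refill the existing buffers for the next batch instead of allocating new ones.
    void load(int[] batch) {
        if (batch.length > values.length) {
            throw new IllegalArgumentException("batch exceeds capacity");
        }
        System.arraycopy(batch, 0, values, 0, batch.length);
        Arrays.fill(nulls, 0, batch.length, false);
        numRows = batch.length;
    }

    void setNull(int rowId) { checkRow(rowId); nulls[rowId] = true; }

    boolean isNullAt(int rowId) { checkRow(rowId); return nulls[rowId]; }

    // Per-row access: rowId is the batch-local, 0-based row id.
    int getInt(int rowId) { checkRow(rowId); return values[rowId]; }

    // Batched access: copies count values starting at rowId.
    int[] getInts(int rowId, int count) {
        if (rowId < 0 || count < 0 || count > numRows - rowId) {
            throw new IndexOutOfBoundsException("range outside current batch");
        }
        return Arrays.copyOfRange(values, rowId, rowId + count);
    }

    private void checkRow(int rowId) {
        if (rowId < 0 || rowId >= numRows) {
            throw new IndexOutOfBoundsException("rowId " + rowId);
        }
    }

    @Override
    public void close() { values = null; nulls = null; numRows = 0; }
}

final class SketchIntVectorDemo {
    public static void main(String[] args) {
        try (SketchIntVector v = new SketchIntVector(4)) {
            v.load(new int[] {10, 20, 30});
            System.out.println(v.getInt(1));                      // 20
            System.out.println(Arrays.toString(v.getInts(0, 3))); // [10, 20, 30]
            v.load(new int[] {7, 8});                             // same instance, next batch
            System.out.println(v.getInt(0));                      // 7
        }
    }
}
```

The second load call reuses the buffers allocated in the constructor, which is the trade-off the description points at: capacity is held for the lifetime of the instance in exchange for avoiding repeated allocation.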
- final class ColumnarArray extends ArrayData
Array abstraction in ColumnVector.
- Annotations
- @Evolving()
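One way to picture an array abstraction over a column vector is as a window into a flat element buffer. The sketch below assumes a layout in which the elements of all arrays in a column are stored contiguously and each row records an offset and a length; this page does not spell that layout out, so treat it as an assumption. SketchArrayView is a hypothetical name and is not Spark's ColumnarArray.

```java
import java.util.Arrays;

// Hypothetical sketch: one array value seen as an (offset, length) window
// over element storage shared by every array in the column.
final class SketchArrayView {
    private final int[] elements; // flat storage for all arrays (assumed layout)
    private final int offset;     // index of this array's first element
    private final int length;     // number of elements in this array

    SketchArrayView(int[] elements, int offset, int length) {
        if (offset < 0 || length < 0 || length > elements.length - offset) {
            throw new IndexOutOfBoundsException("window outside element buffer");
        }
        this.elements = elements;
        this.offset = offset;
        this.length = length;
    }

    int numElements() { return length; }

    // Element access is relative to the start of this array, not the buffer.
    int getInt(int ordinal) {
        if (ordinal < 0 || ordinal >= length) {
            throw new IndexOutOfBoundsException("ordinal " + ordinal);
        }
        return elements[offset + ordinal];
    }

    // Materialises a copy; the view itself never copies on read.
    int[] toIntArray() { return Arrays.copyOfRange(elements, offset, offset + length); }
}

final class SketchArrayViewDemo {
    public static void main(String[] args) {
        // Three rows holding [1, 2], [] and [3, 4, 5], flattened into one buffer.
        int[] elements = {1, 2, 3, 4, 5};
        int[] offsets = {0, 2, 2};
        int[] lengths = {2, 0, 3};

        SketchArrayView row2 = new SketchArrayView(elements, offsets[2], lengths[2]);
        System.out.println(row2.numElements());                 // 3
        System.out.println(Arrays.toString(row2.toIntArray())); // [3, 4, 5]
    }
}
```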
- class ColumnarBatch extends AutoCloseable
This class wraps multiple ColumnVectors as a row-wise table. It provides a row view of this batch so that Spark can access the data row by row. An instance of this class is meant to be reused during the entire data loading process. A data source may extend this class with customized logic.
- Annotations
- @DeveloperApi()
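The row view that ColumnarBatch provides over column-oriented data can be sketched as follows. SketchBatch and RowView are hypothetical names, plain int arrays stand in for the wrapped ColumnVectors, and handing out a single mutable view object is a design choice of this sketch rather than a statement about Spark's implementation.

```java
// Hypothetical sketch of a row view over column-oriented storage.
// Plain int[] columns stand in for the wrapped column vectors.
final class SketchBatch {
    private final int[][] columns;             // columns[ordinal][rowId]
    private final int numRows;
    private final RowView row = new RowView(); // single view, repositioned per row

    SketchBatch(int[][] columns, int numRows) {
        for (int[] column : columns) {
            if (column.length < numRows) {
                throw new IllegalArgumentException("column shorter than numRows");
            }
        }
        this.columns = columns;
        this.numRows = numRows;
    }

    int numRows() { return numRows; }

    // Points the shared view at rowId; no per-row allocation or copying.
    RowView getRow(int rowId) {
        if (rowId < 0 || rowId >= numRows) {
            throw new IndexOutOfBoundsException("rowId " + rowId);
        }
        row.rowId = rowId;
        return row;
    }

    // Reads field `ordinal` of the current row from the matching column.
    final class RowView {
        private int rowId;

        int getInt(int ordinal) { return columns[ordinal][rowId]; }
    }
}

final class SketchBatchDemo {
    public static void main(String[] args) {
        int[][] columns = {{1, 2, 3}, {10, 20, 30}};
        SketchBatch batch = new SketchBatch(columns, 3);
        for (int rowId = 0; rowId < batch.numRows(); rowId++) {
            SketchBatch.RowView r = batch.getRow(rowId);
            System.out.println(r.getInt(0) + " " + r.getInt(1));
        }
        // prints:
        // 1 10
        // 2 20
        // 3 30
    }
}
```

Because the view is shared, a caller that needs to keep a row after moving on must copy its values first; that is the cost of avoiding an allocation per row in this sketch.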
- final class ColumnarBatchRow extends InternalRow
This class wraps an array of ColumnVector and provides a row view.
- Annotations
- @DeveloperApi()
- Since
3.3.0
- final class ColumnarMap extends MapData
Map abstraction in ColumnVector.
- final class ColumnarRow extends InternalRow
Row abstraction in ColumnVector.
- Annotations
- @Evolving()