vectorized

package vectorized

Type Members

class ArrowColumnVector extends ColumnVector
A column vector backed by Apache Arrow.
Annotations
@DeveloperApi()
abstract class ColumnVector extends AutoCloseable
An interface representing in-memory columnar data in Spark. This interface defines the main APIs to access the data, as well as their batched versions. The batched versions are considered to be faster and preferable whenever possible.
Most of the APIs take the rowId as a parameter. This is the batch-local, 0-based row ID for values in this ColumnVector.
Spark only calls the get method that matches the data type of this ColumnVector; e.g. if it's int type, Spark is guaranteed to only call #getInt(int) or #getInts(int, int).
ColumnVector supports all data types, including nested types. To handle nested types, a ColumnVector can have children, forming a tree structure. Please refer to #getStruct(int), #getArray(int) and #getMap(int) for details about how to implement nested types.
ColumnVector is expected to be reused throughout the entire data loading process, to avoid repeatedly allocating memory.
ColumnVector is meant to maximize CPU efficiency, not to minimize storage footprint. Implementations should prefer computing efficiency over storage efficiency when designing the format. Since a ColumnVector instance is expected to be reused while loading data, the storage footprint is negligible.
Annotations
@Evolving()
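A minimal sketch of the contract described above, assuming a single int-typed column with on-heap storage. IntVectorSketch, appendInt, appendNull and reset are hypothetical names; only isNullAt, getInt, getInts and close echo accessors that ColumnVector itself declares, and even those are reimplemented here rather than inherited from Spark.

```java
import java.util.Arrays;

// Hypothetical int-typed column vector illustrating the ColumnVector contract.
// It is NOT Spark's class: buffers are allocated once and refilled per batch.
final class IntVectorSketch implements AutoCloseable {
    private final int[] values;     // allocated once, sized to the largest batch
    private final boolean[] nulls;  // nulls[i] == true means row i is null
    private int numRows;            // number of rows filled in the current batch

    IntVectorSketch(int capacity) {
        values = new int[capacity];
        nulls = new boolean[capacity];
    }

    // Prepare the same buffers for the next batch without reallocating them.
    void reset() {
        Arrays.fill(nulls, 0, numRows, false);
        numRows = 0;
    }

    void appendInt(int v) {
        values[numRows] = v;
        nulls[numRows] = false;
        numRows++;
    }

    // The value slot of a null row is left unspecified; check isNullAt first.
    void appendNull() {
        nulls[numRows] = true;
        numRows++;
    }

    int numRows() { return numRows; }

    // rowId is batch-local and 0-based: 0 is the first row of the current batch.
    boolean isNullAt(int rowId) { return nulls[rowId]; }

    int getInt(int rowId) { return values[rowId]; }

    // Batched accessor: one bulk copy instead of `count` separate getInt calls.
    int[] getInts(int rowId, int count) {
        return Arrays.copyOfRange(values, rowId, rowId + count);
    }

    @Override
    public void close() {
        // On-heap arrays are reclaimed by the GC; an off-heap vector would free memory here.
    }
}
```

Calling reset() and refilling the same arrays for the next batch is the reuse pattern the description recommends, and getInts shows why batched access can be cheaper: one bulk copy replaces a call per row.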
final class ColumnarArray extends ArrayData
Array abstraction in ColumnVector.
Annotations
@Evolving()
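In Spark, a ColumnarArray is constructed from a child ColumnVector plus an offset and a length, so an array value is a window into the child's data rather than a copy. The sketch below models that idea with plain int arrays. IntArrayColumnSketch and IntArrayView are hypothetical names, and the per-row offsets/lengths layout is an assumption about one way to flatten an array<int> column.

```java
import java.util.Arrays;

// Hypothetical flattened layout for an array<int> column: every element of every
// row lives in one child buffer, and each row records its slice (offset, length).
final class IntArrayColumnSketch {
    private final int[] childData;
    private final int[] offsets;
    private final int[] lengths;

    IntArrayColumnSketch(int[] childData, int[] offsets, int[] lengths) {
        this.childData = childData;
        this.offsets = offsets;
        this.lengths = lengths;
    }

    // Returns a view over this row's slice; no elements are copied.
    IntArrayView getArray(int rowId) {
        return new IntArrayView(childData, offsets[rowId], lengths[rowId]);
    }
}

// Hypothetical analogue of ColumnarArray: a (data, offset, length) window.
final class IntArrayView {
    private final int[] data;
    private final int offset;
    private final int length;

    IntArrayView(int[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    int numElements() { return length; }

    // ordinal is relative to the start of this array, not to the child buffer.
    int getInt(int ordinal) {
        if (ordinal < 0 || ordinal >= length) {
            throw new IndexOutOfBoundsException("ordinal " + ordinal + ", length " + length);
        }
        return data[offset + ordinal];
    }

    // Materializes the slice only when a caller needs its own copy.
    int[] toIntArray() {
        return Arrays.copyOfRange(data, offset, offset + length);
    }
}
```

For rows [[1, 2], [], [3, 4, 5]] the child buffer is {1, 2, 3, 4, 5}, the offsets are {0, 2, 2} and the lengths are {2, 0, 3}; an empty array is simply a zero-length window.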
class ColumnarBatch extends AutoCloseable
This class wraps multiple ColumnVectors as a row-wise table. It provides a row view of this batch so that Spark can access the data row by row. An instance of it is meant to be reused during the entire data loading process. A data source may extend this class with customized logic.
Annotations
@DeveloperApi()
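Spark's ColumnarBatch is built from an array of ColumnVectors and exposes numRows(), numCols(), column(int), setNumRows(int) and rowIterator(). The sketch below mirrors that shape with int arrays standing in for column vectors. BatchSketch and its inner Row are hypothetical, and handing out one reused row object from the iterator is a design choice made here to match the reuse goal in the description.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical analogue of ColumnarBatch over int columns (not Spark's class).
final class BatchSketch {
    private final int[][] columns;  // columns[c][r]: one buffer per column, reused across loads
    private int numRows;

    BatchSketch(int[][] columns) { this.columns = columns; }

    // A loader refills the column buffers, then declares how many rows are valid.
    void setNumRows(int numRows) { this.numRows = numRows; }

    int numRows() { return numRows; }
    int numCols() { return columns.length; }
    int[] column(int ordinal) { return columns[ordinal]; }

    // Row view: reads column `ordinal` at the current rowId.
    final class Row {
        int rowId;
        int getInt(int ordinal) { return columns[ordinal][rowId]; }
    }

    Iterator<Row> rowIterator() {
        final Row row = new Row();  // reused, so callers must copy values they keep
        return new Iterator<Row>() {
            private int next = 0;

            @Override
            public boolean hasNext() { return next < numRows; }

            @Override
            public Row next() {
                if (!hasNext()) throw new NoSuchElementException();
                row.rowId = next++;
                return row;
            }
        };
    }
}
```

Because only the rowId changes between rows, iterating a batch allocates one row object regardless of batch size; the trade-off is that a row reference is only valid until the next call to next().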
final class ColumnarBatchRow extends InternalRow
This class wraps an array of ColumnVector and provides a row view.
Annotations
@DeveloperApi()
Since
3.3.0
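A ColumnarBatchRow holds no values of its own: Spark's class pairs the batch's ColumnVector array with a row index, and each getter reads the column at that row. One consequence is that a retained row reference changes when the index moves, so a caller that needs the values later should take a copy (InternalRow requires every row type to implement copy()). BatchRowSketch below is hypothetical and uses int arrays in place of ColumnVector.

```java
// Hypothetical analogue of ColumnarBatchRow over int columns (not Spark's class).
// Every getter reads through to the backing column at rowId; nothing is cached.
final class BatchRowSketch {
    private final int[][] columns;  // columns[ordinal][rowId]
    int rowId;                      // moved by the caller to point the view at another row

    BatchRowSketch(int[][] columns) { this.columns = columns; }

    int numFields() { return columns.length; }

    int getInt(int ordinal) { return columns[ordinal][rowId]; }

    // Snapshot of the current row, unaffected by later changes to rowId.
    int[] copy() {
        int[] out = new int[columns.length];
        for (int c = 0; c < columns.length; c++) {
            out[c] = columns[c][rowId];
        }
        return out;
    }
}
```

Usage: set rowId = 0, call copy() to keep the first row, then move to rowId = 1; the view now reports the second row while the copy still holds the first.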
final class ColumnarMap extends MapData
Map abstraction in ColumnVector.
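Spark's ColumnarMap is built from a key ColumnVector, a value ColumnVector, an offset and a length, and exposes the two sides through keyArray() and valueArray(); entry i pairs the i-th key with the i-th value of the same slice. The sketch below models one map value that way with int keys and String values. MapViewSketch, keyAt, valueAt and the linear get(key) lookup are hypothetical, not part of Spark's API.

```java
// Hypothetical analogue of ColumnarMap: one map value is the slice
// [offset, offset + length) of a shared key buffer and a shared value buffer.
final class MapViewSketch {
    private final int[] keys;
    private final String[] values;
    private final int offset;
    private final int length;

    MapViewSketch(int[] keys, String[] values, int offset, int length) {
        this.keys = keys;
        this.values = values;
        this.offset = offset;
        this.length = length;
    }

    int numElements() { return length; }

    // Entry i pairs keys[offset + i] with values[offset + i] (no bounds check here).
    int keyAt(int i) { return keys[offset + i]; }
    String valueAt(int i) { return values[offset + i]; }

    // Hypothetical helper: linear lookup by key within this slice; null when absent.
    String get(int key) {
        for (int i = 0; i < length; i++) {
            if (keyAt(i) == key) return valueAt(i);
        }
        return null;
    }
}
```

With keys {1, 2, 7, 8, 9} and values {"a", "b", "x", "y", "z"}, the view (offset 2, length 3) is the map {7: x, 8: y, 9: z}; key 1 belongs to another row's slice and is not visible through it.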
final class ColumnarRow extends InternalRow
Row abstraction in ColumnVector.
Annotations
@Evolving()
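ColumnarRow is how a struct value is read: ColumnVector#getStruct(int) returns a ColumnarRow that pairs the struct-typed vector with a rowId, and field i is read from child vector i at that rowId. The sketch below assumes a struct<id: int, score: double> column. StructColumnSketch and StructRowSketch are hypothetical names, and returning null for a null struct is an assumption of this sketch.

```java
// Hypothetical struct<id: int, score: double> column: each field is its own
// child buffer, and a struct value is just (parent column, rowId).
final class StructColumnSketch {
    final int[] idChild;        // child 0
    final double[] scoreChild;  // child 1
    final boolean[] nulls;      // struct-level null flags

    StructColumnSketch(int[] idChild, double[] scoreChild, boolean[] nulls) {
        this.idChild = idChild;
        this.scoreChild = scoreChild;
        this.nulls = nulls;
    }

    // Returns a row view over the children, or null when the struct itself is null.
    StructRowSketch getStruct(int rowId) {
        return nulls[rowId] ? null : new StructRowSketch(this, rowId);
    }
}

// Hypothetical analogue of ColumnarRow: the ordinal selects the child,
// and the fixed rowId selects the position within that child.
final class StructRowSketch {
    private final StructColumnSketch data;
    private final int rowId;

    StructRowSketch(StructColumnSketch data, int rowId) {
        this.data = data;
        this.rowId = rowId;
    }

    int numFields() { return 2; }

    int getInt(int ordinal) {
        if (ordinal != 0) throw new IllegalArgumentException("field " + ordinal + " is not an int");
        return data.idChild[rowId];
    }

    double getDouble(int ordinal) {
        if (ordinal != 1) throw new IllegalArgumentException("field " + ordinal + " is not a double");
        return data.scoreChild[rowId];
    }
}
```

Deeper nesting follows the same pattern: a child that is itself a struct, array or map would return its own view, which is how ColumnVector's tree of children maps onto nested types.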
