abstract class ColumnVector extends AutoCloseable
An interface representing in-memory columnar data in Spark. This interface defines the main APIs to access the data, as well as their batched versions. The batched versions are considered to be faster and preferable whenever possible.
Most of the APIs take the rowId as a parameter. This is the batch-local, 0-based row id for values in this ColumnVector.
Spark only calls the specific get method that matches the data type of this ColumnVector, e.g. if it's int type, Spark is guaranteed to only call #getInt(int) or #getInts(int, int).
ColumnVector supports all the data types including nested types. To handle nested types, ColumnVector can have children and is a tree structure. Please refer to #getStruct(int), #getArray(int) and #getMap(int) for the details about how to implement nested types.
ColumnVector is expected to be reused during the entire data loading process, to avoid allocating memory again and again.
ColumnVector is meant to maximize CPU efficiency, not to minimize storage footprint. Implementations should prefer computing efficiency over storage efficiency when designing the format. Since the ColumnVector instance is expected to be reused while loading data, the storage footprint is negligible.
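The access pattern described above (type-specific getters, batched reads, and a batch-local rowId) can be illustrated with a minimal sketch. It does not use Spark's classes: `IntColumn`, `readAll` and `demoColumn` are hypothetical names, and the loop-based `getInts` is only one straightforward way a batched getter could be written. The point is that the value returned for a null slot is undefined, so a caller has to consult `isNullAt` before trusting it.

```java
import java.util.ArrayList;
import java.util.List;

final class NullAwareRead {
    /** Hypothetical stand-in for the few ColumnVector methods used in this sketch. */
    interface IntColumn {
        boolean isNullAt(int rowId);

        int getInt(int rowId);

        /** Batched read of [rowId, rowId + count), written here as a plain loop over getInt. */
        default int[] getInts(int rowId, int count) {
            int[] res = new int[count];
            for (int i = 0; i < count; i++) {
                res[i] = getInt(rowId + i);
            }
            return res;
        }
    }

    /** Reads rows [0, numRows) in one batch and maps null slots to null. */
    static List<Integer> readAll(IntColumn col, int numRows) {
        int[] raw = col.getInts(0, numRows);
        List<Integer> out = new ArrayList<>(numRows);
        for (int rowId = 0; rowId < numRows; rowId++) {
            // The raw value of a null slot is undefined, so it is never used.
            out.add(col.isNullAt(rowId) ? null : Integer.valueOf(raw[rowId]));
        }
        return out;
    }

    /** A three-row demo column: 10, null, 30. The null slot holds an arbitrary value. */
    static IntColumn demoColumn() {
        final int[] values = {10, -1, 30};
        final boolean[] nulls = {false, true, false};
        return new IntColumn() {
            @Override
            public boolean isNullAt(int rowId) {
                return nulls[rowId];
            }

            @Override
            public int getInt(int rowId) {
                return values[rowId];
            }
        };
    }
}
```

Calling `NullAwareRead.readAll(NullAwareRead.demoColumn(), 3)` yields the list `[10, null, 30]`; the `-1` stored in the null slot never reaches the caller.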
- Annotations
- @Evolving()
- Source
- ColumnVector.java
Instance Constructors
- new ColumnVector(type: DataType)
- Attributes
- protected[vectorized]
 
Abstract Value Members
- abstract def close(): Unit
  Cleans up memory for this column vector. The column vector is not usable after this. This overrides AutoCloseable#close to remove the throws clause, as a column vector is in-memory and no exception is expected during closing.
- Definition Classes
- ColumnVector → AutoCloseable
- Annotations
- @Override()
 
- abstract def getArray(rowId: Int): ColumnarArray
  Returns the array type value for rowId. If the slot for rowId is null, it should return null.
  To support array type, implementations must construct a ColumnarArray and return it in this method. ColumnarArray requires a ColumnVector that stores the data of all the elements of all the arrays in this vector, and an offset and length which point to a range in that ColumnVector; the range represents the array for rowId. Implementations are free to decide where to put the data vector and the offsets and lengths. For example, we can use the first child vector as the data vector, and store offsets and lengths in 2 int arrays in this vector.
- abstract def getBinary(rowId: Int): Array[Byte]
  Returns the binary type value for rowId. If the slot for rowId is null, it should return null.
- abstract def getBoolean(rowId: Int): Boolean
  Returns the boolean type value for rowId. The return value is undefined and can be anything, if the slot for rowId is null.
- abstract def getByte(rowId: Int): Byte
  Returns the byte type value for rowId. The return value is undefined and can be anything, if the slot for rowId is null.
- abstract def getChild(ordinal: Int): ColumnVector
  Returns the child ColumnVector at the given ordinal.
 
- abstract def getDecimal(rowId: Int, precision: Int, scale: Int): Decimal
  Returns the decimal type value for rowId. If the slot for rowId is null, it should return null.
- abstract def getDouble(rowId: Int): Double
  Returns the double type value for rowId. The return value is undefined and can be anything, if the slot for rowId is null.
- abstract def getFloat(rowId: Int): Float
  Returns the float type value for rowId. The return value is undefined and can be anything, if the slot for rowId is null.
- abstract def getInt(rowId: Int): Int
  Returns the int type value for rowId. The return value is undefined and can be anything, if the slot for rowId is null.
- abstract def getLong(rowId: Int): Long
  Returns the long type value for rowId. The return value is undefined and can be anything, if the slot for rowId is null.
- abstract def getMap(ordinal: Int): ColumnarMap
  Returns the map type value for rowId (the parameter is named ordinal in the signature, but the description treats it as the row id). If the slot for rowId is null, it should return null.
  In Spark, a map type value is basically a key data array and a value data array. A key from the key array and the value at the same index in the value array together form one entry of the map type value.
  To support map type, implementations must construct a ColumnarMap and return it in this method. ColumnarMap requires a ColumnVector that stores the data of all the keys of all the maps in this vector, another ColumnVector that stores the data of all the values of all the maps in this vector, and a pair of offset and length which specify the range of the key/value array that belongs to the map type value at rowId.
- abstract def getShort(rowId: Int): Short
  Returns the short type value for rowId. The return value is undefined and can be anything, if the slot for rowId is null.
- abstract def getUTF8String(rowId: Int): UTF8String
  Returns the string type value for rowId. If the slot for rowId is null, it should return null.
  Note that the returned UTF8String may point to the data of this column vector; copy it if you want to keep it after this column vector is freed.
- abstract def hasNull(): Boolean
  Returns true if this column vector contains any null values.
- abstract def isNullAt(rowId: Int): Boolean
  Returns whether the value at rowId is NULL.
- abstract def numNulls(): Int
  Returns the number of nulls in this column vector.
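The offset/length layout described for getArray can be sketched without any Spark dependency. The class below is a hypothetical stand-in, not Spark's `ColumnarArray`: one flat `int[]` plays the role of the data child vector, and two parallel int arrays hold each row's offset and length. It returns a copy of the range for simplicity, whereas the description above suggests the real return value refers to a range of the data vector.

```java
import java.util.Arrays;

final class ArrayLayoutSketch {
    private final int[] elements;  // all elements of all arrays, back to back (the "data vector")
    private final int[] offsets;   // offsets[rowId]: start of the row's range in elements
    private final int[] lengths;   // lengths[rowId]: number of elements in the row's array
    private final boolean[] nulls; // nulls[rowId]: true if the slot is null

    ArrayLayoutSketch(int[] elements, int[] offsets, int[] lengths, boolean[] nulls) {
        this.elements = elements;
        this.offsets = offsets;
        this.lengths = lengths;
        this.nulls = nulls;
    }

    /** Returns a copy of the array stored at rowId, or null if the slot is null. */
    int[] getArray(int rowId) {
        if (nulls[rowId]) {
            return null;
        }
        int start = offsets[rowId];
        return Arrays.copyOfRange(elements, start, start + lengths[rowId]);
    }

    /** Four rows: [1, 2, 3], null, [], [4, 5]. */
    static ArrayLayoutSketch demo() {
        return new ArrayLayoutSketch(
            new int[] {1, 2, 3, 4, 5},
            new int[] {0, 0, 3, 3},
            new int[] {3, 0, 0, 2},
            new boolean[] {false, true, false, false});
    }
}
```

The same idea extends to the map layout described for getMap: a second flat array for values, sharing one offset/length pair per row with the key array.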
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
 
- final def ##: Int
- Definition Classes
- AnyRef → Any
 
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
 
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
 
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
 
- def closeIfFreeable(): Unit
  Cleans up memory for this column vector if its resources are freeable between batches. The column vector is not usable after this. If this is a writable column vector or constant column vector, it is a no-op.
- final def dataType(): DataType
  Returns the data type of this column vector.
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
 
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
 
- def getBooleans(rowId: Int, count: Int): Array[Boolean]
  Gets boolean type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- def getBytes(rowId: Int, count: Int): Array[Byte]
  Gets byte type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
 
- def getDoubles(rowId: Int, count: Int): Array[Double]
  Gets double type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- def getFloats(rowId: Int, count: Int): Array[Float]
  Gets float type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- def getInterval(rowId: Int): CalendarInterval
  Returns the calendar interval type value for rowId. If the slot for rowId is null, it should return null.
  In Spark, a calendar interval type value is basically two integer values representing the number of months and days in the interval, and a long value representing the number of microseconds in the interval. An interval type vector is the same as a struct type vector with 3 fields: months, days and microseconds.
  To support interval type, implementations must implement #getChild(int) and define 3 child vectors: the first child vector is an int type vector containing the month values of all the interval values in this vector. The second child vector is an int type vector containing the day values of all the interval values in this vector. The third child vector is a long type vector containing the microsecond values of all the interval values in this vector. Note that the ArrowColumnVector leverages its built-in IntervalMonthDayNanoVector instead of the above-mentioned protocol.
- def getInts(rowId: Int, count: Int): Array[Int]
  Gets int type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- def getLongs(rowId: Int, count: Int): Array[Long]
  Gets long type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- def getShorts(rowId: Int, count: Int): Array[Short]
  Gets short type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- final def getStruct(rowId: Int): ColumnarRow
  Returns the struct type value for rowId. If the slot for rowId is null, it should return null.
  To support struct type, implementations must implement #getChild(int) and make this vector a tree structure. The number of child vectors must be the same as the number of fields of the struct type, and each child vector is responsible for storing the data of its corresponding struct field.
- final def getVariant(rowId: Int): VariantVal
  Returns the Variant value for rowId. Similar to #getInterval(int), the implementation must implement #getChild(int) and define 2 child vectors of binary type for the Variant value and metadata.
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
 
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
 
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
 
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
 
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
 
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
 
- def toString(): String
- Definition Classes
- AnyRef → Any
 
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
 
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
 
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
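The three-child protocol described for getInterval above can be sketched as follows. This is a simplified model rather than Spark's implementation: `IntervalLayoutSketch` and its nested `Interval` class are hypothetical, and plain arrays stand in for the three child vectors (months, days, microseconds) that an implementation would expose through getChild.

```java
final class IntervalLayoutSketch {
    /** Hypothetical value type carrying the three fields named in the getInterval description. */
    static final class Interval {
        final int months;
        final int days;
        final long microseconds;

        Interval(int months, int days, long microseconds) {
            this.months = months;
            this.days = days;
            this.microseconds = microseconds;
        }
    }

    private final int[] months;    // stands in for child 0: int vector of month values
    private final int[] days;      // stands in for child 1: int vector of day values
    private final long[] micros;   // stands in for child 2: long vector of microsecond values
    private final boolean[] nulls; // null flags kept on the parent vector

    IntervalLayoutSketch(int[] months, int[] days, long[] micros, boolean[] nulls) {
        this.months = months;
        this.days = days;
        this.micros = micros;
        this.nulls = nulls;
    }

    /** Assembles the interval at rowId from the three children, or returns null for a null slot. */
    Interval getInterval(int rowId) {
        if (nulls[rowId]) {
            return null;
        }
        return new Interval(months[rowId], days[rowId], micros[rowId]);
    }

    /** Two rows: (14 months, 3 days, 1,000,000 microseconds) and null. */
    static IntervalLayoutSketch demo() {
        return new IntervalLayoutSketch(
            new int[] {14, 0},
            new int[] {3, 0},
            new long[] {1_000_000L, 0L},
            new boolean[] {false, true});
    }
}
```

All three children are indexed by the same rowId, which is what makes an interval vector equivalent to a struct vector with three fields.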
 
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
- (Since version 9)