@Evolving
public abstract class ColumnVector
extends Object
implements AutoCloseable
Most of the APIs take the rowId as a parameter. This is the batch-local, 0-based row id for values in this ColumnVector.
 Spark only calls the get method that matches the data type of this ColumnVector,
 e.g. if it's int type, Spark is guaranteed to only call getInt(int) or
 getInts(int, int).
 
 ColumnVector supports all the data types including nested types. To handle nested types,
 ColumnVector can have children and is a tree structure. Please refer to getStruct(int),
 getArray(int) and getMap(int) for the details about how to implement nested
 types.
 
ColumnVector is expected to be reused during the entire data loading process, to avoid allocating memory again and again.
ColumnVector is meant to maximize CPU efficiency, not to minimize storage footprint. Implementations should prefer computing efficiency over storage efficiency when designing the format. Since the ColumnVector instance is expected to be reused while loading data, the storage footprint is negligible.
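The access contract described above can be illustrated with a small, self-contained sketch. It does not depend on Spark: `SketchIntVector` is a hypothetical stand-in for an int-typed ColumnVector, and its `load` method is an invented helper that shows buffer reuse across batches. Writing the batched getter as a loop over the single-value getter is an assumption about how a default implementation could look.

```java
import java.util.Arrays;

// Hypothetical stand-in for an int-typed column vector; not a Spark class.
final class SketchIntVector implements AutoCloseable {
    private int[] values;     // reused across batches
    private boolean[] nulls;  // true where the slot is null
    private int numRows;

    SketchIntVector(int capacity) {
        values = new int[capacity];
        nulls = new boolean[capacity];
    }

    // Hypothetical loader: overwrites the existing buffers instead of reallocating.
    // The batch must not be longer than the capacity given to the constructor.
    void load(Integer[] batch) {
        numRows = batch.length;
        for (int i = 0; i < numRows; i++) {
            nulls[i] = batch[i] == null;
            values[i] = nulls[i] ? 0 : batch[i];
        }
    }

    boolean isNullAt(int rowId) { return nulls[rowId]; }

    // rowId is batch-local and 0-based; the result is meaningless for null slots.
    int getInt(int rowId) { return values[rowId]; }

    // Batched variant covering [rowId, rowId + count).
    int[] getInts(int rowId, int count) {
        int[] res = new int[count];
        for (int i = 0; i < count; i++) res[i] = getInt(rowId + i);
        return res;
    }

    @Override public void close() { values = null; nulls = null; }
}

class SketchIntVectorDemo {
    public static void main(String[] args) {
        try (SketchIntVector v = new SketchIntVector(4)) {
            v.load(new Integer[] {10, null, 30});
            System.out.println(Arrays.toString(v.getInts(0, 1)));
            v.load(new Integer[] {7, 8, 9, 10}); // same instance, next batch
            System.out.println(v.isNullAt(1) + " " + v.getInt(3));
        }
    }
}
```

A caller is expected to check `isNullAt` before trusting a primitive getter, since the value stored in a null slot is arbitrary.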
| Modifier and Type | Method and Description | 
|---|---|
| abstract void | close(): Cleans up memory for this column vector. |
| void | closeIfFreeable(): Cleans up memory for this column vector if its resources are freeable between batches. |
| DataType | dataType(): Returns the data type of this column vector. |
| abstract ColumnarArray | getArray(int rowId): Returns the array type value for rowId. |
| abstract byte[] | getBinary(int rowId): Returns the binary type value for rowId. |
| abstract boolean | getBoolean(int rowId): Returns the boolean type value for rowId. |
| boolean[] | getBooleans(int rowId, int count): Gets boolean type values from [rowId, rowId + count). |
| abstract byte | getByte(int rowId): Returns the byte type value for rowId. |
| byte[] | getBytes(int rowId, int count): Gets byte type values from [rowId, rowId + count). |
| abstract ColumnVector | getChild(int ordinal) |
| abstract Decimal | getDecimal(int rowId, int precision, int scale): Returns the decimal type value for rowId. |
| abstract double | getDouble(int rowId): Returns the double type value for rowId. |
| double[] | getDoubles(int rowId, int count): Gets double type values from [rowId, rowId + count). |
| abstract float | getFloat(int rowId): Returns the float type value for rowId. |
| float[] | getFloats(int rowId, int count): Gets float type values from [rowId, rowId + count). |
| abstract int | getInt(int rowId): Returns the int type value for rowId. |
| CalendarInterval | getInterval(int rowId): Returns the calendar interval type value for rowId. |
| int[] | getInts(int rowId, int count): Gets int type values from [rowId, rowId + count). |
| abstract long | getLong(int rowId): Returns the long type value for rowId. |
| long[] | getLongs(int rowId, int count): Gets long type values from [rowId, rowId + count). |
| abstract ColumnarMap | getMap(int ordinal): Returns the map type value for rowId. |
| abstract short | getShort(int rowId): Returns the short type value for rowId. |
| short[] | getShorts(int rowId, int count): Gets short type values from [rowId, rowId + count). |
| ColumnarRow | getStruct(int rowId): Returns the struct type value for rowId. |
| abstract org.apache.spark.unsafe.types.UTF8String | getUTF8String(int rowId): Returns the string type value for rowId. |
| abstract boolean | hasNull(): Returns true if this column vector contains any null values. |
| abstract boolean | isNullAt(int rowId): Returns whether the value at rowId is NULL. |
| abstract int | numNulls(): Returns the number of nulls in this column vector. |
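public final DataType dataType()
Returns the data type of this column vector.
public abstract void close()
Cleans up memory for this column vector.
 This overrides AutoCloseable.close() to remove the
 throws clause, as a column vector is in-memory and we don't expect any exception to
 happen during closing.
Specified by: close in interface AutoCloseable
public void closeIfFreeable()
Cleans up memory for this column vector if its resources are freeable between batches.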
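Removing the throws clause matters for callers: a resource whose close() declares no checked exception can be used in try-with-resources without a catch block or a throws declaration. The sketch below shows this with a hypothetical `SketchBuffer` class, which is not part of Spark and only stands in for an in-memory resource.

```java
// Hypothetical in-memory resource; not a Spark class.
final class SketchBuffer implements AutoCloseable {
    private long[] data = new long[16];

    boolean isOpen() { return data != null; }

    // Narrows AutoCloseable.close() by omitting "throws Exception".
    @Override
    public void close() { data = null; }
}

class SketchBufferDemo {
    public static void main(String[] args) {
        SketchBuffer ref = new SketchBuffer();
        // No catch or throws is needed, because close() declares no checked exception.
        try (SketchBuffer b = ref) {
            System.out.println(b.isOpen());
        }
        // The memory has been released once the try block exits.
        System.out.println(ref.isOpen());
    }
}
```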
public abstract boolean hasNull()
Returns true if this column vector contains any null values.
public abstract int numNulls()
Returns the number of nulls in this column vector.
public abstract boolean isNullAt(int rowId)
Returns whether the value at rowId is NULL.
public abstract boolean getBoolean(int rowId)
Returns the boolean type value for rowId. The return value is undefined and can be anything if the slot for rowId is null.
public boolean[] getBooleans(int rowId, int count)
Gets boolean type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
public abstract byte getByte(int rowId)
Returns the byte type value for rowId. The return value is undefined and can be anything if the slot for rowId is null.
public byte[] getBytes(int rowId, int count)
Gets byte type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
public abstract short getShort(int rowId)
Returns the short type value for rowId. The return value is undefined and can be anything if the slot for rowId is null.
public short[] getShorts(int rowId, int count)
Gets short type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
public abstract int getInt(int rowId)
Returns the int type value for rowId. The return value is undefined and can be anything if the slot for rowId is null.
public int[] getInts(int rowId, int count)
Gets int type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
public abstract long getLong(int rowId)
Returns the long type value for rowId. The return value is undefined and can be anything if the slot for rowId is null.
public long[] getLongs(int rowId, int count)
Gets long type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
public abstract float getFloat(int rowId)
Returns the float type value for rowId. The return value is undefined and can be anything if the slot for rowId is null.
public float[] getFloats(int rowId, int count)
Gets float type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
public abstract double getDouble(int rowId)
Returns the double type value for rowId. The return value is undefined and can be anything if the slot for rowId is null.
public double[] getDoubles(int rowId, int count)
Gets double type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
public final ColumnarRow getStruct(int rowId)
Returns the struct type value for rowId. If the slot for rowId is null, it
 should return null.
 
 To support struct type, implementations must implement getChild(int) and make this
 vector a tree structure. The number of child vectors must be the same as the number of fields of
 the struct type, and each child vector is responsible for storing the data for its corresponding
 struct field.
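The tree layout can be pictured with a sketch that avoids any Spark dependency. `SketchColumn`, `SketchLeafColumn` and `SketchStructColumn` are hypothetical types: the struct column holds one child per field, and reading field `ordinal` of row `rowId` means asking child `ordinal` for its value at the same `rowId`.

```java
import java.util.Arrays;

// Hypothetical column types; not Spark classes.
interface SketchColumn {
    Object get(int rowId);
}

// Leaf column backed by a plain array.
final class SketchLeafColumn implements SketchColumn {
    private final Object[] values;
    SketchLeafColumn(Object... values) { this.values = values; }
    @Override public Object get(int rowId) { return values[rowId]; }
}

// Struct column: one child column per struct field, all sharing the same rowId space.
final class SketchStructColumn implements SketchColumn {
    private final boolean[] nulls;
    private final SketchColumn[] children;

    SketchStructColumn(boolean[] nulls, SketchColumn... children) {
        this.nulls = nulls;
        this.children = children;
    }

    SketchColumn getChild(int ordinal) { return children[ordinal]; }

    // Returns the field values of one row, or null if the struct slot is null.
    @Override public Object[] get(int rowId) {
        if (nulls[rowId]) return null;
        Object[] row = new Object[children.length];
        for (int i = 0; i < children.length; i++) row[i] = children[i].get(rowId);
        return row;
    }
}

class SketchStructDemo {
    public static void main(String[] args) {
        // struct<name: string, age: int> with two rows; row 1 is a null struct.
        SketchStructColumn people = new SketchStructColumn(
            new boolean[] {false, true},
            new SketchLeafColumn("ann", null),
            new SketchLeafColumn(31, null));
        System.out.println(Arrays.toString(people.get(0)));
        System.out.println(people.get(1) == null);
    }
}
```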
public abstract ColumnarArray getArray(int rowId)
Returns the array type value for rowId. If the slot for rowId is null, it
 should return null.
 
 To support array type, implementations must construct a ColumnarArray and return it in
 this method. ColumnarArray requires a ColumnVector that stores the data of all
 the elements of all the arrays in this vector, and an offset and length which point to a range
 in that ColumnVector; the range represents the array for rowId. Implementations
 are free to decide where to put the data vector and the offsets and lengths. For example, we can
 use the first child vector as the data vector, and store offsets and lengths in 2 int arrays in
 this vector.
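The offset-and-length layout from that example can be sketched without Spark. `SketchIntArrayColumn` is a hypothetical class: a flat `int[]` plays the role of the child data vector, and two int arrays hold the offset and length of each row's array. Using a length of -1 to mark a null slot is an assumption made for brevity, and the sketch copies the range where a real ColumnarArray would more likely act as a view over it.

```java
import java.util.Arrays;

// Hypothetical array<int> column; not a Spark class.
final class SketchIntArrayColumn {
    private final int[] data;     // all elements of all arrays, stored back to back
    private final int[] offsets;  // start of each row's array within data
    private final int[] lengths;  // element count per row; -1 marks a null slot (assumption)

    SketchIntArrayColumn(int[] data, int[] offsets, int[] lengths) {
        this.data = data;
        this.offsets = offsets;
        this.lengths = lengths;
    }

    boolean isNullAt(int rowId) { return lengths[rowId] < 0; }

    // Returns a copy of the range that belongs to rowId, or null for a null slot.
    int[] getArray(int rowId) {
        if (isNullAt(rowId)) return null;
        int start = offsets[rowId];
        return Arrays.copyOfRange(data, start, start + lengths[rowId]);
    }
}

class SketchIntArrayDemo {
    public static void main(String[] args) {
        // Four rows: [1, 2, 3], [], null, [4, 5]
        SketchIntArrayColumn col = new SketchIntArrayColumn(
            new int[] {1, 2, 3, 4, 5},
            new int[] {0, 3, 3, 3},
            new int[] {3, 0, -1, 2});
        for (int rowId = 0; rowId < 4; rowId++) {
            System.out.println(Arrays.toString(col.getArray(rowId)));
        }
    }
}
```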
public abstract ColumnarMap getMap(int ordinal)
Returns the map type value for rowId (passed as the ordinal parameter). If the slot for rowId is null, it
 should return null.
 In Spark, a map type value is basically a key data array and a value data array. A key from the key array at a given index and the value from the value array at the same index together form an entry of this map type value.
 To support map type, implementations must construct a ColumnarMap and return it in
 this method. ColumnarMap requires a ColumnVector that stores the data of all
 the keys of all the maps in this vector, another ColumnVector that stores the data
 of all the values of all the maps in this vector, and a pair of offset and length which
 specify the range of the key/value arrays that belongs to the map type value at rowId.
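The pairing of keys and values by index can be sketched as follows, again without Spark. `SketchMapColumn` is a hypothetical class holding a flat key array, a flat value array, and per-row offsets and lengths; a length of -1 marking a null slot is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical map<string, int> column; not a Spark class.
final class SketchMapColumn {
    private final String[] keys;   // keys of all maps, stored back to back
    private final int[] values;    // values of all maps, aligned with keys by index
    private final int[] offsets;   // start of each row's entries
    private final int[] lengths;   // entry count per row; -1 marks a null slot (assumption)

    SketchMapColumn(String[] keys, int[] values, int[] offsets, int[] lengths) {
        this.keys = keys;
        this.values = values;
        this.offsets = offsets;
        this.lengths = lengths;
    }

    // Builds the map for one row from the shared key and value arrays.
    Map<String, Integer> getMap(int rowId) {
        if (lengths[rowId] < 0) return null;
        Map<String, Integer> result = new LinkedHashMap<>();
        int start = offsets[rowId];
        for (int i = start; i < start + lengths[rowId]; i++) {
            result.put(keys[i], values[i]); // same index in both arrays forms one entry
        }
        return result;
    }
}

class SketchMapDemo {
    public static void main(String[] args) {
        // Three rows: {a=1, b=2}, null, {c=3}
        SketchMapColumn col = new SketchMapColumn(
            new String[] {"a", "b", "c"},
            new int[] {1, 2, 3},
            new int[] {0, 2, 2},
            new int[] {2, -1, 1});
        for (int rowId = 0; rowId < 3; rowId++) {
            System.out.println(col.getMap(rowId));
        }
    }
}
```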
public abstract Decimal getDecimal(int rowId, int precision, int scale)
Returns the decimal type value for rowId. If the slot for rowId is null, it
 should return null.
public abstract org.apache.spark.unsafe.types.UTF8String getUTF8String(int rowId)
Returns the string type value for rowId. If the slot for rowId is null, it
 should return null.
 
 Note that the returned UTF8String may point to the data of this column vector.
 Copy it if you want to keep it after this column vector is freed or reused.
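The reason for copying can be shown with plain Java, using `java.nio.ByteBuffer` as a stand-in for a value that points into a vector's memory. The class name, the helper methods and the reuse scenario are illustrative assumptions, not Spark internals.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SketchViewVsCopy {
    // Returns a view over [offset, offset + length) that shares memory with the backing array.
    static ByteBuffer view(byte[] backing, int offset, int length) {
        return ByteBuffer.wrap(backing, offset, length).slice();
    }

    // Decodes the current contents of the view into an independent String.
    static String decode(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        byte[] backing = "spark".getBytes(StandardCharsets.UTF_8);
        ByteBuffer view = view(backing, 0, 5); // shares memory with backing
        String copy = decode(view);            // independent copy of the contents

        // Simulate the vector's memory being reused for other data.
        byte[] next = "other".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(next, 0, backing, 0, 5);

        System.out.println(copy);         // still the original text
        System.out.println(decode(view)); // now reflects the overwritten memory
    }
}
```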
public abstract byte[] getBinary(int rowId)
Returns the binary type value for rowId. If the slot for rowId is null, it
 should return null.
public final CalendarInterval getInterval(int rowId)
Returns the calendar interval type value for rowId. If the slot for
 rowId is null, it should return null.
 
 In Spark, calendar interval type value is basically two integer values representing the number
 of months and days in this interval, and a long value representing the number of microseconds
 in this interval. An interval type vector is the same as a struct type vector with 3 fields:
 months, days and microseconds.
 
 To support interval type, implementations must implement getChild(int) and define 3
 child vectors: the first child vector is an int type vector, containing all the month values of
 all the interval values in this vector. The second child vector is an int type vector,
 containing all the day values of all the interval values in this vector. The third child vector
 is a long type vector, containing all the microsecond values of all the interval values in this
 vector.
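The three-child layout can be sketched as follows. `SketchIntervalColumn` and its nested `Interval` type are hypothetical stand-ins for an interval vector and for CalendarInterval; plain arrays play the role of the three child vectors.

```java
// Hypothetical interval column; not a Spark class.
final class SketchIntervalColumn {
    // Hypothetical value type standing in for CalendarInterval.
    static final class Interval {
        final int months;
        final int days;
        final long microseconds;

        Interval(int months, int days, long microseconds) {
            this.months = months;
            this.days = days;
            this.microseconds = microseconds;
        }
    }

    private final int[] months;         // child 0: month values of all intervals
    private final int[] days;           // child 1: day values of all intervals
    private final long[] microseconds;  // child 2: microsecond values of all intervals
    private final boolean[] nulls;

    SketchIntervalColumn(int[] months, int[] days, long[] microseconds, boolean[] nulls) {
        this.months = months;
        this.days = days;
        this.microseconds = microseconds;
        this.nulls = nulls;
    }

    // Assembles one interval from the same rowId in each of the three children.
    Interval getInterval(int rowId) {
        if (nulls[rowId]) return null;
        return new Interval(months[rowId], days[rowId], microseconds[rowId]);
    }
}

class SketchIntervalDemo {
    public static void main(String[] args) {
        // Row 0: 14 months, 3 days, 5 seconds. Row 1: null.
        SketchIntervalColumn col = new SketchIntervalColumn(
            new int[] {14, 0},
            new int[] {3, 0},
            new long[] {5_000_000L, 0L},
            new boolean[] {false, true});
        SketchIntervalColumn.Interval iv = col.getInterval(0);
        System.out.println(iv.months + " " + iv.days + " " + iv.microseconds);
        System.out.println(col.getInterval(1) == null);
    }
}
```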
public abstract ColumnVector getChild(int ordinal)
Returns the child ColumnVector at the given ordinal.