Class ColumnVector
- All Implemented Interfaces:
- AutoCloseable
- Direct Known Subclasses:
- ArrowColumnVector
Most of the APIs take rowId as a parameter. This is the batch-local, 0-based row ID for values in this ColumnVector.
 Spark calls only the get methods that match the data type of this ColumnVector. For example, if the vector is of int type, Spark is guaranteed to call only getInt(int) or getInts(int, int).
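The calling convention can be illustrated with a minimal, self-contained sketch. SketchIntColumn is a hypothetical stand-in, not the real ColumnVector; it mirrors only the shapes of getInt(int) and isNullAt(int). The reader checks isNullAt before calling getInt, on the assumption that the value stored in a null slot is undefined.

```java
// Hypothetical stand-in for an int-typed column vector; not a Spark class.
final class SketchIntColumn {
    private final int[] values;     // one slot per row in the batch
    private final boolean[] nulls;  // true where the slot is null

    SketchIntColumn(int[] values, boolean[] nulls) {
        this.values = values;
        this.nulls = nulls;
    }

    boolean isNullAt(int rowId) {
        return nulls[rowId];
    }

    // For a null slot the returned value is arbitrary, so callers check isNullAt first.
    int getInt(int rowId) {
        return values[rowId];
    }

    // Sums the non-null rows, addressing them by batch-local 0-based row id.
    static long sumNonNull(SketchIntColumn col, int numRows) {
        long sum = 0;
        for (int rowId = 0; rowId < numRows; rowId++) {
            if (!col.isNullAt(rowId)) {
                sum += col.getInt(rowId);
            }
        }
        return sum;
    }
}
```

Because the column is known to be of int type, the reader never needs any getter other than the int ones.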
 
 ColumnVector supports all data types, including nested types. To handle nested types, a ColumnVector can have children, forming a tree structure. See getStruct(int), getArray(int) and getMap(int) for details on how to implement nested types.
 
ColumnVector is expected to be reused during the entire data loading process, to avoid allocating memory again and again.
ColumnVector is meant to maximize CPU efficiency, not to minimize storage footprint. Implementations should prefer computing efficiency over storage efficiency when designing the format. Since the ColumnVector instance is expected to be reused while loading data, the storage footprint is negligible.
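The reuse pattern described above can be sketched as follows. ReusableIntColumn and its load method are hypothetical names chosen for illustration and are not part of Spark; the point is only that the backing array is allocated once and overwritten for each batch.

```java
// Hypothetical reusable int column; not a Spark class.
final class ReusableIntColumn {
    private final int[] values;  // allocated once, sized for the largest batch
    private int numRows;         // number of valid rows in the current batch

    ReusableIntColumn(int capacity) {
        this.values = new int[capacity];
    }

    // Overwrites the buffer in place with the next batch instead of reallocating.
    void load(int[] batch) {
        if (batch.length > values.length) {
            throw new IllegalArgumentException("batch exceeds capacity");
        }
        System.arraycopy(batch, 0, values, 0, batch.length);
        numRows = batch.length;
    }

    int numRows() {
        return numRows;
    }

    int getInt(int rowId) {
        return values[rowId];
    }

    // Exposed only so the sketch can demonstrate that the array is reused.
    int[] backingArray() {
        return values;
    }
}
```

The sketch trades memory for speed: the buffer stays at full capacity even when a batch is small, which matches the stated preference for computing efficiency over storage efficiency.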
- 
Method Summary
| Modifier and Type | Method | Description |
| abstract void | close() | Cleans up memory for this column vector. |
| void | closeIfFreeable() | Cleans up memory for this column vector if its resources are freeable between batches. |
| final DataType | dataType() | Returns the data type of this column vector. |
| abstract ColumnarArray | getArray(int rowId) | Returns the array type value for rowId. |
| abstract byte[] | getBinary(int rowId) | Returns the binary type value for rowId. |
| abstract boolean | getBoolean(int rowId) | Returns the boolean type value for rowId. |
| boolean[] | getBooleans(int rowId, int count) | Gets boolean type values from [rowId, rowId + count). |
| abstract byte | getByte(int rowId) | Returns the byte type value for rowId. |
| byte[] | getBytes(int rowId, int count) | Gets byte type values from [rowId, rowId + count). |
| abstract ColumnVector | getChild(int ordinal) | |
| abstract Decimal | getDecimal(int rowId, int precision, int scale) | Returns the decimal type value for rowId. |
| abstract double | getDouble(int rowId) | Returns the double type value for rowId. |
| double[] | getDoubles(int rowId, int count) | Gets double type values from [rowId, rowId + count). |
| abstract float | getFloat(int rowId) | Returns the float type value for rowId. |
| float[] | getFloats(int rowId, int count) | Gets float type values from [rowId, rowId + count). |
| abstract int | getInt(int rowId) | Returns the int type value for rowId. |
| | getInterval(int rowId) | Returns the calendar interval type value for rowId. |
| int[] | getInts(int rowId, int count) | Gets int type values from [rowId, rowId + count). |
| abstract long | getLong(int rowId) | Returns the long type value for rowId. |
| long[] | getLongs(int rowId, int count) | Gets long type values from [rowId, rowId + count). |
| abstract ColumnarMap | getMap(int ordinal) | Returns the map type value for rowId. |
| abstract short | getShort(int rowId) | Returns the short type value for rowId. |
| short[] | getShorts(int rowId, int count) | Gets short type values from [rowId, rowId + count). |
| final ColumnarRow | getStruct(int rowId) | Returns the struct type value for rowId. |
| abstract org.apache.spark.unsafe.types.UTF8String | getUTF8String(int rowId) | Returns the string type value for rowId. |
| final org.apache.spark.unsafe.types.VariantVal | getVariant(int rowId) | Returns the Variant value for rowId. |
| abstract boolean | hasNull() | Returns true if this column vector contains any null values. |
| abstract boolean | isNullAt(int rowId) | Returns whether the value at rowId is NULL. |
| abstract int | numNulls() | Returns the number of nulls in this column vector. |
- 
Method Details
- 
dataType
public final DataType dataType()
Returns the data type of this column vector.
- 
close
public abstract void close()
Cleans up memory for this column vector. The column vector is not usable after this.
This overrides AutoCloseable.close() to remove the throws clause, as a column vector is in-memory and no exception is expected to happen during closing.
- Specified by:
- close in interface AutoCloseable
 
- 
closeIfFreeable
public void closeIfFreeable()
Cleans up memory for this column vector if its resources are freeable between batches. The column vector is not usable after this. If this is a writable column vector or constant column vector, it is a no-op.
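One way the difference between close() and closeIfFreeable() could be realized is sketched below. All three classes are hypothetical stand-ins, and the choice to make the default closeIfFreeable() delegate to close() is an assumption of this sketch, not a statement about the actual implementation.

```java
// Hypothetical stand-ins; not Spark classes.
abstract class LifecycleSketch {
    boolean closed = false;

    // Releases the vector's memory; the vector is unusable afterwards.
    void close() {
        closed = true;
    }

    // Assumption of this sketch: by default, resources are freeable between batches.
    void closeIfFreeable() {
        close();
    }
}

// Holds per-batch buffers, so they can be released between batches.
final class PerBatchVector extends LifecycleSketch {
}

// Reused across batches (like a writable or constant vector), so freeing is deferred.
final class ReusedVector extends LifecycleSketch {
    @Override
    void closeIfFreeable() {
        // no-op: the buffers are kept alive for the next batch
    }
}
```

In this sketch a caller can invoke closeIfFreeable() after every batch without knowing the concrete vector type, and call close() once at the end of the scan.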
- 
hasNull
public abstract boolean hasNull()
Returns true if this column vector contains any null values.
- 
numNulls
public abstract int numNulls()
Returns the number of nulls in this column vector.
- 
isNullAt
public abstract boolean isNullAt(int rowId)
Returns whether the value at rowId is NULL.
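The three null-related methods can be kept consistent with simple bookkeeping, as in the following sketch. NullTracker and putNull are hypothetical names introduced for illustration only.

```java
// Hypothetical null bookkeeping for a column; not a Spark class.
final class NullTracker {
    private final boolean[] isNull;  // one flag per row
    private int numNulls;            // running count of flagged rows

    NullTracker(int capacity) {
        this.isNull = new boolean[capacity];
    }

    // Marks a row as null; marking the same row twice does not double count.
    void putNull(int rowId) {
        if (!isNull[rowId]) {
            isNull[rowId] = true;
            numNulls++;
        }
    }

    boolean hasNull() {
        return numNulls > 0;
    }

    int numNulls() {
        return numNulls;
    }

    boolean isNullAt(int rowId) {
        return isNull[rowId];
    }
}
```

Maintaining the count at write time keeps hasNull() and numNulls() constant-time, so a reader could skip per-row isNullAt checks entirely when hasNull() returns false.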
- 
getBoolean
public abstract boolean getBoolean(int rowId)
Returns the boolean type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- 
getBooleans
public boolean[] getBooleans(int rowId, int count)
Gets boolean type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- 
getByte
public abstract byte getByte(int rowId)
Returns the byte type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- 
getBytes
public byte[] getBytes(int rowId, int count)
Gets byte type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- 
getShort
public abstract short getShort(int rowId)
Returns the short type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- 
getShorts
public short[] getShorts(int rowId, int count)
Gets short type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- 
getInt
public abstract int getInt(int rowId)
Returns the int type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- 
getInts
public int[] getInts(int rowId, int count)
Gets int type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
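The relationship between a single-value getter and its batch counterpart can be sketched as below. SketchVector and ArrayBackedVector are hypothetical stand-ins; the loop shows one straightforward way to honor the half-open range [rowId, rowId + count), and an implementation with contiguous storage might prefer a bulk copy instead.

```java
// Hypothetical base class; mirrors only the shapes of getInt and getInts.
abstract class SketchVector {
    abstract int getInt(int rowId);

    // Copies the values for rows in the half-open range [rowId, rowId + count).
    int[] getInts(int rowId, int count) {
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            result[i] = getInt(rowId + i);
        }
        return result;
    }
}

// A concrete vector only has to supply the single-value getter.
final class ArrayBackedVector extends SketchVector {
    private final int[] data;

    ArrayBackedVector(int[] data) {
        this.data = data;
    }

    @Override
    int getInt(int rowId) {
        return data[rowId];
    }
}
```

The same pattern applies to the other batch getters (getBooleans, getBytes, getShorts, getLongs, getFloats, getDoubles).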
- 
getLong
public abstract long getLong(int rowId)
Returns the long type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- 
getLongs
public long[] getLongs(int rowId, int count)
Gets long type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- 
getFloat
public abstract float getFloat(int rowId)
Returns the float type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- 
getFloats
public float[] getFloats(int rowId, int count)
Gets float type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- 
getDouble
public abstract double getDouble(int rowId)
Returns the double type value for rowId. If the slot for rowId is null, the return value is undefined and can be anything.
- 
getDoubles
public double[] getDoubles(int rowId, int count)
Gets double type values from [rowId, rowId + count). The return values for the null slots are undefined and can be anything.
- 
getStruct
public final ColumnarRow getStruct(int rowId)
Returns the struct type value for rowId. If the slot for rowId is null, it should return null.
To support struct type, implementations must implement getChild(int) and make this vector a tree structure. The number of child vectors must be the same as the number of fields of the struct type, and each child vector is responsible for storing the data for its corresponding struct field.
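The child-per-field layout can be sketched with plain arrays. StructSketch and IntChild are hypothetical stand-ins for a struct vector with int fields. For simplicity the sketch copies the field values into an int[] rather than returning a row view such as ColumnarRow.

```java
// Hypothetical stand-ins; not Spark's ColumnVector or ColumnarRow.
final class StructSketch {
    // A child vector holding the values of one struct field for every row.
    static final class IntChild {
        private final int[] values;

        IntChild(int[] values) {
            this.values = values;
        }

        int getInt(int rowId) {
            return values[rowId];
        }
    }

    private final IntChild[] children;  // one child per struct field
    private final boolean[] nulls;      // true where the whole struct is null

    StructSketch(IntChild[] children, boolean[] nulls) {
        this.children = children;
        this.nulls = nulls;
    }

    IntChild getChild(int ordinal) {
        return children[ordinal];
    }

    // Returns the field values of the struct at rowId, or null for a null slot.
    int[] getStruct(int rowId) {
        if (nulls[rowId]) {
            return null;
        }
        int[] fields = new int[children.length];
        for (int ordinal = 0; ordinal < children.length; ordinal++) {
            fields[ordinal] = getChild(ordinal).getInt(rowId);
        }
        return fields;
    }
}
```

Every child is indexed by the same rowId as the parent, which is what makes the parent and its children a tree over one batch of rows.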
- 
getArray
public abstract ColumnarArray getArray(int rowId)
Returns the array type value for rowId. If the slot for rowId is null, it should return null.
To support array type, implementations must construct a ColumnarArray and return it in this method. ColumnarArray requires a ColumnVector that stores the data of all the elements of all the arrays in this vector, and an offset and length that point to a range in that ColumnVector; the range represents the array for rowId. Implementations are free to decide where to put the data vector and the offsets and lengths. For example, an implementation can use the first child vector as the data vector, and store offsets and lengths in 2 int arrays in this vector.
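The offset-and-length layout described above can be sketched with plain arrays. ArraySketch is a hypothetical stand-in for a vector of int arrays. It copies the element range for clarity, whereas a ColumnarArray would reference the range in the data vector.

```java
// Hypothetical stand-in for an array-typed vector; not a Spark class.
final class ArraySketch {
    private final int[] elementData;  // elements of all arrays, stored back to back
    private final int[] offsets;      // start of each row's array within elementData
    private final int[] lengths;      // number of elements in each row's array
    private final boolean[] nulls;    // true where the array itself is null

    ArraySketch(int[] elementData, int[] offsets, int[] lengths, boolean[] nulls) {
        this.elementData = elementData;
        this.offsets = offsets;
        this.lengths = lengths;
        this.nulls = nulls;
    }

    // Returns the array for rowId, or null for a null slot.
    int[] getArray(int rowId) {
        if (nulls[rowId]) {
            return null;
        }
        int start = offsets[rowId];
        return java.util.Arrays.copyOfRange(elementData, start, start + lengths[rowId]);
    }
}
```

Note that a null array and an empty array are distinct: the former is flagged in nulls, the latter simply has length 0.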
- 
getMap
public abstract ColumnarMap getMap(int ordinal)
Returns the map type value for rowId. If the slot for rowId is null, it should return null.
In Spark, a map type value is basically a key data array and a value data array. A key from the key array at a given index and the value from the value array at the same index together form an entry of this map type value.
To support map type, implementations must construct a ColumnarMap and return it in this method. ColumnarMap requires a ColumnVector that stores the data of all the keys of all the maps in this vector, another ColumnVector that stores the data of all the values of all the maps in this vector, and a pair of offset and length that specify the range of the key/value array belonging to the map type value at rowId.
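The paired key and value arrays can be sketched as follows. MapSketch is a hypothetical stand-in for a vector of string-to-int maps. It materializes a java.util.Map for readability, which a ColumnarMap does not do.

```java
// Hypothetical stand-in for a map-typed vector; not a Spark class.
final class MapSketch {
    private final String[] keys;    // keys of all maps, stored back to back
    private final int[] values;     // values of all maps, aligned with keys by index
    private final int[] offsets;    // start of each row's entries
    private final int[] lengths;    // number of entries in each row's map
    private final boolean[] nulls;  // true where the map itself is null

    MapSketch(String[] keys, int[] values, int[] offsets, int[] lengths, boolean[] nulls) {
        this.keys = keys;
        this.values = values;
        this.offsets = offsets;
        this.lengths = lengths;
        this.nulls = nulls;
    }

    // Builds the map for rowId by pairing keys[i] with values[i] over the row's range.
    java.util.Map<String, Integer> getMap(int rowId) {
        if (nulls[rowId]) {
            return null;
        }
        java.util.Map<String, Integer> entries = new java.util.LinkedHashMap<>();
        int end = offsets[rowId] + lengths[rowId];
        for (int i = offsets[rowId]; i < end; i++) {
            entries.put(keys[i], values[i]);
        }
        return entries;
    }
}
```

A single offset and length pair per row is enough because the key array and the value array are always indexed in lockstep.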
- 
getDecimal
public abstract Decimal getDecimal(int rowId, int precision, int scale)
Returns the decimal type value for rowId. If the slot for rowId is null, it should return null.
- 
getUTF8String
public abstract org.apache.spark.unsafe.types.UTF8String getUTF8String(int rowId)
Returns the string type value for rowId. If the slot for rowId is null, it should return null.
Note that the returned UTF8String may point to the data of this column vector. Copy it if you need to keep it after this column vector is freed.
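Why a returned value that points into the vector's data must be copied can be shown with a small sketch. StringViewSketch is a hypothetical stand-in that uses java.nio.ByteBuffer in place of UTF8String; it demonstrates the hazard when the underlying storage is overwritten, which is analogous to, but not the same as, the storage being freed.

```java
// Hypothetical stand-in; uses ByteBuffer instead of Spark's UTF8String.
final class StringViewSketch {
    private final byte[] buffer = new byte[16];  // storage owned by the vector

    // Overwrites the vector's storage with new content (assumes it fits in the buffer).
    void load(String s) {
        byte[] bytes = s.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        java.util.Arrays.fill(buffer, (byte) 0);
        System.arraycopy(bytes, 0, buffer, 0, bytes.length);
    }

    // Returns a view that shares the vector's storage; nothing is copied.
    java.nio.ByteBuffer view(int length) {
        return java.nio.ByteBuffer.wrap(buffer, 0, length);
    }

    // Decodes into a new String, which is an independent copy of the bytes.
    static String decode(java.nio.ByteBuffer bytes) {
        return java.nio.charset.StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
    }
}
```

A view taken before the storage changes reflects the new content afterwards, while a copy decoded earlier keeps the original value.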
- 
getBinary
public abstract byte[] getBinary(int rowId)
Returns the binary type value for rowId. If the slot for rowId is null, it should return null.
- 
getInterval
Returns the calendar interval type value for rowId. If the slot for rowId is null, it should return null.
In Spark, a calendar interval type value is basically two integer values representing the number of months and days in this interval, and a long value representing the number of microseconds in this interval. An interval type vector is the same as a struct type vector with 3 fields: months, days and microseconds.
To support interval type, implementations must implement getChild(int) and define 3 child vectors: the first child vector is an int type vector, containing all the month values of all the interval values in this vector. The second child vector is an int type vector, containing all the day values of all the interval values in this vector. The third child vector is a long type vector, containing all the microsecond values of all the interval values in this vector. Note that ArrowColumnVector leverages its built-in IntervalMonthDayNanoVector instead of the protocol described above.
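The three-child protocol can be sketched with one array per child. IntervalVectorSketch and its nested Interval class are hypothetical stand-ins; the arrays play the role of the int, int and long child vectors at ordinals 0, 1 and 2.

```java
// Hypothetical stand-in for an interval-typed vector; not a Spark class.
final class IntervalVectorSketch {
    // Simple value holder for one interval.
    static final class Interval {
        final int months;
        final int days;
        final long microseconds;

        Interval(int months, int days, long microseconds) {
            this.months = months;
            this.days = days;
            this.microseconds = microseconds;
        }
    }

    private final int[] months;         // child 0: month values for all rows
    private final int[] days;           // child 1: day values for all rows
    private final long[] microseconds;  // child 2: microsecond values for all rows
    private final boolean[] nulls;      // true where the interval is null

    IntervalVectorSketch(int[] months, int[] days, long[] microseconds, boolean[] nulls) {
        this.months = months;
        this.days = days;
        this.microseconds = microseconds;
        this.nulls = nulls;
    }

    // Assembles the interval at rowId from the three children, or returns null.
    Interval getInterval(int rowId) {
        if (nulls[rowId]) {
            return null;
        }
        return new Interval(months[rowId], days[rowId], microseconds[rowId]);
    }
}
```

The order of the children matters in this protocol: months first, then days, then microseconds.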
- 
getVariant
public final org.apache.spark.unsafe.types.VariantVal getVariant(int rowId)
Returns the Variant value for rowId. Similar to getInterval(int), the implementation must implement getChild(int) and define 2 child vectors of binary type for the Variant value and metadata.
- 
getChild
public abstract ColumnVector getChild(int ordinal)
- Returns:
- child ColumnVector at the given ordinal.
 
 
-