Package org.apache.spark.ml.util
Class SchemaUtils
Object
org.apache.spark.ml.util.SchemaUtils
Utils for handling schemas.
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
static StructType | appendColumn(StructType schema, String colName, DataType dataType, boolean nullable) | Appends a new column to the input schema.
static StructType | appendColumn(StructType schema, StructField col) | Appends a new column to the input schema.
static void | checkColumnType(StructType schema, String colName, DataType dataType, String msg) | Check whether the given schema contains a column of the required data type.
static void | checkColumnTypes(StructType schema, String colName, scala.collection.immutable.Seq<DataType> dataTypes, String msg) | Check whether the given schema contains a column of one of the required data types.
static void | checkNumericType(StructType schema, String colName, String msg) | Check whether the given schema contains a column of a numeric data type.
static StructField | getSchemaField(StructType schema, String colName) | Get schema field.
static DataType | getSchemaFieldType(StructType schema, String colName) | Get schema field type.
static StructType | updateAttributeGroupSize(StructType schema, String colName, int size) | Update the size of an ML Vector column.
static StructType | updateField(StructType schema, StructField field, boolean overwriteMetadata) | Update the metadata of an existing column.
static StructType | updateNumeric(StructType schema, String colName) | Update the numeric metadata of an existing column.
static StructType | updateNumValues(StructType schema, String colName, int numValues) | Update the number of values of an existing column.
static void | validateVectorCompatibleColumn(StructType schema, String colName) | Check whether the given column in the schema is one of the supported vector types: Vector, Array[Float], Array[Double].
-
Constructor Details
-
SchemaUtils
public SchemaUtils()
-
-
Method Details
-
checkColumnType
public static void checkColumnType(StructType schema, String colName, DataType dataType, String msg)
Check whether the given schema contains a column of the required data type.
Parameters:
colName - column name
dataType - required column data type
schema - (undocumented)
msg - (undocumented)
-
checkColumnTypes
public static void checkColumnTypes(StructType schema, String colName, scala.collection.immutable.Seq<DataType> dataTypes, String msg)
Check whether the given schema contains a column of one of the required data types.
Parameters:
colName - column name
dataTypes - required column data types
schema - (undocumented)
msg - (undocumented)
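The check described above can be modeled in plain Java without Spark. This is only a sketch of the documented contract, not Spark's implementation: the class `CheckColumnTypesSketch`, the use of a `Map` from column name to a type-name string in place of `StructType`, the exception type, and the message wording are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;

final class CheckColumnTypesSketch {
    /**
     * Hypothetical stand-in for checkColumnTypes: passes silently when the
     * column's type is one of the accepted types, otherwise throws.
     * The schema is modeled as column name -> type name.
     */
    static void checkColumnTypes(
            Map<String, String> schema, String colName, List<String> dataTypes, String msg) {
        String actual = schema.get(colName);
        if (actual == null) {
            throw new IllegalArgumentException("Column " + colName + " does not exist.");
        }
        if (!dataTypes.contains(actual)) {
            // Append the caller-supplied message, if any, to the error text.
            String suffix = (msg == null || msg.isEmpty()) ? "" : " " + msg;
            throw new IllegalArgumentException(
                "Column " + colName + " must be one of " + dataTypes
                    + " but was " + actual + "." + suffix);
        }
    }
}
```

In this model, `checkColumnType` is the special case where `dataTypes` holds a single entry.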
-
checkNumericType
public static void checkNumericType(StructType schema, String colName, String msg)
Check whether the given schema contains a column of a numeric data type.
Parameters:
colName - column name
schema - (undocumented)
msg - (undocumented)
-
appendColumn
public static StructType appendColumn(StructType schema, String colName, DataType dataType, boolean nullable)
Appends a new column to the input schema. This fails if the given output column already exists.
Parameters:
schema - input schema
colName - new column name. If this column name is an empty string "", this method returns the input schema unchanged. This allows users to disable output columns.
dataType - new column data type
nullable - (undocumented)
Returns:
new schema with the input column appended
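The three outcomes described above (append, no-op on an empty name, failure on a duplicate) can be illustrated with a small plain-Java model. It is a sketch of the documented behavior only: `AppendColumnSketch` is a hypothetical class, the schema is reduced to a `Map` from column name to a type-name string, the `nullable` flag is left out, and the choice of `IllegalArgumentException` for the failure case is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AppendColumnSketch {
    /**
     * Hypothetical stand-in for appendColumn. Returns a new map with the
     * column appended and leaves the input map untouched.
     */
    static Map<String, String> appendColumn(
            Map<String, String> schema, String colName, String dataType) {
        if (colName.isEmpty()) {
            // An empty name means the output column is disabled: schema unchanged.
            return schema;
        }
        if (schema.containsKey(colName)) {
            // The documented contract only says "fails"; the exception type is assumed.
            throw new IllegalArgumentException("Column " + colName + " already exists.");
        }
        Map<String, String> out = new LinkedHashMap<>(schema); // preserves column order
        out.put(colName, dataType);
        return out;
    }
}
```

Returning a copy rather than mutating the input mirrors the "new schema" wording of the return value.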
-
appendColumn
public static StructType appendColumn(StructType schema, StructField col)
Appends a new column to the input schema. This fails if the given output column already exists.
Parameters:
schema - input schema
col - new column schema
Returns:
new schema with the input column appended
-
updateAttributeGroupSize
public static StructType updateAttributeGroupSize(StructType schema, String colName, int size)
Update the size of an ML Vector column. If this column does not exist, append it.
Parameters:
schema - input schema
colName - column name
size - number of features
Returns:
new schema
-
updateNumValues
public static StructType updateNumValues(StructType schema, String colName, int numValues)
Update the number of values of an existing column. If this column does not exist, append it.
Parameters:
schema - input schema
colName - column name
numValues - number of values
Returns:
new schema
-
updateNumeric
public static StructType updateNumeric(StructType schema, String colName)
Update the numeric metadata of an existing column. If this column does not exist, append it.
Parameters:
schema - input schema
colName - column name
Returns:
new schema
-
updateField
public static StructType updateField(StructType schema, StructField field, boolean overwriteMetadata)
Update the metadata of an existing column. If this column does not exist, append it.
Parameters:
schema - input schema
field - struct field
overwriteMetadata - whether to overwrite the metadata. If true, the metadata in the schema will be overwritten. If false, the metadata in field and schema will be merged to generate the output metadata.
Returns:
new schema
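The difference between overwriting and merging metadata can be shown with a plain-Java model. This is a sketch under stated assumptions, not Spark's code: `UpdateFieldSketch` is a hypothetical class, the schema is reduced to a `Map` from column name to a metadata map, and because the description above does not say which side wins when both carry the same key, the sketch arbitrarily keeps the value already in the schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UpdateFieldSketch {
    /**
     * Hypothetical stand-in for updateField. The schema is modeled as
     * column name -> metadata (key -> value); the input is never mutated.
     */
    static Map<String, Map<String, String>> updateField(
            Map<String, Map<String, String>> schema,
            String name,
            Map<String, String> fieldMeta,
            boolean overwriteMetadata) {
        Map<String, Map<String, String>> out = new LinkedHashMap<>(schema);
        Map<String, String> existing = schema.get(name);
        if (existing == null || overwriteMetadata) {
            // Missing column: append it. Existing column: replace its metadata.
            out.put(name, new LinkedHashMap<>(fieldMeta));
        } else {
            // Merge both sides. Assumption: on a key conflict the value
            // already present in the schema is kept.
            Map<String, String> merged = new LinkedHashMap<>(fieldMeta);
            merged.putAll(existing);
            out.put(name, merged);
        }
        return out;
    }
}
```

With `overwriteMetadata` set to false, keys that appear on only one side always survive in the result, regardless of the conflict rule chosen.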
-
validateVectorCompatibleColumn
public static void validateVectorCompatibleColumn(StructType schema, String colName)
Check whether the given column in the schema is one of the supported vector types: Vector, Array[Float], Array[Double].
Parameters:
schema - input schema
colName - column name
-
getSchemaField
public static StructField getSchemaField(StructType schema, String colName)
Get schema field.
Parameters:
schema - input schema
colName - column name; nested column names are supported
Returns:
(undocumented)
-
getSchemaFieldType
public static DataType getSchemaFieldType(StructType schema, String colName)
Get schema field type.
Parameters:
schema - input schema
colName - column name; nested column names are supported
Returns:
(undocumented)
-