Class SchemaUtils

Object
org.apache.spark.ml.util.SchemaUtils

public class SchemaUtils extends Object
Utilities for handling schemas.
  • Constructor Details

    • SchemaUtils

      public SchemaUtils()
  • Method Details

    • checkColumnType

      public static void checkColumnType(StructType schema, String colName, DataType dataType, String msg)
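      Check whether the given schema contains a column of the required data type. Throws an IllegalArgumentException if it does not.
      Parameters:
      colName - column name
      dataType - required column data type
      schema - input schema
      msg - additional message appended to the error message when the check fails (may be empty)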
    • checkColumnTypes

      public static void checkColumnTypes(StructType schema, String colName, scala.collection.immutable.Seq<DataType> dataTypes, String msg)
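      Check whether the given schema contains a column of one of the required data types. Throws an IllegalArgumentException if it does not.
      Parameters:
      colName - column name
      dataTypes - required column data types
      schema - input schema
      msg - additional message appended to the error message when the check fails (may be empty)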
    • checkNumericType

      public static void checkNumericType(StructType schema, String colName, String msg)
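      Check whether the given schema contains a column of a numeric data type. Throws an IllegalArgumentException if it does not.
      Parameters:
      colName - column name
      schema - input schema
      msg - additional message appended to the error message when the check fails (may be empty)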
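The three check methods above follow one pattern: look up the column's type, compare it against what is required (one type, a list of types, or "any numeric type"), and fail with an IllegalArgumentException that carries the optional msg. The following is a simplified, hypothetical model in plain Java with no Spark dependency: MiniType and MiniChecks are illustrative names, a name-to-type map stands in for StructType, and the error wording is approximate rather than Spark's exact text.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Spark's DataType hierarchy (not a Spark class).
enum MiniType {
    INT(true), LONG(true), FLOAT(true), DOUBLE(true), STRING(false), BOOLEAN(false);

    final boolean numeric;

    MiniType(boolean numeric) { this.numeric = numeric; }
}

// Hypothetical model of checkColumnType, checkColumnTypes and checkNumericType.
final class MiniChecks {
    private MiniChecks() {}

    static MiniType typeOf(Map<String, MiniType> schema, String colName) {
        MiniType t = schema.get(colName);
        if (t == null) {
            throw new IllegalArgumentException("Column " + colName + " does not exist.");
        }
        return t;
    }

    // Appends the caller's message only when it is non-blank.
    private static String suffix(String msg) {
        return (msg != null && !msg.isBlank()) ? " " + msg : "";
    }

    static void checkColumnType(Map<String, MiniType> schema, String colName, MiniType required, String msg) {
        MiniType actual = typeOf(schema, colName);
        if (actual != required) {
            throw new IllegalArgumentException("Column " + colName + " must be of type " + required
                + " but was actually " + actual + "." + suffix(msg));
        }
    }

    static void checkColumnTypes(Map<String, MiniType> schema, String colName, List<MiniType> allowed, String msg) {
        MiniType actual = typeOf(schema, colName);
        if (!allowed.contains(actual)) {
            throw new IllegalArgumentException("Column " + colName + " must be one of " + allowed
                + " but was actually " + actual + "." + suffix(msg));
        }
    }

    static void checkNumericType(Map<String, MiniType> schema, String colName, String msg) {
        MiniType actual = typeOf(schema, colName);
        if (!actual.numeric) {
            throw new IllegalArgumentException("Column " + colName + " must be of numeric type but was actually "
                + actual + "." + suffix(msg));
        }
    }
}
```

In this sketch a passing check simply returns, so callers such as a transformSchema implementation can chain several checks before building the output schema.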
    • appendColumn

      public static StructType appendColumn(StructType schema, String colName, DataType dataType, boolean nullable)
      Appends a new column to the input schema. This fails if the given output column already exists.
      Parameters:
      schema - input schema
      colName - new column name. If this column name is an empty string "", this method returns the input schema unchanged. This allows users to disable output columns.
      dataType - new column data type
      nullable - whether the new column is nullable
      Returns:
      new schema with the input column appended
    • appendColumn

      public static StructType appendColumn(StructType schema, StructField col)
      Appends a new column to the input schema. This fails if the given output column already exists.
      Parameters:
      schema - input schema
      col - field describing the new column
      Returns:
      new schema with the input column appended
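Both appendColumn overloads return a new schema rather than mutating the input: the name-based overload treats an empty name as "output disabled" and otherwise builds a field and delegates, while the field-based overload rejects duplicate names. The sketch below models that behavior in plain Java; MiniField and MiniAppend are hypothetical names, an immutable list stands in for StructType, and the data type is kept as a plain string for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Spark's StructField (data type kept as a string).
record MiniField(String name, String dataType, boolean nullable) {}

// Hypothetical model of both appendColumn overloads.
final class MiniAppend {
    private MiniAppend() {}

    // Name-based overload: an empty column name returns the input schema unchanged.
    static List<MiniField> appendColumn(List<MiniField> schema, String colName, String dataType, boolean nullable) {
        if (colName.isEmpty()) {
            return schema;
        }
        return appendColumn(schema, new MiniField(colName, dataType, nullable));
    }

    // Field-based overload: fails if a field with the same name already exists.
    static List<MiniField> appendColumn(List<MiniField> schema, MiniField col) {
        for (MiniField f : schema) {
            if (f.name().equals(col.name())) {
                throw new IllegalArgumentException("Column " + col.name() + " already exists.");
            }
        }
        List<MiniField> out = new ArrayList<>(schema);
        out.add(col);
        return Collections.unmodifiableList(out);
    }
}
```

Returning the input object itself for an empty name mirrors the "returns the input schema unchanged" wording, which lets a stage skip an output column without special-casing it.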
    • updateAttributeGroupSize

      public static StructType updateAttributeGroupSize(StructType schema, String colName, int size)
      Update the size of an ML Vector column. If this column does not exist, append it.
      Parameters:
      schema - input schema
      colName - column name
      size - number of features
      Returns:
      new schema
    • updateNumValues

      public static StructType updateNumValues(StructType schema, String colName, int numValues)
      Update the number of values of an existing column. If this column does not exist, append it.
      Parameters:
      schema - input schema
      colName - column name
      numValues - number of values.
      Returns:
      new schema
    • updateNumeric

      public static StructType updateNumeric(StructType schema, String colName)
      Update the numeric metadata of an existing column. If this column does not exist, append it.
      Parameters:
      schema - input schema
      colName - column name
      Returns:
      new schema
    • updateField

      public static StructType updateField(StructType schema, StructField field, boolean overwriteMetadata)
      Update the metadata of an existing column. If this column does not exist, append it.
      Parameters:
      schema - input schema
      field - struct field
      overwriteMetadata - whether to overwrite the metadata. If true, the existing column's metadata is replaced by the metadata of field. If false, the metadata of field and of the existing column are merged to produce the output metadata.
      Returns:
      new schema
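The overwrite-versus-merge choice in updateField can be illustrated with a small plain-Java model in which metadata is a string map. MetaField and MiniUpdate are hypothetical names, and the conflict rule in the merge branch (the existing column's value wins when both maps share a key) is an assumption of this sketch, not documented Spark behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a StructField whose metadata is a string map.
record MetaField(String name, String dataType, Map<String, String> metadata) {}

// Hypothetical model of updateField(schema, field, overwriteMetadata).
final class MiniUpdate {
    private MiniUpdate() {}

    static List<MetaField> updateField(List<MetaField> schema, MetaField field, boolean overwriteMetadata) {
        List<MetaField> out = new ArrayList<>();
        boolean found = false;
        for (MetaField f : schema) {
            if (!f.name().equals(field.name())) {
                out.add(f); // other columns are kept as they are
                continue;
            }
            found = true;
            if (overwriteMetadata) {
                // Replace the existing field, metadata included.
                out.add(field);
            } else {
                // Merge both metadata maps; on a key conflict this sketch keeps the
                // existing column's value (an assumption, not confirmed Spark behavior).
                Map<String, String> merged = new HashMap<>(field.metadata());
                merged.putAll(f.metadata());
                out.add(new MetaField(field.name(), field.dataType(), Map.copyOf(merged)));
            }
        }
        if (!found) {
            // Missing column: append it, as the method description says.
            out.add(field);
        }
        return List.copyOf(out);
    }
}
```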
    • validateVectorCompatibleColumn

      public static void validateVectorCompatibleColumn(StructType schema, String colName)
      Check whether the given column in the schema has one of the supported vector types: Vector, Array[Float], or Array[Double].
      Parameters:
      schema - input schema
      colName - column name
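validateVectorCompatibleColumn can be thought of as a type check against a fixed list of candidate types. The plain-Java model below makes that list explicit; VType, VectorT, ArrayT, ScalarT and VectorCompat are hypothetical names loosely mirroring Spark's vector and array types, and accepting arrays both with and without null elements is an assumption of this sketch.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified data types (not Spark classes).
interface VType {}
record VectorT() implements VType {}
record ArrayT(String elementType, boolean containsNull) implements VType {}
record ScalarT(String name) implements VType {}

// Hypothetical model of validateVectorCompatibleColumn.
final class VectorCompat {
    private VectorCompat() {}

    // Candidate types: Vector, Array[Double] and Array[Float]; arrays are accepted
    // whether or not they may contain nulls (an assumption of this sketch).
    static final List<VType> CANDIDATES = List.of(
        new VectorT(),
        new ArrayT("double", false), new ArrayT("double", true),
        new ArrayT("float", false), new ArrayT("float", true));

    static void validateVectorCompatibleColumn(Map<String, VType> schema, String colName) {
        VType actual = schema.get(colName);
        if (actual == null) {
            throw new IllegalArgumentException("Column " + colName + " does not exist.");
        }
        // Records compare by value, so List.contains performs a structural type match.
        if (!CANDIDATES.contains(actual)) {
            throw new IllegalArgumentException("Column " + colName + " must be one of " + CANDIDATES
                + " but was actually " + actual + ".");
        }
    }
}
```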
    • getSchemaField

      public static StructField getSchemaField(StructType schema, String colName)
      Get the field with the given name from the schema.
      Parameters:
      schema - input schema
      colName - column name; nested column names (e.g. "a.b") are supported.
      Returns:
      the matching StructField
    • getSchemaFieldType

      public static DataType getSchemaFieldType(StructType schema, String colName)
      Get the data type of the field with the given name.
      Parameters:
      schema - input schema
      colName - column name; nested column names (e.g. "a.b") are supported.
      Returns:
      the data type of the matching field
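Nested-name support in getSchemaField and getSchemaFieldType means a name such as "point.y" is resolved one struct level at a time. The plain-Java sketch below models that walk; Node and NestedLookup are hypothetical names, and the naive split on "." (no backtick quoting, exact case-sensitive matching) is a simplification of this sketch rather than a description of how Spark parses or resolves names.

```java
import java.util.List;

// Hypothetical stand-in for a StructField; a non-empty children list models a nested struct.
record Node(String name, String dataType, List<Node> children) {
    static Node leaf(String name, String dataType) {
        return new Node(name, dataType, List.of());
    }
}

// Hypothetical model of getSchemaField / getSchemaFieldType with dotted names.
final class NestedLookup {
    private NestedLookup() {}

    static Node getSchemaField(List<Node> schema, String colName) {
        List<Node> level = schema;
        Node current = null;
        // Walk one struct level per dot-separated part.
        for (String part : colName.split("\\.")) {
            current = null;
            for (Node n : level) {
                if (n.name().equals(part)) {
                    current = n;
                    break;
                }
            }
            if (current == null) {
                throw new IllegalArgumentException("Field " + colName + " does not exist.");
            }
            level = current.children();
        }
        if (current == null) {
            // Reached only when the name has no parts at all (e.g. ".").
            throw new IllegalArgumentException("Invalid field name: " + colName);
        }
        return current;
    }

    static String getSchemaFieldType(List<Node> schema, String colName) {
        return getSchemaField(schema, colName).dataType();
    }
}
```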