Class SchemaUtils

Object
org.apache.spark.sql.util.SchemaUtils

public class SchemaUtils extends Object
Utils for handling schemas.

TODO: Merge this file with org.apache.spark.ml.util.SchemaUtils.

  • Constructor Details

    • SchemaUtils

      public SchemaUtils()
  • Method Details

    • checkSchemaColumnNameDuplication

      public static void checkSchemaColumnNameDuplication(DataType schema, boolean caseSensitiveAnalysis)
      Checks whether the input schema contains duplicate column names and throws an exception if it does.

      Parameters:
      schema - schema to check
      caseSensitiveAnalysis - whether duplication checks should be case sensitive or not
    • checkSchemaColumnNameDuplication

      public static void checkSchemaColumnNameDuplication(StructType schema, scala.Function2<String,String,Object> resolver)
      Checks whether the input schema contains duplicate column names and throws an exception if it does.

      Parameters:
      schema - schema to check
      resolver - resolver used to determine if two identifiers are equal
    • checkColumnNameDuplication

      public static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, scala.Function2<String,String,Object> resolver)
      Checks whether the input column names contain duplicate identifiers and throws an exception if they do.

      Parameters:
      columnNames - column names to check
      resolver - resolver used to determine if two identifiers are equal
    • checkColumnNameDuplication

      public static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, boolean caseSensitiveAnalysis)
      Checks whether the input column names contain duplicate identifiers and throws an exception if they do.

      Parameters:
      columnNames - column names to check
      caseSensitiveAnalysis - whether duplication checks should be case sensitive or not
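The checkColumnNameDuplication overloads above differ only in how name equality is decided: by a caller-supplied resolver or by a case-sensitivity flag. The following is a minimal Java sketch of that idea and does not call Spark. The class `DuplicateCheckSketch`, the helper `findDuplicates`, the exception type, and the message text are all hypothetical, and `BiPredicate<String, String>` stands in for the Scala `Function2` resolver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical stand-in for illustration; this is not Spark's implementation.
final class DuplicateCheckSketch {

    // Returns each name that the resolver considers equal to an earlier name.
    static List<String> findDuplicates(List<String> names, BiPredicate<String, String> resolver) {
        List<String> duplicates = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            for (int j = 0; j < i; j++) {
                if (resolver.test(names.get(i), names.get(j))) {
                    duplicates.add(names.get(i));
                    break;
                }
            }
        }
        return duplicates;
    }

    // Maps the boolean flag to a resolver, then throws if any duplicate is found.
    static void checkColumnNameDuplication(List<String> names, boolean caseSensitive) {
        BiPredicate<String, String> resolver =
            caseSensitive ? String::equals : String::equalsIgnoreCase;
        List<String> duplicates = findDuplicates(names, resolver);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException("Found duplicate column(s): " + duplicates);
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("id", "name", "ID");
        // Case-sensitive: "id" and "ID" are distinct, so no exception is thrown.
        checkColumnNameDuplication(names, true);
        try {
            // Case-insensitive: "ID" collides with "id".
            checkColumnNameDuplication(names, false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Expressing the flag as a resolver keeps a single comparison path, which is one plausible reason for an API to offer both overloads.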
    • explodeNestedFieldNames

      public static scala.collection.immutable.Seq<String> explodeNestedFieldNames(StructType schema)
      Returns all column names in this schema as a flat list. For example, a schema like:

        | - a
        | | - 1
        | | - 2
        | - b
        | - c
        | | - nest
        |   | - 3

      will get flattened to: "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3"
      Parameters:
      schema - the schema whose column names are flattened
      Returns:
      the flat list of column names, with nested names joined by "."
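The flattening described for explodeNestedFieldNames can be sketched as a depth-first walk that emits each parent before its children. This Java sketch does not use Spark. It models a struct as an ordered map from field name to either `null` (a leaf column) or another map (a nested struct), which is an assumption made for illustration, and `ExplodeFieldNamesSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a struct is an ordered map from field name to either
// null (a leaf column) or another map (a nested struct). Not Spark's StructType.
final class ExplodeFieldNamesSketch {

    static List<String> explode(Map<String, Object> struct) {
        List<String> out = new ArrayList<>();
        collect(struct, "", out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(Map<String, Object> struct, String prefix, List<String> out) {
        for (Map.Entry<String, Object> field : struct.entrySet()) {
            String fullName = prefix.isEmpty() ? field.getKey() : prefix + "." + field.getKey();
            out.add(fullName); // the parent is listed before its children
            if (field.getValue() instanceof Map) {
                collect((Map<String, Object>) field.getValue(), fullName, out);
            }
        }
    }

    // Builds the example schema from the method description.
    static Map<String, Object> exampleSchema() {
        Map<String, Object> nest = new LinkedHashMap<>();
        nest.put("3", null);
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("nest", nest);
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("1", null);
        a.put("2", null);
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("a", a);
        schema.put("b", null);
        schema.put("c", c);
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(explode(exampleSchema()));
        // prints [a, a.1, a.2, b, c, c.nest, c.nest.3]
    }
}
```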
    • checkTransformDuplication

      public static void checkTransformDuplication(scala.collection.immutable.Seq<Transform> transforms, String checkType, boolean isCaseSensitive)
      Checks whether any partitioning transform is duplicated and throws an exception if so.

      Parameters:
      transforms - the partitioning transforms to check for duplicates
      checkType - contextual information around the check, used in an exception message
      isCaseSensitive - whether to be case sensitive when comparing column names
    • findColumnPosition

      public static scala.collection.immutable.Seq<Object> findColumnPosition(scala.collection.immutable.Seq<String> column, StructType schema, scala.Function2<String,String,Object> resolver)
      Returns the given column's ordinal within the given schema. The returned position has one entry per nesting level of the column.

      Parameters:
      column - The column to search for in the given struct. If the length of column is greater than 1, we expect to enter a nested field.
      schema - The current struct we are looking at.
      resolver - The resolver to find the column.
      Returns:
      the ordinal of the column at each nesting level, outermost first
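The lookup described for findColumnPosition can be sketched as a walk over the name parts that records the matching field's ordinal at each level. This Java sketch does not use Spark. It reuses an assumed model in which a struct is an ordered map from field name to `null` (leaf) or a nested map, and `FindColumnPositionSketch`, `findPosition`, and the error messages are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical model: ordered map from field name to null (leaf) or nested map.
final class FindColumnPositionSketch {

    @SuppressWarnings("unchecked")
    static List<Integer> findPosition(
            List<String> column, Map<String, Object> schema, BiPredicate<String, String> resolver) {
        List<Integer> position = new ArrayList<>();
        Map<String, Object> current = schema;
        for (String part : column) {
            if (current == null) {
                throw new IllegalArgumentException("Cannot descend into a non-struct field at: " + part);
            }
            int ordinal = 0;
            Object matched = null;
            boolean found = false;
            for (Map.Entry<String, Object> field : current.entrySet()) {
                if (resolver.test(field.getKey(), part)) {
                    matched = field.getValue();
                    found = true;
                    break;
                }
                ordinal++;
            }
            if (!found) {
                throw new IllegalArgumentException("Column not found: " + part);
            }
            position.add(ordinal); // one entry per nesting level
            current = (matched instanceof Map) ? (Map<String, Object>) matched : null;
        }
        return position;
    }

    // Example schema: id, address { street, city }.
    static Map<String, Object> exampleSchema() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", null);
        address.put("city", null);
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("id", null);
        schema.put("address", address);
        return schema;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("address", "CITY");
        System.out.println(findPosition(column, exampleSchema(), String::equalsIgnoreCase));
        // prints [1, 1]
    }
}
```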
    • getColumnName

      public static scala.collection.immutable.Seq<String> getColumnName(scala.collection.immutable.Seq<Object> position, StructType schema)
      Gets the name of the column in the given position.
      Parameters:
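      position - the ordinal position of the column, one entry per nesting level
      schema - the schema in which to look up the position
      Returns:
      the name parts of the column at the given position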
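Resolving a position back to a name can be sketched as the reverse walk: at each level, take the field at the given ordinal and descend into it. This Java sketch does not use Spark. The map-based struct model, the class `GetColumnNameSketch`, the method `columnName`, and the error message are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: ordered map from field name to null (leaf) or nested map.
final class GetColumnNameSketch {

    @SuppressWarnings("unchecked")
    static List<String> columnName(List<Integer> position, Map<String, Object> schema) {
        List<String> name = new ArrayList<>();
        Map<String, Object> current = schema;
        for (int ordinal : position) {
            if (current == null || ordinal < 0 || ordinal >= current.size()) {
                throw new IllegalArgumentException("Invalid position: " + position);
            }
            // Copy the ordered entries so the field can be picked by index.
            Map.Entry<String, Object> field = new ArrayList<>(current.entrySet()).get(ordinal);
            name.add(field.getKey());
            Object type = field.getValue();
            current = (type instanceof Map) ? (Map<String, Object>) type : null;
        }
        return name;
    }

    // Example schema: id, address { street, city }.
    static Map<String, Object> exampleSchema() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", null);
        address.put("city", null);
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("id", null);
        schema.put("address", address);
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(columnName(Arrays.asList(1, 1), exampleSchema()));
        // prints [address, city]
    }
}
```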
    • restoreOriginalOutputNames

      public static scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> restoreOriginalOutputNames(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> projectList, scala.collection.immutable.Seq<String> originalNames)
    • escapeMetaCharacters

      public static String escapeMetaCharacters(String str)
      Parameters:
      str - The string to be escaped.
      Returns:
      The escaped string.
    • hasNonUTF8BinaryCollation

      public static boolean hasNonUTF8BinaryCollation(DataType dt)
      Checks whether the given data type has a non-UTF8-binary (implicit) collation.
      Parameters:
      dt - the data type to check
      Returns:
      true if the data type has a non-UTF8-binary collation, false otherwise
    • replaceCollatedStringWithString

      public static DataType replaceCollatedStringWithString(DataType dt)
      Recursively replaces every collated string type in the given data type with the non-collated StringType.
      Parameters:
      dt - the data type to transform
      Returns:
      the data type with all collated string types replaced by StringType
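The two collation helpers above can be sketched as recursive walks over a type tree. This Java sketch does not use Spark. The miniature type model (`StringT`, `IntT`, `ArrayT`, `StructT`), the method names, and the `render` helper are hypothetical, and the sketch assumes that the collation check descends into nested types and that `UTF8_BINARY` is the default collation name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical miniature type model; these classes are not Spark's DataType hierarchy.
final class CollationSketch {
    static final String DEFAULT_COLLATION = "UTF8_BINARY"; // assumed default name

    interface Type {}

    static final class StringT implements Type {
        final String collation;
        StringT(String collation) { this.collation = collation; }
    }

    static final class IntT implements Type {}

    static final class ArrayT implements Type {
        final Type element;
        ArrayT(Type element) { this.element = element; }
    }

    static final class StructT implements Type {
        final Map<String, Type> fields = new LinkedHashMap<>();
        StructT add(String name, Type type) { fields.put(name, type); return this; }
    }

    // True if any string type nested anywhere in dt uses a non-default collation.
    static boolean hasNonDefaultCollation(Type dt) {
        if (dt instanceof StringT) return !DEFAULT_COLLATION.equals(((StringT) dt).collation);
        if (dt instanceof ArrayT) return hasNonDefaultCollation(((ArrayT) dt).element);
        if (dt instanceof StructT) {
            for (Type t : ((StructT) dt).fields.values()) {
                if (hasNonDefaultCollation(t)) return true;
            }
        }
        return false;
    }

    // Rebuilds dt with every string type replaced by the default-collation string type.
    static Type replaceCollated(Type dt) {
        if (dt instanceof StringT) return new StringT(DEFAULT_COLLATION);
        if (dt instanceof ArrayT) return new ArrayT(replaceCollated(((ArrayT) dt).element));
        if (dt instanceof StructT) {
            StructT out = new StructT();
            for (Map.Entry<String, Type> f : ((StructT) dt).fields.entrySet()) {
                out.add(f.getKey(), replaceCollated(f.getValue()));
            }
            return out;
        }
        return dt; // non-string leaf types are returned unchanged
    }

    // Renders a type as text so results are easy to compare.
    static String render(Type dt) {
        if (dt instanceof StringT) {
            String c = ((StringT) dt).collation;
            return DEFAULT_COLLATION.equals(c) ? "string" : "string collate " + c;
        }
        if (dt instanceof ArrayT) return "array<" + render(((ArrayT) dt).element) + ">";
        if (dt instanceof StructT) {
            StringJoiner joiner = new StringJoiner(", ", "struct<", ">");
            for (Map.Entry<String, Type> f : ((StructT) dt).fields.entrySet()) {
                joiner.add(f.getKey() + ": " + render(f.getValue()));
            }
            return joiner.toString();
        }
        return "int";
    }

    public static void main(String[] args) {
        Type dt = new StructT()
            .add("name", new StringT("UNICODE_CI"))
            .add("tags", new ArrayT(new StringT(DEFAULT_COLLATION)));
        System.out.println(hasNonDefaultCollation(dt));
        System.out.println(render(replaceCollated(dt)));
    }
}
```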