Package org.apache.spark.sql.util
Class SchemaUtils
Object
org.apache.spark.sql.util.SchemaUtils
Utils for handling schemas.
 
 TODO: Merge this file with SchemaUtils.
- 
Nested Class Summary
- 
Constructor Summary
- 
Method Summary
Each entry lists the modifier and type, the method, and its description.
- static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, boolean caseSensitiveAnalysis): Checks if input column names have duplicate identifiers.
- static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, scala.Function2<String, String, Object> resolver): Checks if input column names have duplicate identifiers.
- static void checkIndeterminateCollationInSchema: Throws an error if the given schema has indeterminate collation.
- static void checkNoCollationsInMapKeys(DataType schema)
- static void checkSchemaColumnNameDuplication(DataType schema, boolean caseSensitiveAnalysis): Checks if an input schema has duplicate column names.
- static void checkSchemaColumnNameDuplication(StructType schema, scala.Function2<String, String, Object> resolver): Checks if an input schema has duplicate column names.
- static void checkTransformDuplication(scala.collection.immutable.Seq<Transform> transforms, String checkType, boolean isCaseSensitive): Checks if the partitioning transforms are being duplicated or not.
- static String escapeMetaCharacters
- static scala.collection.immutable.Seq<String> explodeNestedFieldNames(StructType schema): Returns all column names in this schema as a flat list.
- static scala.collection.immutable.Seq<org.apache.spark.sql.util.SchemaUtils.ColumnPath> findColumnPaths(DataType dt, scala.Function1<DataType, Object> f): For the given data type dt, finds all column paths that satisfy the given predicate f.
- static scala.collection.immutable.Seq<Object> findColumnPosition(scala.collection.immutable.Seq<String> column, StructType schema, scala.Function2<String, String, Object> resolver): Returns the given column's ordinal within the given schema.
- static scala.collection.immutable.Seq<String> getColumnName(scala.collection.immutable.Seq<Object> position, StructType schema): Gets the name of the column in the given position.
- static boolean hasIndeterminateCollation: Checks if a given data type has indeterminate collation.
- static boolean hasNonUTF8BinaryCollation: Checks if a given data type has a non utf8 binary (implicit) collation type.
- static DataType replaceCollatedStringWithString: Replaces any collated string type with non-collated StringType recursively in the given data type.
- static scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> restoreOriginalOutputNames(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> projectList, scala.collection.immutable.Seq<String> originalNames)
- 
Constructor Details
- 
SchemaUtils
public SchemaUtils()
 
- 
- 
Method Details
- 
findColumnPaths
public static scala.collection.immutable.Seq<org.apache.spark.sql.util.SchemaUtils.ColumnPath> findColumnPaths(DataType dt, scala.Function1<DataType, Object> f)
For the given data type dt, finds all column paths that satisfy the given predicate f.
- Parameters:
- dt- (undocumented)
- f- (undocumented)
- Returns:
- (undocumented)
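
The predicate-driven search described for findColumnPaths can be illustrated with a small, self-contained sketch. It does not call Spark: the Type class, the representation of a path as a list of field names, and the choice to skip the root type are all assumptions made for illustration, and the real ColumnPath class may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative stand-in only; not Spark's implementation.
final class ColumnPathSketch {

    // Hypothetical data type model: a type name plus named child fields.
    static final class Type {
        final String typeName;
        final List<String> fieldNames = new ArrayList<>();
        final List<Type> fieldTypes = new ArrayList<>();

        Type(String typeName) {
            this.typeName = typeName;
        }

        // Adds a named child field and returns this type for chaining.
        Type field(String name, Type type) {
            fieldNames.add(name);
            fieldTypes.add(type);
            return this;
        }
    }

    // Collects the path of every nested field whose type satisfies the predicate.
    static List<List<String>> findColumnPaths(Type dt, Predicate<Type> f) {
        List<List<String>> out = new ArrayList<>();
        walk(dt, new ArrayList<>(), f, out);
        return out;
    }

    private static void walk(Type dt, List<String> path, Predicate<Type> f,
                             List<List<String>> out) {
        // Assumption: the root type itself (empty path) is never reported.
        if (!path.isEmpty() && f.test(dt)) {
            out.add(new ArrayList<>(path));
        }
        for (int i = 0; i < dt.fieldNames.size(); i++) {
            path.add(dt.fieldNames.get(i));
            walk(dt.fieldTypes.get(i), path, f, out);
            path.remove(path.size() - 1);   // backtrack before visiting the next sibling
        }
    }

    static Type exampleSchema() {
        return new Type("struct")
                .field("id", new Type("long"))
                .field("name", new Type("string"))
                .field("address", new Type("struct").field("city", new Type("string")));
    }

    public static void main(String[] args) {
        // Prints the paths of all string-typed fields in the example schema.
        System.out.println(findColumnPaths(exampleSchema(), t -> t.typeName.equals("string")));
    }
}
```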
 
- 
checkSchemaColumnNameDuplication
public static void checkSchemaColumnNameDuplication(DataType schema, boolean caseSensitiveAnalysis)
Checks if an input schema has duplicate column names. Throws an exception if duplicates exist.
- Parameters:
- schema- schema to check
- caseSensitiveAnalysis- whether duplication checks should be case sensitive or not
 
- 
checkSchemaColumnNameDuplication
public static void checkSchemaColumnNameDuplication(StructType schema, scala.Function2<String, String, Object> resolver)
Checks if an input schema has duplicate column names. Throws an exception if duplicates exist.
- Parameters:
- schema- schema to check
- resolver- resolver used to determine if two identifiers are equal
 
- 
checkColumnNameDuplication
public static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, scala.Function2<String, String, Object> resolver)
Checks if input column names have duplicate identifiers. Throws an exception if duplicates exist.
- Parameters:
- columnNames- column names to check
- resolver- resolver used to determine if two identifiers are equal
 
- 
checkColumnNameDuplication
public static void checkColumnNameDuplication(scala.collection.immutable.Seq<String> columnNames, boolean caseSensitiveAnalysis)
Checks if input column names have duplicate identifiers. Throws an exception if duplicates exist.
- Parameters:
- columnNames- column names to check
- caseSensitiveAnalysis- whether duplication checks should be case sensitive or not
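
As a rough illustration of how the caseSensitiveAnalysis flag changes the outcome, the following self-contained sketch performs the same kind of check on a plain list of names. It is not Spark's implementation: the class name, the findDuplicates helper, the lower-casing strategy, and the use of IllegalArgumentException are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative stand-in only; it does not call Spark.
final class DuplicateNameSketch {

    // Returns the names that occur more than once, in first-seen order.
    // When caseSensitive is false, names are compared after lower-casing.
    static List<String> findDuplicates(List<String> columnNames, boolean caseSensitive) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : columnNames) {
            String key = caseSensitive ? name : name.toLowerCase(Locale.ROOT);
            counts.merge(key, 1, Integer::sum);
        }
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    // Mirrors the documented contract: throw if any duplicate exists.
    // The exception type here is an assumption, not the one Spark raises.
    static void checkColumnNameDuplication(List<String> columnNames, boolean caseSensitive) {
        List<String> duplicates = findDuplicates(columnNames, caseSensitive);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException("Found duplicate column(s): " + duplicates);
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("id", "Name", "name");
        checkColumnNameDuplication(names, true);    // passes: "Name" and "name" differ
        try {
            checkColumnNameDuplication(names, false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());     // reports the case-insensitive clash
        }
    }
}
```

The resolver-based overload generalizes this idea: instead of a boolean flag, the caller supplies the function that decides whether two identifiers are equal.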
 
- 
explodeNestedFieldNames
public static scala.collection.immutable.Seq<String> explodeNestedFieldNames(StructType schema)
Returns all column names in this schema as a flat list. For example, a schema like:
  | - a
  | | - 1
  | | - 2
  | - b
  | - c
  | | - nest
  |   | - 3
will get flattened to: "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3"
- Parameters:
- schema- (undocumented)
- Returns:
- (undocumented)
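
The flattening in the example above can be reproduced with a short depth-first walk. This sketch is self-contained and does not use Spark: a struct is modeled as an ordered map (a hypothetical stand-in for StructType), and only struct nesting is handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in only; not Spark's implementation.
final class ExplodeNamesSketch {

    // A struct is modeled as an ordered map from field name to either a nested
    // struct (another map) or null for a leaf field.
    static List<String> explode(Map<String, Object> struct) {
        List<String> out = new ArrayList<>();
        collect(struct, "", out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(Map<String, Object> struct, String prefix, List<String> out) {
        for (Map.Entry<String, Object> field : struct.entrySet()) {
            String path = prefix.isEmpty() ? field.getKey() : prefix + "." + field.getKey();
            out.add(path);   // a parent is listed before its children
            if (field.getValue() instanceof Map) {
                collect((Map<String, Object>) field.getValue(), path, out);
            }
        }
    }

    // Builds the schema from the documentation example: a{1,2}, b, c{nest{3}}.
    static Map<String, Object> exampleSchema() {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("1", null);
        a.put("2", null);
        Map<String, Object> nest = new LinkedHashMap<>();
        nest.put("3", null);
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("nest", nest);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("a", a);
        root.put("b", null);
        root.put("c", c);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(explode(exampleSchema()));
        // [a, a.1, a.2, b, c, c.nest, c.nest.3]
    }
}
```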
 
- 
checkTransformDuplication
public static void checkTransformDuplication(scala.collection.immutable.Seq<Transform> transforms, String checkType, boolean isCaseSensitive)
Checks if the partitioning transforms are being duplicated or not. Throws an exception if duplication exists.
- Parameters:
- transforms- the partitioning transforms to check for duplicates
- checkType- contextual information around the check, used in an exception message
- isCaseSensitive- Whether to be case sensitive when comparing column names
 
- 
findColumnPosition
public static scala.collection.immutable.Seq<Object> findColumnPosition(scala.collection.immutable.Seq<String> column, StructType schema, scala.Function2<String, String, Object> resolver)
Returns the given column's ordinal within the given schema. The returned position has one entry per level of nesting of the column.
- Parameters:
- column- The column to search for in the given struct. If the length of column is greater than 1, we expect to enter a nested field.
- schema- The current struct we are looking at.
- resolver- The resolver to find the column.
- Returns:
- (undocumented)
 
- 
getColumnName
public static scala.collection.immutable.Seq<String> getColumnName(scala.collection.immutable.Seq<Object> position, StructType schema)
Gets the name of the column in the given position.
- Parameters:
- position- (undocumented)
- schema- (undocumented)
- Returns:
- (undocumented)
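
findColumnPosition and getColumnName describe opposite directions of the same walk: names to ordinals, and ordinals back to names. The sketch below shows that round trip on a simplified model. It is self-contained and does not use Spark: the Field class, the BiPredicate resolver, and the exception type are assumptions standing in for StructType, the Scala resolver function, and Spark's own error handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative stand-in only; not Spark's implementation.
final class ColumnPositionSketch {

    // Hypothetical minimal field model: a name plus child fields (none for a leaf).
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Resolves one name part per nesting level and records the ordinal at each level.
    static List<Integer> findColumnPosition(
            List<String> column, List<Field> schema, BiPredicate<String, String> resolver) {
        List<Integer> position = new ArrayList<>();
        List<Field> current = schema;
        for (String part : column) {
            int ordinal = -1;
            for (int i = 0; i < current.size() && ordinal < 0; i++) {
                if (resolver.test(current.get(i).name, part)) {
                    ordinal = i;
                }
            }
            if (ordinal < 0) {
                throw new IllegalArgumentException("Cannot find field: " + part);
            }
            position.add(ordinal);
            current = current.get(ordinal).children;
        }
        return position;
    }

    // Inverse walk: turns a position back into the field name at each level.
    static List<String> getColumnName(List<Integer> position, List<Field> schema) {
        List<String> names = new ArrayList<>();
        List<Field> current = schema;
        for (int ordinal : position) {
            Field field = current.get(ordinal);
            names.add(field.name);
            current = field.children;
        }
        return names;
    }

    // Example schema: a, b{x, y{z}}.
    static List<Field> exampleSchema() {
        return Arrays.asList(
                new Field("a"),
                new Field("b", new Field("x"), new Field("y", new Field("z"))));
    }

    public static void main(String[] args) {
        List<Field> schema = exampleSchema();
        BiPredicate<String, String> caseInsensitive = String::equalsIgnoreCase;
        List<Integer> position =
                findColumnPosition(Arrays.asList("B", "y", "Z"), schema, caseInsensitive);
        System.out.println(position);                          // [1, 1, 0]
        System.out.println(getColumnName(position, schema));   // [b, y, z]
    }
}
```

Note that the names returned by the inverse walk are the ones stored in the schema, not the spelling the caller searched with.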
 
- 
restoreOriginalOutputNames
public static scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> restoreOriginalOutputNames(scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.expressions.NamedExpression> projectList, scala.collection.immutable.Seq<String> originalNames)
- 
escapeMetaCharacters
- Parameters:
- str- The string to be escaped.
- Returns:
- The escaped string.
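
This page does not list which characters escapeMetaCharacters treats as meta characters. The sketch below assumes, for illustration only, that newline, carriage return, and tab are replaced by their visible two-character escape sequences. The class name and the character set are assumptions, not Spark's documented behavior.

```java
// Illustrative stand-in only; the set of escaped characters is an assumption.
final class EscapeSketch {

    // Replaces selected control characters with their printable escape sequences
    // so the result stays on one line when shown in a message or log.
    static String escapeMetaCharacters(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            switch (ch) {
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The tab and the newline are printed as the visible sequences \t and \n.
        System.out.println(escapeMetaCharacters("col\tname\n"));
    }
}
```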
 
- 
hasNonUTF8BinaryCollation
Checks if a given data type has a non utf8 binary (implicit) collation type.
- Parameters:
- dt- (undocumented)
- Returns:
- (undocumented)
 
- 
hasIndeterminateCollation
Checks if a given data type has indeterminate collation.
- 
checkIndeterminateCollationInSchema
Throws an error if the given schema has indeterminate collation.
- Parameters:
- schema- (undocumented)
 
- 
checkNoCollationsInMapKeys
public static void checkNoCollationsInMapKeys(DataType schema)
- 
replaceCollatedStringWithString
Replaces any collated string type with non-collated StringType recursively in the given data type.
- Parameters:
- dt- (undocumented)
- Returns:
- (undocumented)
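
The recursive replacement described for replaceCollatedStringWithString can be pictured as rebuilding a type tree bottom-up. The following self-contained sketch does not use Spark: the Type class, its kind and collation fields, and the hasCollatedString helper are hypothetical stand-ins for DataType and the collation checks above, and the collation name in the example is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only; not Spark's implementation.
final class CollationSketch {

    // Hypothetical type model: a kind such as "string", "array" or "struct",
    // an optional collation name (null means none), and child types.
    static final class Type {
        final String kind;
        final String collation;
        final List<Type> children;

        Type(String kind, String collation, List<Type> children) {
            this.kind = kind;
            this.collation = collation;
            this.children = children;
        }
    }

    static Type string(String collation) {
        return new Type("string", collation, new ArrayList<>());
    }

    static Type container(String kind, Type... children) {
        return new Type(kind, null, Arrays.asList(children));
    }

    // True if any string type anywhere in the tree carries a collation.
    static boolean hasCollatedString(Type dt) {
        if (dt.kind.equals("string") && dt.collation != null) {
            return true;
        }
        for (Type child : dt.children) {
            if (hasCollatedString(child)) {
                return true;
            }
        }
        return false;
    }

    // Rebuilds the tree, swapping every string type for a plain, non-collated one.
    static Type replaceCollatedStringWithString(Type dt) {
        if (dt.kind.equals("string")) {
            return string(null);
        }
        List<Type> rebuilt = new ArrayList<>();
        for (Type child : dt.children) {
            rebuilt.add(replaceCollatedStringWithString(child));
        }
        return new Type(dt.kind, dt.collation, rebuilt);
    }

    public static void main(String[] args) {
        Type nested = container("struct", string(null), container("array", string("UNICODE_CI")));
        System.out.println(hasCollatedString(nested));                                  // true
        System.out.println(hasCollatedString(replaceCollatedStringWithString(nested))); // false
    }
}
```

Because the input tree is never mutated, the original type remains usable after the replacement.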
 
 
-