Class SchemaInferenceUtils
Object
org.apache.spark.sql.pipelines.util.SchemaInferenceUtils
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionstatic scala.collection.immutable.Seq<TableChange>
diffSchemas
(StructType currentSchema, StructType targetSchema) Determines the column changes needed to transform the current schema into the target schema.static StructType
inferSchemaFromFlows
(scala.collection.immutable.Seq<ResolvedFlow> flows, scala.Option<StructType> userSpecifiedSchema) Given a set of flows that write to the same destination and possibly a user-specified schema, we infer the schema of the destination dataset.
-
Constructor Details
-
SchemaInferenceUtils
public SchemaInferenceUtils()
-
-
Method Details
-
inferSchemaFromFlows
public static StructType inferSchemaFromFlows(scala.collection.immutable.Seq<ResolvedFlow> flows, scala.Option<StructType> userSpecifiedSchema) Given a set of flows that write to the same destination and possibly a user-specified schema, we infer the schema of the destination dataset. The logic is as follows: 1. If there are no incoming flows, return the user-specified schema (if provided) or an empty schema. 2. If there are incoming flows, we merge the schemas of all flows that write to the same destination. 3. If a user-specified schema is provided, we merge it with the inferred schema. The user-specified schema will take precedence over the inferred schema. Returns an error if encountered during schema inference or merging the inferred schema with the user-specified one.- Parameters:
flows
- (undocumented)userSpecifiedSchema
- (undocumented)- Returns:
- (undocumented)
-
diffSchemas
public static scala.collection.immutable.Seq<TableChange> diffSchemas(StructType currentSchema, StructType targetSchema) Determines the column changes needed to transform the current schema into the target schema.This function compares the current schema with the target schema and produces a sequence of TableChange objects representing: 1. New columns that need to be added 2. Existing columns that need type updates
- Parameters:
currentSchema
- The current schema of the tabletargetSchema
- The target schema that we want the table to have- Returns:
- A sequence of TableChange objects representing the necessary changes
-