Class SchemaInferenceUtils

Object
org.apache.spark.sql.pipelines.util.SchemaInferenceUtils

public class SchemaInferenceUtils extends Object
  • Constructor Details

    • SchemaInferenceUtils

      public SchemaInferenceUtils()
  • Method Details

    • inferSchemaFromFlows

      public static StructType inferSchemaFromFlows(scala.collection.immutable.Seq<ResolvedFlow> flows, scala.Option<StructType> userSpecifiedSchema)
      Given a set of flows that write to the same destination and possibly a user-specified schema, we infer the schema of the destination dataset. The logic is as follows: 1. If there are no incoming flows, return the user-specified schema (if provided) or an empty schema. 2. If there are incoming flows, we merge the schemas of all flows that write to the same destination. 3. If a user-specified schema is provided, we merge it with the inferred schema. The user-specified schema will take precedence over the inferred schema. Returns an error if encountered during schema inference or merging the inferred schema with the user-specified one.
      Parameters:
      flows - (undocumented)
      userSpecifiedSchema - (undocumented)
      Returns:
      (undocumented)
    • diffSchemas

      public static scala.collection.immutable.Seq<TableChange> diffSchemas(StructType currentSchema, StructType targetSchema)
      Determines the column changes needed to transform the current schema into the target schema.

      This function compares the current schema with the target schema and produces a sequence of TableChange objects representing: 1. New columns that need to be added 2. Existing columns that need type updates

      Parameters:
      currentSchema - The current schema of the table
      targetSchema - The target schema that we want the table to have
      Returns:
      A sequence of TableChange objects representing the necessary changes