pyspark.sql.DataFrame.to
DataFrame.to(schema: pyspark.sql.types.StructType) → pyspark.sql.dataframe.DataFrame
Returns a new DataFrame where each row is reconciled to match the specified schema.
New in version 3.4.0.
Parameters
schema : StructType
    Specified schema.
Returns
DataFrame
    Reconciled DataFrame.
Notes
- Reorder columns and/or inner fields by name to match the specified schema.
- Project away columns and/or inner fields that are not needed by the specified schema. Missing columns and/or inner fields (present in the specified schema but not in the input DataFrame) lead to failures.
- Cast the columns and/or inner fields to match the data types in the specified schema, if the types are compatible, e.g., numeric to numeric (error on overflow), but not string to int.
- Carry over the metadata from the specified schema, while the columns and/or inner fields still keep their own metadata if not overwritten by the specified schema.
- Fail if the nullability is not compatible. For example, the column and/or inner field is nullable but the specified schema requires it to be non-nullable.
Supports Spark Connect.
Examples
>>> from pyspark.sql.types import StructField, StringType, StructType
>>> df = spark.createDataFrame([("a", 1)], ["i", "j"])
>>> df.schema
StructType([StructField('i', StringType(), True), StructField('j', LongType(), True)])

>>> schema = StructType([StructField("j", StringType()), StructField("i", StringType())])
>>> df2 = df.to(schema)
>>> df2.schema
StructType([StructField('j', StringType(), True), StructField('i', StringType(), True)])
>>> df2.show()
+---+---+
|  j|  i|
+---+---+
|  1|  a|
+---+---+