Upgrading from PySpark 3.2 to 3.3

  • In Spark 3.3, the pyspark.pandas.sql method follows [the standard Python string formatter](https://docs.python.org/3/library/string.html#format-string-syntax). To restore the previous behavior, set the PYSPARK_PANDAS_SQL_LEGACY environment variable to 1.

  • In Spark 3.3, the drop method of the pandas API on Spark DataFrame supports dropping rows by index, and by default drops by index rather than by column.

  • In Spark 3.3, PySpark raises the minimum required pandas version from 0.23.2 to 1.0.5.

  • In Spark 3.3, the repr return values of SQL DataTypes have been changed so that passing them to eval yields an object with the same value.
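
The first item above refers to the standard Python format-string syntax. As a rough illustration of what that syntax looks like, the sketch below uses only the Python standard library and does not call pyspark.pandas.sql itself. The query text, table name, and column name are hypothetical, and how pyspark.pandas.sql binds its arguments is not shown here.

```python
import string

# Hypothetical query template; the table and column names are illustrative only.
template = "SELECT * FROM {table} WHERE {column} > {threshold:d}"

# Named placeholders are filled from keyword arguments;
# ":d" is a standard format spec that requires an integer.
query = template.format(table="sales", column="amount", threshold=100)
print(query)

# Literal braces must be doubled under the standard formatter.
escaped = "SELECT '{{literal}}' AS note, {value} AS v".format(value=7)
print(escaped)

# string.Formatter().parse lists the placeholder names a template expects.
fields = [name for _, name, _, _ in string.Formatter().parse(template) if name]
print(fields)
```

A query written for the earlier behavior may need its placeholders or literal braces adjusted to match these rules, unless PYSPARK_PANDAS_SQL_LEGACY is set to 1.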