Spark Release 3.2.3

Spark 3.2.3 is a maintenance release containing stability fixes. This release is based on the branch-3.2 maintenance branch of Spark. We strongly recommend that all 3.2 users upgrade to this stable release.

Notable changes

  • [SPARK-38697]: Extend SparkSessionExtensions to inject rules into AQE Optimizer
  • [SPARK-39200]: Stream is corrupted Exception while fetching the blocks from fallback storage system
  • [SPARK-8731]: Beeline doesn’t work with -e option when started in background
  • [SPARK-32380]: sparksql cannot access hive table while data in hbase
  • [SPARK-35542]: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
  • [SPARK-39184]: ArrayIndexOutOfBoundsException for some date/time sequences in some time-zones
  • [SPARK-39647]: Block push fails with java.lang.IllegalArgumentException: Active local dirs list has not been updated by any executor registration even when the NodeManager hasn’t been restarted
  • [SPARK-39775]: Regression due to AVRO-2035
  • [SPARK-39833]: Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true
  • [SPARK-39835]: Fix EliminateSorts remove global sort below the local sort
  • [SPARK-39839]: Handle special case of null variable-length Decimal with non-zero offsetAndSize in UnsafeRow structural integrity check
  • [SPARK-39847]: Race condition related to interruption of task threads while they are in RocksDBLoader.loadLibrary()
  • [SPARK-39867]: Global limit should not inherit OrderPreservingUnaryNode
  • [SPARK-39887]: Expression transform error
  • [SPARK-39900]: Issue with querying dataframe produced by ‘binaryFile’ format using ‘not’ operator
  • [SPARK-39932]: WindowExec should clear the final partition buffer
  • [SPARK-39952]: SaveIntoDataSourceCommand should recache result relation
  • [SPARK-39962]: Global aggregation against pandas aggregate UDF does not take the column order into account
  • [SPARK-39965]: Skip PVC cleanup when driver doesn’t own PVCs
  • [SPARK-39972]: Revert the test case of SPARK-39962 in branch-3.2 and branch-3.1
  • [SPARK-40002]: Limit improperly pushed down through window using ntile function
  • [SPARK-40065]: Executor ConfigMap is not mounted if profile is not default
  • [SPARK-40079]: Add Imputer inputCols validation for empty input case
  • [SPARK-40089]: Sorting of at least Decimal(20, 2) fails for some values near the max.
  • [SPARK-40117]: Convert condition to java in DataFrameWriterV2.overwrite
  • [SPARK-40121]: Initialize projection used for Python UDF
  • [SPARK-40124]: Update TPCDS v1.4 q32 for Plan Stability tests
  • [SPARK-40149]: Star expansion after outer join asymmetrically includes joining key
  • [SPARK-40169]: Fix the issue with Parquet column index and predicate pushdown in Data source V1
  • [SPARK-40212]: SparkSQL castPartValue does not properly handle byte & short
  • [SPARK-40218]: GROUPING SETS should preserve the grouping columns
  • [SPARK-40270]: Make compute.max_rows as None working in
  • [SPARK-40280]: Failure to create parquet predicate push down for ints and longs on some valid files
  • [SPARK-40315]: Non-deterministic hashCode() calculations for ArrayBasedMapData on equal objects
  • [SPARK-40407]: Repartition of DataFrame can result in severe data skew in some special case
  • [SPARK-40459]: recoverDiskStore should not stop by existing recomputed files
  • [SPARK-40470]: arrays_zip output unexpected alias column names when using GetMapValue and GetArrayStructFields
  • [SPARK-40493]: Revert “[SPARK-33861][SQL] Simplify conditional in predicate”
  • [SPARK-40562]: Add spark.sql.legacy.groupingIdWithAppendedUserGroupBy
  • [SPARK-40583]: Documentation error in “Integration with Cloud Infrastructures”
  • [SPARK-40588]: Sorting issue with partitioned-writing and AQE turned on
  • [SPARK-40612]: On Kubernetes for long running app Spark using an invalid principal to renew the delegation token
  • [SPARK-40636]: Fix wrong remained shuffles log in BlockManagerDecommissioner
  • [SPARK-40660]: Switch to XORShiftRandom to distribute elements
  • [SPARK-40829]: STORED AS serde in CREATE TABLE LIKE view does not work
  • [SPARK-40851]: TimestampFormatter behavior changed when using the latest Java 8/11/17
  • [SPARK-40869]: KubernetesConf.getResourceNamePrefix creates invalid name prefixes
  • [SPARK-40874]: Fix broadcasts in Python UDFs when encryption is enabled
  • [SPARK-40902]: Quick submission of drivers in tests to mesos scheduler results in dropping drivers
  • [SPARK-40963]: ExtractGenerator sets incorrect nullability in new Project
  • [SPARK-41035]: Incorrect results or NPE when a literal is reused across distinct aggregations
  • [SPARK-41091]: Fix Docker release tool for branch-3.2
  • [SPARK-41188]: Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes
  • [SPARK-38034]: Optimize time complexity and extend applicable cases for TransposeWindow
  • [SPARK-39831]: R dependencies installation start to fail after devtools_2.4.4 was released
  • [SPARK-39879]: Reduce local-cluster memory configuration in BroadcastJoinSuite* and HiveSparkSubmitSuite
  • [SPARK-40022]: YarnClusterSuite should not ABORTED when there is no Python3 environment
  • [SPARK-40241]: Correct the link of GenericUDTF
  • [SPARK-40490]: YarnShuffleIntegrationSuite no longer verifies registeredExecFile reload after SPARK-17321
  • [SPARK-40574]: Add PURGE to DROP TABLE doc
  • [SPARK-40172]: Temporarily disable flaky test cases in ImageFileFormatSuite
  • [SPARK-40461]: Set upperbound for pyzmq 24.0.0 for Python linter
  • [SPARK-40213]: Incorrect ASCII value for Latin-1 Supplement characters
  • [SPARK-40292]: arrays_zip output unexpected alias column names
  • [SPARK-40043]: Document DataStreamWriter.toTable and DataStreamReader.table
  • [SPARK-40983]: Remove Hadoop requirements for zstd mention in Parquet compression codec

Dependency Changes

Although this is a maintenance release, some dependencies were still upgraded in this release.

You can consult JIRA for the detailed changes.

We would like to acknowledge all community members for contributing patches to this release.
