Spark Release 3.2.3

Spark 3.2.3 is a maintenance release containing stability fixes. This release is based on the branch-3.2 maintenance branch of Spark. We strongly recommend all 3.2 users to upgrade to this stable release.

Notable changes

[SPARK-38697]: Extend SparkSessionExtensions to inject rules into AQE Optimizer
[SPARK-39200]: Stream is corrupted Exception while fetching the blocks from fallback storage system
[SPARK-8731]: Beeline doesn’t work with -e option when started in background
[SPARK-32380]: sparksql cannot access hive table while data in hbase
[SPARK-35542]: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
[SPARK-39184]: ArrayIndexOutOfBoundsException for some date/time sequences in some time-zones
[SPARK-39647]: Block push fails with java.lang.IllegalArgumentException: Active local dirs list has not been updated by any executor registration even when the NodeManager hasn’t been restarted
[SPARK-39775]: Regression due to AVRO-2035
[SPARK-39833]: Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true
[SPARK-39835]: Fix EliminateSorts remove global sort below the local sort
[SPARK-39839]: Handle special case of null variable-length Decimal with non-zero offsetAndSize in UnsafeRow structural integrity check
[SPARK-39847]: Race condition related to interruption of task threads while they are in RocksDBLoader.loadLibrary()
[SPARK-39867]: Global limit should not inherit OrderPreservingUnaryNode
[SPARK-39887]: Expression transform error
[SPARK-39900]: Issue with querying dataframe produced by ‘binaryFile’ format using ‘not’ operator
[SPARK-39932]: WindowExec should clear the final partition buffer
[SPARK-39952]: SaveIntoDataSourceCommand should recache result relation
[SPARK-39962]: Global aggregation against pandas aggregate UDF does not take the column order into account
[SPARK-39965]: Skip PVC cleanup when driver doesn’t own PVCs
[SPARK-39972]: Revert the test case of SPARK-39962 in branch-3.2 and branch-3.1
[SPARK-40002]: Limit improperly pushed down through window using ntile function
[SPARK-40065]: Executor ConfigMap is not mounted if profile is not default
[SPARK-40079]: Add Imputer inputCols validation for empty input case
[SPARK-40089]: Sorting of at least Decimal(20, 2) fails for some values near the max.
[SPARK-40117]: Convert condition to java in DataFrameWriterV2.overwrite
[SPARK-40121]: Initialize projection used for Python UDF
[SPARK-40124]: Update TPCDS v1.4 q32 for Plan Stability tests
[SPARK-40149]: Star expansion after outer join asymmetrically includes joining key
[SPARK-40169]: Fix the issue with Parquet column index and predicate pushdown in Data source V1
[SPARK-40212]: SparkSQL castPartValue does not properly handle byte & short
[SPARK-40218]: GROUPING SETS should preserve the grouping columns
[SPARK-40270]: Make compute.max_rows as None working in DataFrame.style
[SPARK-40280]: Failure to create parquet predicate push down for ints and longs on some valid files
[SPARK-40315]: Non-deterministic hashCode() calculations for ArrayBasedMapData on equal objects
[SPARK-40407]: Repartition of DataFrame can result in severe data skew in some special case
[SPARK-40459]: recoverDiskStore should not stop by existing recomputed files
[SPARK-40470]: arrays_zip output unexpected alias column names when using GetMapValue and GetArrayStructFields
[SPARK-40493]: Revert “[SPARK-33861][SQL] Simplify conditional in predicate”
[SPARK-40562]: Add spark.sql.legacy.groupingIdWithAppendedUserGroupBy
[SPARK-40583]: Documentation error in “Integration with Cloud Infrastructures”
[SPARK-40588]: Sorting issue with partitioned-writing and AQE turned on
[SPARK-40612]: On Kubernetes for long running app Spark using an invalid principal to renew the delegation token
[SPARK-40636]: Fix wrong remained shuffles log in BlockManagerDecommissioner
[SPARK-40660]: Switch to XORShiftRandom to distribute elements
[SPARK-40829]: STORED AS serde in CREATE TABLE LIKE view does not work
[SPARK-40851]: TimestampFormatter behavior changed when using the latest Java 8/11/17
[SPARK-40869]: KubernetesConf.getResourceNamePrefix creates invalid name prefixes
[SPARK-40874]: Fix broadcasts in Python UDFs when encryption is enabled
[SPARK-40902]: Quick submission of drivers in tests to mesos scheduler results in dropping drivers
[SPARK-40963]: ExtractGenerator sets incorrect nullability in new Project
[SPARK-41035]: Incorrect results or NPE when a literal is reused across distinct aggregations
[SPARK-41091]: Fix Docker release tool for branch-3.2
[SPARK-41188]: Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes
[SPARK-38034]: Optimize time complexity and extend applicable cases for TransposeWindow
[SPARK-39831]: R dependencies installation start to fail after devtools_2.4.4 was released
[SPARK-39879]: Reduce local-cluster memory configuration in BroadcastJoinSuite* and HiveSparkSubmitSuite
[SPARK-40022]: YarnClusterSuite should not ABORTED when there is no Python3 environment
[SPARK-40241]: Correct the link of GenericUDTF
[SPARK-40490]: YarnShuffleIntegrationSuite no longer verifies registeredExecFile reload after SPARK-17321
[SPARK-40574]: Add PURGE to DROP TABLE doc
[SPARK-40172]: Temporarily disable flaky test cases in ImageFileFormatSuite
[SPARK-40461]: Set upperbound for pyzmq 24.0.0 for Python linter
[SPARK-40213]: Incorrect ASCII value for Latin-1 Supplement characters
[SPARK-40292]: arrays_zip output unexpected alias column names
[SPARK-40043]: Document DataStreamWriter.toTable and DataStreamReader.table
[SPARK-40983]: Remove Hadoop requirements for zstd mention in Parquet compression codec

Dependency Changes

While being a maintence release we did still upgrade some dependencies in this release they are:

[SPARK-40801]: Upgrade Apache Commons Text to 1.10

You can consult JIRA for the detailed changes.

We would like to acknowledge all community members for contributing patches to this release.

Spark News Archive

Latest News

Spark 3.5.5 released (Feb 27, 2025)
Spark 3.5.4 released (Dec 20, 2024)
Spark 3.4.4 released (Oct 27, 2024)
Preview release of Spark 4.0 (Sep 26, 2024)