Spark Release 4.0.1

Apache Spark 4.0.1 is a maintenance release containing important correctness and stability fixes. This release is based on the branch-4.0 maintenance branch of Spark. We strongly recommend all 4.0 users upgrade to this stable release.

Notable changes

  • [SPARK-49872] Allow unlimited JSON size again
  • [SPARK-50137] Avoid fallback to Hive-incompatible ways when table creation fails with a Thrift exception
  • [SPARK-50748] Fix a race condition issue which happens when operations are interrupted
  • [SPARK-50889] Fix a race condition issue which happens when operations are interrupted
  • [SPARK-51430] Stop PySpark context logger from propagating to stdout
  • [SPARK-51920] Fix composite/nested StructType in value state for Python
  • [SPARK-52023] Fix data corruption/segfault when returning Option[Product] from a UDAF
  • [SPARK-52146] Detect cyclic function usage in SQL UDFs
  • [SPARK-52147] Block temporary object references in persistent SQL UDFs
  • [SPARK-52148] Fix CREATE OR REPLACE for SQL TVFs
  • [SPARK-52153] Fix from_json and to_json with variant
  • [SPARK-52237] Fix the documentation of hypot function
  • [SPARK-52240] Corrected row index usage when exploding packed arrays in vectorized reader
  • [SPARK-52259] Fix Param class binary compatibility
  • [SPARK-52265] Fix regex leading to empty PROCESS_TABLES.testingVersions in HiveExternalCatalogVersionsSuite
  • [SPARK-52267] Match field id in ParquetToSparkSchemaConverter
  • [SPARK-52287] Improve SparkContext not to populate o.a.s.internal.io.cloud.*-related settings if they do not exist
  • [SPARK-52300] Make SQL UDTVF resolution use consistent configurations with view resolution
  • [SPARK-52313] Correctly resolve reference data type for Views with default collation
  • [SPARK-52316] Upgrade Kafka to 3.9.1
  • [SPARK-52324] Move Spark docs to the release directory
  • [SPARK-52329] Remove private sql scoping tags for new transformWithState API
  • [SPARK-52339] Fix comparison of InMemoryFileIndex instances
  • [SPARK-52345] Fix NULL behavior in scripting conditions
  • [SPARK-52350] Fix link for SS programming guide page
  • [SPARK-52384] Make Spark Connect case-insensitive for JDBC options
  • [SPARK-52386] Refactor the HistoryServerSuite to support regenerating the expectation.json files using SPARK_GENERATE_GOLDEN_FILES=1
  • [SPARK-52396] Artifact Root Directory should use tmpdir
  • [SPARK-52397] Idempotent ExecutePlan: second ExecutePlan with same operationId and plan should reattach
  • [SPARK-52398] Change ALTER TABLE ALTER COLUMN TYPE STRING not to apply default collation if original data type was instance of StringType
  • [SPARK-52413] Switch test_install_spark to Spark 3.5.6
  • [SPARK-52420] Make test_udtf_with_invalid_return_type compatible with Python only client
  • [SPARK-52421] Automatically send the RC vote email
  • [SPARK-52450] Improve performance of schema deepcopy
  • [SPARK-52453] Automatically release and drop artifacts in Apache Nexus repository
  • [SPARK-52454] Automatically remove old releases from the mirror
  • [SPARK-52489] Forbid duplicate SQLEXCEPTION and NOT FOUND handlers inside SQL Script
  • [SPARK-52497] Add documentation for SQL UDFs
  • [SPARK-52499] Add more SQL query tests for different data types
  • [SPARK-52516] Don’t hold previous iterator reference after advancing to next file in ParquetPartitionReaderFactory
  • [SPARK-52521] Right#replacement should not access SQLConf dynamically
  • [SPARK-52529] Fully upgrade jekyll from 4.3 to 4.4
  • [SPARK-52531] Fix OuterReference in subquery aggregate being incorrectly tied to outer query aggregate
  • [SPARK-52542] Use /nonexistent instead of the nonexistent /opt/spark
  • [SPARK-52553] Fix NumberFormatException when reading v1 changelog
  • [SPARK-52562] Automatically create the base of release notes and push
  • [SPARK-52568] Fix exec-maven-plugin version used by dev/test-dependencies.sh
  • [SPARK-52584] Make the build script support preview releases in the finalize step
  • [SPARK-52590] Add SQL query tests on optional return types
  • [SPARK-52611] Fix SQLConf version for excludeSubqueryRefsFromRemoveRedundantAliases configuration
  • [SPARK-52612] Add an env NO_PROVIDED_SPARK_JARS to control collection behavior of sbt/package for spark-avro.jar and spark-protobuf.jar
  • [SPARK-52613] Restore printing full stacktrace when HBase/Hive DelegationTokenProvider hit exception
  • [SPARK-52684] Make CACHE TABLE commands atomic when encountering execution errors
  • [SPARK-52691] Upgrade ORC to 2.1.3
  • [SPARK-52707] Remove preview postfix when looking up the JIRA versions
  • [SPARK-52721] Fix wrong message parameter for CANNOT_PARSE_DATATYPE
  • [SPARK-52735] Fix missing error conditions for SQL UDFs
  • [SPARK-52737] Pushdown predicate and number of apps to FsHistoryProvider when listing applications
  • [SPARK-52741] Fix RemoveFiles ShuffleCleanup mode not working with non-adaptive execution
  • [SPARK-52749] Replace preview1 with dev1 in the PyPI package name
  • [SPARK-52753] Make parseDataType binary compatible with previous versions
  • [SPARK-52776] Do not split the comm field in ProcfsMetricsGetter
  • [SPARK-52786] Upload the pyspark-client package with preview naming
  • [SPARK-52788] Fix error of converting binary value in BinaryType to XML
  • [SPARK-52791] Fix error when inferring a UDT with a null first element
  • [SPARK-52799] Fix ThriftServerQueryTestSuite result comparison
  • [SPARK-52809] Don’t hold reader and iterator references for all partitions in task completion listeners for metric update
  • [SPARK-52828] Make hashing for collated strings collation agnostic
  • [SPARK-52832] Fix JDBC dialect identifier quoting
  • [SPARK-52833] Fix VariantBuilder.appendFloat
  • [SPARK-52870] Properly quote variable names in FOR statement
  • [SPARK-52873] Further restrict when SHJ semi/anti join can ignore duplicate keys on the build side
  • [SPARK-52899] Fix QueryExecutionErrorsSuite test to register H2Dialect back
  • [SPARK-52908] Prevent for iterator variable name clashing with names of labels in the path to the root of AST
  • [SPARK-52942] YARN External Shuffle Service jar should include scala-library
  • [SPARK-52976] Fix Python UDF not accepting collated strings as input param
  • [SPARK-52989] Add explicit close API to RocksDB State store iterator and fix current usage
  • [SPARK-53020] JPMS args should also apply to non-SparkSubmit process
  • [SPARK-53054] Fix the connect.DataFrameReader default format behavior
  • [SPARK-53074] Avoid partial clustering in SPJ to meet a child’s required distribution
  • [SPARK-53094] Fix CUBE with aggregate containing HAVING clauses
  • [SPARK-53120] Recover _source directory for PySpark documentation
  • [SPARK-53130] Fix toJson behavior of collated string types
  • [SPARK-53155] Global lower aggregation should not be replaced with a project
  • [SPARK-53167] Spark launcher isRemote also respects properties files
  • [SPARK-53176] Spark launcher should respect --load-spark-defaults
  • [SPARK-53275] Handle stateful expressions when ordering in interpreted mode
  • [SPARK-53291] Fix nullability for value column
  • [SPARK-53326] Upgrade ORC Format to 1.1.1
  • [SPARK-53342] Fix Arrow converter to handle multiple record batches in single IPC stream
  • [SPARK-53348] Always persist ANSI value when creating a view or assume it when querying if not stored
  • [SPARK-53360] Once strategy with ConstantFolding’s idempotence should not be broken
  • [SPARK-53394] UninterruptibleLock.isInterruptible should avoid duplicated interrupt
  • [SPARK-53435] Fix race condition in CachedRDDBuilder

You can consult JIRA for the detailed changes.

We would like to acknowledge all community members for contributing patches to this release.
