Spark Release 4.0.1
Apache Spark 4.0.1 is a maintenance release containing important correctness and stability fixes. This release is based on the branch-4.0 maintenance branch of Spark. We strongly recommend that all 4.0 users upgrade to this stable release.
Notable changes
- [SPARK-49872] Allow unlimited json size again
- [SPARK-50137] Avoid fallback to Hive-incompatible ways when table creation fails by thrift exception
- [SPARK-50748] Fix a race condition issue which happens when operations are interrupted
- [SPARK-50889] Fix a race condition issue which happens when operations are interrupted
- [SPARK-51430] Stop PySpark context logger from propagating to stdout
- [SPARK-51920] Fix composite/nested structtype in value state for python
- [SPARK-52023] Fix data corruption/segfault returning Option[Product] from udaf
- [SPARK-52146] Detect cyclic function usage in SQL UDFs
- [SPARK-52147] Block temporary object references in persistent SQL UDFs
- [SPARK-52148] Fix CREATE OR REPLACE for SQL TVFs
- [SPARK-52153] Fix from_json and to_json with variant
- [SPARK-52237] Fix the documentation of hypot function
- [SPARK-52240] Corrected row index usage when exploding packed arrays in vectorized reader
- [SPARK-52259] Fix Param class binary compatibility
- [SPARK-52265] Fix regex leading to empty PROCESS_TABLES.testingVersions in HiveExternalCatalogVersionsSuite
- [SPARK-52267] Match field id in ParquetToSparkSchemaConverter
- [SPARK-52287] Improve SparkContext not to populate o.a.s.internal.io.cloud.*-related setting if not exist
- [SPARK-52300] Make SQL UDTVF resolution use consistent configurations with view resolution
- [SPARK-52313] Correctly resolve reference data type for Views with default collation
- [SPARK-52316] Upgrade Kafka to 3.9.1
- [SPARK-52324] Move Spark docs to the release directory
- [SPARK-52329] Remove private sql scoping tags for new transformWithState API
- [SPARK-52339] Fix comparison of InMemoryFileIndex instances
- [SPARK-52345] Fix NULL behavior in scripting conditions
- [SPARK-52350] Fix link for SS programming guide page
- [SPARK-52384] Fix bug: Connect should be case-insensitive for JDBC options
- [SPARK-52386] Refactor the HistoryServerSuite to support regenerating the expectation.json files using SPARK_GENERATE_GOLDEN_FILES=1
- [SPARK-52396] Artifact Root Directory should use tmpdir
- [SPARK-52397] Idempotent ExecutePlan: second ExecutePlan with same operationId and plan should reattach
- [SPARK-52398] Change ALTER TABLE ALTER COLUMN TYPE STRING not to apply default collation if original data type was instance of StringType
- [SPARK-52413] test_install_spark switch to Spark 3.5.6
- [SPARK-52420] Make test_udtf_with_invalid_return_type compatible with Python only client
- [SPARK-52421] Automatically send the RC vote email
- [SPARK-52450] Improve performance of schema deepcopy
- [SPARK-52453] Automatically release and drop artifacts in Apache Nexus repository
- [SPARK-52454] Automatically remove old releases from the mirror
- [SPARK-52489] Forbid duplicate SQLEXCEPTION and NOT FOUND handlers inside SQL Script
- [SPARK-52497] Add documentation for SQL UDFs
- [SPARK-52499] Add more SQL query tests for different data types
- [SPARK-52516] Don’t hold previous iterator reference after advancing to next file in ParquetPartitionReaderFactory
- [SPARK-52521] Right#replacement should not access SQLConf dynamically
- [SPARK-52529] Fully upgrade jekyll from 4.3 to 4.4
- [SPARK-52531] OuterReference in subquery aggregate is incorrectly tied to outer query aggregate
- [SPARK-52542] Use /nonexistent instead of nonexistent /opt/spark
- [SPARK-52553] Fix NumberFormatException when reading v1 changelog
- [SPARK-52562] Automatically create the base of release notes and push
- [SPARK-52568] Fix exec-maven-plugin version used by dev/test-dependencies.sh
- [SPARK-52584] Make build script to support preview releases in finalize step
- [SPARK-52590] Add SQL query tests on optional return types
- [SPARK-52611] Fix SQLConf version for excludeSubqueryRefsFromRemoveRedundantAliases configuration
- [SPARK-52612] Add an env NO_PROVIDED_SPARK_JARS to control collection behavior of sbt/package for spark-avro.jar and spark-protobuf.jar
- [SPARK-52613] Restore printing full stacktrace when HBase/Hive DelegationTokenProvider hit exception
- [SPARK-52684] Make CACHE TABLE Commands atomic while encountering execution errors
- [SPARK-52691] Upgrade ORC to 2.1.3
- [SPARK-52707] Remove preview postfix when looking up the JIRA versions
- [SPARK-52721] Wrong message parameter for CANNOT_PARSE_DATATYPE
- [SPARK-52735] Fix missing error conditions for SQL UDFs
- [SPARK-52737] Pushdown predicate and number of apps to FsHistoryProvider when listing applications
- [SPARK-52741] RemoveFiles ShuffleCleanup mode doesn’t work with non-adaptive execution
- [SPARK-52749] Replace preview1 to dev1 in its PyPI package name
- [SPARK-52753] Make parseDataType binary compatible with previous versions
- [SPARK-52776] Do not split the comm field in ProcfsMetricsGetter
- [SPARK-52786] Make pyspark-client package to upload with preview naming
- [SPARK-52788] Fix error of converting binary value in BinaryType to XML
- [SPARK-52791] Fix error when inferring a UDT with a null first element
- [SPARK-52799] Fix ThriftServerQueryTestSuite result comparison
- [SPARK-52809] Don’t hold reader and iterator references for all partitions in task completion listeners for metric update
- [SPARK-52828] Make hashing for collated strings collation agnostic
- [SPARK-52832] Fix JDBC dialect identifier quoting
- [SPARK-52833] Fix VariantBuilder.appendFloat
- [SPARK-52870] Properly quote variable names in FOR statement
- [SPARK-52873] Further restrict when SHJ semi/anti join can ignore duplicate keys on the build side
- [SPARK-52899] Fix QueryExecutionErrorsSuite test to register H2Dialect back
- [SPARK-52908] Prevent for iterator variable name clashing with names of labels in the path to the root of AST
- [SPARK-52942] YARN External Shuffle Service jar should include scala-library
- [SPARK-52976] Fix Python UDF not accepting collated strings as input param
- [SPARK-52989] Add explicit close API to RocksDB State store iterator and fix current usage
- [SPARK-53020] JPMS args should also apply to non-SparkSubmit process
- [SPARK-53054] Fix the connect.DataFrameReader default format behavior
- [SPARK-53074] Avoid partial clustering in SPJ to meet a child’s required distribution
- [SPARK-53094] Fix CUBE with aggregate containing HAVING clauses
- [SPARK-53120] Recover _source directory for PySpark documentation
- [SPARK-53130] Fix toJson behavior of collated string types
- [SPARK-53155] Global lower aggregation should not be replaced with a project
- [SPARK-53167] Spark launcher isRemote also respects properties files
- [SPARK-53176] Spark launcher should respect --load-spark-defaults
- [SPARK-53275] Handle stateful expressions when ordering in interpreted mode
- [SPARK-53291] Fix nullability for value column
- [SPARK-53326] Upgrade ORC Format to 1.1.1
- [SPARK-53342] Fix Arrow converter to handle multiple record batches in single IPC stream
- [SPARK-53348] Always persist ANSI value when creating a view or assume it when querying if not stored
- [SPARK-53360] Once strategy with ConstantFolding’s idempotence should not be broken
- [SPARK-53394] UninterruptibleLock.isInterruptible should avoid duplicated interrupt
- [SPARK-53435] Fix race condition in CachedRDDBuilder
You can consult JIRA for the detailed changes.
We would like to acknowledge all community members for contributing patches to this release.