Spark Release 3.4.3
Spark 3.4.3 is a maintenance release containing security and correctness fixes. This release is based on the branch-3.4 maintenance branch of Spark. We strongly recommend that all 3.4 users upgrade to this stable release.
Notable changes
- [SPARK-45580]: Handle case where a nested subquery becomes an existence join
- [SPARK-46029]: Escape the single quote, _ and % for DS V2 pushdown
- [SPARK-46092]: Don’t push down Parquet row group filters that overflow
- [SPARK-46182]: Track lastTaskFinishTime using the exact task finished event
- [SPARK-46189]: Perform comparisons and arithmetic between same types in various Pandas aggregate functions to avoid interpreted mode errors
- [SPARK-46239]: Hide Jetty info
- [SPARK-46275]: Protobuf: Return null in permissive mode when deserialization fails
- [SPARK-46286]: Document spark.io.compression.zstd.bufferPool.enabled
- [SPARK-46330]: Loading of Spark UI blocks for a long time when HybridStore enabled
- [SPARK-46339]: Directory with batch number name should not be treated as metadata log
- [SPARK-46369]: Remove kill link from RELAUNCHING drivers in MasterPage
- [SPARK-46400]: When there are corrupted files in the local maven repo, skip this cache and try again
- [SPARK-46417]: Do not fail when calling hive.getTable and throwException is false
- [SPARK-46466]: Vectorized parquet reader should never do rebase for timestamp ntz
- [SPARK-46598]: OrcColumnarBatchReader should respect the memory mode when creating column vectors for the missing column
- [SPARK-46628]: Use SPDX short identifier in license name
- [SPARK-46700]: Count the last spilling for the shuffle disk spilling bytes metric
- [SPARK-46704]: Fix MasterPage to sort Running Drivers table by Duration column correctly
- [SPARK-46747]: Avoid scan in getTableExistsQuery for JDBC Dialects
- [SPARK-46763]: Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes
- [SPARK-46779]: InMemoryRelation instances of the same cached plan should be semantically equivalent
- [SPARK-46786]: Fix MountVolumesFeatureStep to use ReadWriteOncePod instead of ReadWriteOnce
- [SPARK-46794]: Remove subqueries from LogicalRDD constraints
- [SPARK-46801]: Do not treat exit code 5 as a test failure in Python testing script
- [SPARK-46817]: Fix spark-daemon.sh usage by adding decommission command
- [SPARK-46861]: Avoid Deadlock in DAGScheduler
- [SPARK-46862]: Disable CSV column pruning in the multi-line mode
- [SPARK-46888]: Fix Master to reject /workers/kill/ requests if decommission is disabled
- [SPARK-46893]: Remove inline scripts from UI descriptions
- [SPARK-46945]: Add spark.kubernetes.legacy.useReadWriteOnceAccessMode for old K8s clusters
- [SPARK-47063]: CAST long to timestamp has different behavior for codegen vs interpreted
- [SPARK-47072]: Fix supported interval formats in error messages
- [SPARK-47085]: Reduce the complexity of toTRowSet from n^2 to n
- [SPARK-47125]: Return null if Univocity never triggers parsing
- [SPARK-47146]: Possible thread leak when doing sort merge join
- [SPARK-47177]: Cached SQL plan do not display final AQE plan in explain string
- [SPARK-47187]: Fix hive compress output config does not work
- [SPARK-47236]: Fix deleteRecursivelyUsingJavaIO to skip non-existing file input
- [SPARK-47305]: Fix PruneFilters to tag the isStreaming flag of LocalRelation correctly when the plan has both batch and streaming
- [SPARK-47318]: Adds HKDF round to AuthEngine key derivation to follow standard KEX practices
- [SPARK-47368]: Remove inferTimestampNTZ config check in ParquetRowConverter
- [SPARK-47370]: Add migration doc for TimestampNTZ type inference on Parquet files
- [SPARK-47385]: Fix tuple encoders with Option inputs
- [SPARK-47434]: Fix statistics link in StreamingQueryPage
- [SPARK-47494]: Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3
- [SPARK-47503]: Make makeDotNode escape graph node name always
- [SPARK-47521]: Use Utils.tryWithResource during reading shuffle data from external storage
- [SPARK-47537]: Fix error data type mapping on MySQL Connector/J
- [SPARK-47646]: Make try_to_number return NULL for malformed input
- [SPARK-47666]: Fix NPE when reading mysql bit array as LongType
- [SPARK-47824]: Fix nondeterminism in pyspark.pandas.series.asof
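To illustrate why a fix such as SPARK-46029 matters, the following is a minimal sketch of escaping a literal before it is embedded in a pushed-down LIKE predicate. It is not Spark's implementation: the function names `escape_like_literal` and `starts_with_predicate` are hypothetical, and the choice of backslash as the escape character and of quote doubling for single quotes are assumptions made for this example only.

```python
def escape_like_literal(value: str, escape_char: str = "\\") -> str:
    """Escape a literal so it matches itself inside a SQL LIKE pattern.

    Hypothetical helper for illustration; not part of Spark.
    """
    out = []
    for ch in value:
        if ch in ("_", "%") or ch == escape_char:
            # Neutralize LIKE wildcards and the escape character itself.
            out.append(escape_char + ch)
        elif ch == "'":
            # Assumption: the target database accepts standard quote doubling.
            out.append("''")
        else:
            out.append(ch)
    return "".join(out)


def starts_with_predicate(column: str, prefix: str) -> str:
    """Build a 'column starts with prefix' predicate as a LIKE expression."""
    return f"{column} LIKE '{escape_like_literal(prefix)}%' ESCAPE '\\'"


if __name__ == "__main__":
    print(starts_with_predicate("name", "50%_off"))
    print(starts_with_predicate("name", "it's"))
```

Without the escaping step, a prefix containing `_` or `%` would be treated as a wildcard by the remote database, and an unescaped single quote would terminate the string literal early, so the pushed-down filter could return different rows than the same filter evaluated inside Spark.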
Dependency Changes
Although this is a maintenance release, we did still upgrade some dependencies in this release.
You can consult JIRA for the detailed changes.
We would like to acknowledge all community members for contributing patches to this release.