Spark Release 3.4.1

Spark 3.4.1 is a maintenance release containing stability fixes. This release is based on the branch-3.4 maintenance branch of Spark. We strongly recommend that all 3.4 users upgrade to this stable release.

Notable changes

  • [SPARK-44383]: Fix trim logic that didn’t handle ASCII control characters correctly
  • [SPARK-37829]: DataFrame.joinWith outer join should return a null value for unmatched rows
  • [SPARK-42078]: Add CapturedException to utils
  • [SPARK-42290]: Fix OOM errors not being reported when AQE is on
  • [SPARK-42421]: Use the utils to get the dynamic allocation switch used in local checkpointing
  • [SPARK-42475]: Fix PySpark connect Quickstart binder link
  • [SPARK-42826]: Update migration notes for pandas API on Spark
  • [SPARK-43043]: Improve the performance of MapOutputTracker.updateMapOutput
  • [SPARK-43050]: Fix constructing aggregate expressions by replacing grouping functions
  • [SPARK-43067]: Correct the location of error class resource file in Kafka connector
  • [SPARK-43069]: Use sbt-eclipse instead of sbteclipse-plugin
  • [SPARK-43071]: Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation
  • [SPARK-43072]: Include TIMESTAMP_NTZ type in ANSI Compliance doc
  • [SPARK-43075]: Change gRPC to grpcio when it is not installed
  • [SPARK-43083]: Mark *StateStoreSuite as ExtendedSQLTest
  • [SPARK-43085]: Support column DEFAULT assignment for multi-part table names
  • [SPARK-43098]: Fix COUNT correctness bug when a scalar subquery has a GROUP BY clause
  • [SPARK-43113]: Evaluate stream-side variables when generating code for a bound condition
  • [SPARK-43125]: Fix Connect server failing to handle exceptions with a null message
  • [SPARK-43126]: Mark two Hive UDF expressions as stateful
  • [SPARK-43139]: Fix incorrect column names in sql-ref-syntax-dml-insert-table.md
  • [SPARK-43141]: Ignore generated Java files in checkstyle
  • [SPARK-43156]: Fix bug where COUNT(*) evaluates to null in correlated scalar subqueries
  • [SPARK-43157]: Clone InMemoryRelation cached plan to prevent cloned plan from referencing same objects
  • [SPARK-43158]: Set upperbound of pandas version for Binder integration
  • [SPARK-43249]: Fix missing stats for SQL Command
  • [SPARK-43281]: Fix concurrent writer not updating file metrics
  • [SPARK-43284]: Switch back to url-encoded strings
  • [SPARK-43293]: __qualified_access_only should be ignored in normal columns
  • [SPARK-43313]: Add missing column DEFAULT values for MERGE INSERT actions
  • [SPARK-43336]: Casting between Timestamp and TimestampNTZ requires timezone (see the sketch after this list)
  • [SPARK-43337]: Fix asc/desc sort arrow icons not being displayed in table column headers
  • [SPARK-43340]: Handle missing stack-trace field in eventlogs
  • [SPARK-43342]: Revert SPARK-39006 “Show a directional error message for executor PVC dynamic allocation failure”
  • [SPARK-43374]: Move protobuf-java to BSD 3-clause group and update the license copy
  • [SPARK-43378]: Properly close stream objects in deserializeFromChunkedBuffer
  • [SPARK-43395]: Exclude macOS tar extended metadata in make-distribution.sh
  • [SPARK-43398]: Executor timeout should be the max of the idle shuffle and RDD timeouts (see the sketch after this list)
  • [SPARK-43404]: Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error
  • [SPARK-43414]: Fix flakiness in Kafka RDD suites due to port binding configuration issue
  • [SPARK-43425]: Add TimestampNTZType to ColumnarBatchRow
  • [SPARK-43441]: makeDotNode should not fail when DeterministicLevel is absent
  • [SPARK-43450]: Add more _metadata filter test cases
  • [SPARK-43471]: Handle missing hadoopProperties and metricsProperties
  • [SPARK-43483]: Add SQL reference for the OFFSET clause
  • [SPARK-43510]: Fix YarnAllocator internal state when adding running executor after processing completed containers
  • [SPARK-43517]: Add a migration guide for namedtuple monkey patch
  • [SPARK-43522]: Fix creating struct column name with index of array
  • [SPARK-43527]: Fix catalog.listCatalogs in PySpark (see the sketch after this list)
  • [SPARK-43541]: Propagate all Project tags in resolving of expressions and missing columns
  • [SPARK-43547]: Update “Supported Pandas API” page to point out the proper pandas docs
  • [SPARK-43587]: Run HealthTrackerIntegrationSuite in a dedicated JVM
  • [SPARK-43589]: Fix cannotBroadcastTableOverMaxTableBytesError to use bytesToString
  • [SPARK-43718]: Set nullable correctly for keys in USING joins
  • [SPARK-43719]: Handle missing row.excludedInStages field
  • [SPARK-43751]: Document unbase64 behavior change
  • [SPARK-43758]: Update Hadoop 2 dependency manifest
  • [SPARK-43759]: Expose TimestampNTZType in pyspark.sql.types (see the sketch after this list)
  • [SPARK-43760]: Nullability of scalar subquery results
  • [SPARK-43802]: Fix codegen for unhex and unbase64 with failOnError=true
  • [SPARK-43894]: Fix bug in df.cache()
  • [SPARK-43956]: Fix bug where the column’s SQL was not displayed for Percentile[Cont|Disc]
  • [SPARK-43973]: Structured Streaming UI should display failed queries correctly
  • [SPARK-43976]: Handle the case where modifiedConfigs doesn’t exist in event logs
  • [SPARK-44018]: Improve hashCode and toString for some DS V2 Expressions
  • [SPARK-44038]: Update YuniKorn docs with v1.3
  • [SPARK-44040]: Fix stats computation when an AggregateExec node is above QueryStageExec
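
As an illustration of SPARK-43336, here is a minimal PySpark sketch of casting between TIMESTAMP and TIMESTAMP_NTZ, which goes through the session time zone; the configuration value and sample timestamp below are illustrative:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # The session time zone determines how the wall-clock value is
    # reinterpreted when casting between the two timestamp types.
    spark = (
        SparkSession.builder
        .config("spark.sql.session.timeZone", "UTC")
        .getOrCreate()
    )

    df = spark.sql("SELECT timestamp'2023-06-01 12:00:00' AS ts")
    # Cast TIMESTAMP (with local time zone) to TIMESTAMP_NTZ using the
    # session time zone configured above.
    df.select(col("ts").cast("timestamp_ntz").alias("ts_ntz")).show()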
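For SPARK-43398, the sketch below shows the two dynamic-allocation timeouts involved; with the fix, an executor holding both shuffle data and cached RDD blocks is released only after the longer of the two. The timeout values are illustrative:

    from pyspark import SparkConf

    conf = (
        SparkConf()
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")
        # Idle timeout for executors holding shuffle data
        .set("spark.dynamicAllocation.shuffleTracking.timeout", "300s")
        # Idle timeout for executors holding cached RDD blocks
        .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")
    )
    # With the fix, an executor holding both kinds of data is kept for
    # max(300s, 600s) = 600s rather than being removed at the shorter timeout.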
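SPARK-43527 repairs Catalog.listCatalogs in PySpark; a minimal usage sketch, assuming a default local session:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # listCatalogs returns CatalogMetadata entries; before the fix the
    # PySpark wrapper failed while converting the JVM result.
    for catalog in spark.catalog.listCatalogs():
        print(catalog.name, catalog.description)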
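With SPARK-43759, TimestampNTZType can be imported directly from pyspark.sql.types, for example when declaring a schema:

    from pyspark.sql.types import StructField, StructType, TimestampNTZType

    schema = StructType([
        StructField("event_time", TimestampNTZType(), nullable=True),
    ])
    print(schema.simpleString())  # struct<event_time:timestamp_ntz>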

Dependency Changes

Although this is a maintenance release, we still upgraded some dependencies. You can consult JIRA for the detailed changes.

We would like to acknowledge all community members for contributing patches to this release.

