Spark Release 3.4.2

Spark 3.4.2 is a maintenance release containing security and correctness fixes. This release is based on the branch-3.4 maintenance branch of Spark. We strongly recommend all 3.4 users to upgrade to this stable release.

Notable changes

  • [SPARK-42784]: should still create subDir when the number of subDir in merge dir is less than conf
  • [SPARK-43203]: Fix DROP table behavior in session catalog
  • [SPARK-43393]: Address sequence expression overflow bug
  • [SPARK-44040]: Fix compute stats when AggregateExec node above QueryStageExec
  • [SPARK-44079]: Fix ArrayIndexOutOfBoundsException when parse array as struct using PERMISSIVE mode with corrupt record
  • [SPARK-44134]: Fix setting resources (GPU/FPGA) to 0 when they are set in spark-defaults.conf
  • [SPARK-44136]: Fixed an issue that StateManager may get materialized in executor instead of driver in FlatMapGroupsWithStateExec
  • [SPARK-44142]: Replace type with tpe in utility to convert python types to spark types
  • [SPARK-44180]: DistributionAndOrderingUtils should apply ResolveTimeZone
  • [SPARK-44206]: DataSet.selectExpr scope
  • [SPARK-44215]: If num chunks are 0, then server should throw a RuntimeException
  • [SPARK-44241]: Mistakenly set io.connectionTimeout/connectionCreationTimeout to zero or negative will cause incessant executor cons/destructions
  • [SPARK-44251]: Set nullable correctly on coalesced join key in full outer USING join
  • [SPARK-44313]: Fix generated column expression validation when there is a char/varchar column in the schema
  • [SPARK-44391]: Check the number of argument types in InvokeLike
  • [SPARK-44464]: Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value
  • [SPARK-44479]: Fix protobuf conversion from an empty struct type
  • [SPARK-44547]: Ignore fallback storage for cached RDD migration
  • [SPARK-44581]: Fix the bug that ShutdownHookManager gets wrong UGI from SecurityManager of ApplicationMaster
  • [SPARK-44588]: Fix double encryption issue for migrated shuffle blocks
  • [SPARK-44630]: Revert “[SPARK-43043] Improve the performance of MapOutputTracker.updateMapOutput”
  • [SPARK-44634]: Encoders.bean does no longer support nested beans with type arguments
  • [SPARK-44641]: Incorrect result in certain scenarios when SPJ is not triggered
  • [SPARK-44653]: Non-trivial DataFrame unions should not break caching
  • [SPARK-44657]: Fix incorrect limit handling in ArrowBatchWithSchemaIterator and config parsing of CONNECT_GRPC_ARROW_MAX_BATCH_SIZE
  • [SPARK-44805]: getBytes/getShorts/getInts/etc. should work in a column vector that has a dictionary
  • [SPARK-44840]: Make array_insert() 1-based for negative indexes
  • [SPARK-44846]: Convert the lower redundant Aggregate to Project in RemoveRedundantAggregates
  • [SPARK-44854]: Python timedelta to DayTimeIntervalType edge case bug
  • [SPARK-44857]: Fix getBaseURI error in Spark Worker LogPage UI buttons
  • [SPARK-44859]: Fix incorrect property name in structured streaming doc
  • [SPARK-44871]: Fix percentile_disc behaviour
  • [SPARK-44910]: Encoders.bean does not support superclasses with generic type arguments
  • [SPARK-44920]: Use await() instead of awaitUninterruptibly() in TransportClientFactory.createClient()
  • [SPARK-44925]: K8s default service token file should not be materialized into token
  • [SPARK-44935]: Fix RELEASE file to have the correct information in Docker images if exists
  • [SPARK-44937]: Mark connection as timedOut in TransportClient.close
  • [SPARK-44940]: Improve performance of JSON parsing when “spark.sql.json.enablePartialResults” is enabled
  • [SPARK-44973]: Fix ArrayIndexOutOfBoundsException in conv()
  • [SPARK-44990]: Reduce the frequency of get spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv
  • [SPARK-45054]: HiveExternalCatalog.listPartitions should restore partition statistics
  • [SPARK-45057]: Avoid acquire read lock when keepReadLock is false
  • [SPARK-45071]: Optimize the processing speed of BinaryArithmetic#dataType when processing multi-column data
  • [SPARK-45075]: Fix alter table with invalid default value will not report error
  • [SPARK-45078]: Fix array_insert ImplicitCastInputTypes not work
  • [SPARK-45079]: Fix an internal error from percentile_approx()on NULL accuracy
  • [SPARK-45081]: Encoders.bean does no longer work with read-only properties
  • [SPARK-45100]: Fix an internal error from reflect()on NULL class and method
  • [SPARK-45109]: Fix log function in Connect
  • [SPARK-45187]: Fix WorkerPage to use the same pattern for logPage urls
  • [SPARK-45227]: Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend
  • [SPARK-45282]: Correctness issue in AQE with InMemoryTableScanExec
  • [SPARK-45389]: Correct MetaException matching rule on getting partition metadata
  • [SPARK-45430]: Fix for FramelessOffsetWindowFunction when IGNORE NULLS and offset > rowCount
  • [SPARK-45433]: Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat
  • [SPARK-45473]: Fix incorrect error message for RoundBase
  • [SPARK-45508]: Add “–add-opens=java.base/jdk.internal.ref=ALL-UNNAMED” so Platform can access Cleaner on Java 9+
  • [SPARK-45592]: Correctness issue in AQE with InMemoryTableScanExec
  • [SPARK-45604]: Add LogicalType checking on INT64 -> DateTime conversion on Parquet Vectorized Reader
  • [SPARK-45652]: SPJ: Handle empty input partitions after dynamic filtering
  • [SPARK-45670]: SparkSubmit does not support --total-executor-cores when deploying on K8s
  • [SPARK-45678]: Cover BufferReleasingInputStream.available/reset under tryOrFetchFailedException
  • [SPARK-45749]: Fix Spark History Server to sort Duration column properly
  • [SPARK-45786]: Fix inaccurate Decimal multiplication and division results
  • [SPARK-45814]: Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak
  • [SPARK-45882]: BroadcastHashJoinExec propagate partitioning should respect CoalescedHashPartitioning
  • [SPARK-45896]: Construct ValidateExternalType with the correct expected type
  • [SPARK-45920]: group by ordinal should be idempotent
  • [SPARK-46006]: YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop
  • [SPARK-46012]: EventLogFileReader should not read rolling logs if app status file is missing
  • [SPARK-46062]: Sync the isStreaming flag between CTE definition and reference
  • [SPARK-46064]: Move out EliminateEventTimeWatermark to the analyzer and change to only take effect on resolved child

Dependency Changes

While being a maintenance release we did still upgrade some dependencies in this release they are:

You can consult JIRA for the detailed changes.

We would like to acknowledge all community members for contributing patches to this release.

Spark News Archive