
Spark Release 3.0.3

Spark 3.0.3 is a maintenance release containing stability fixes. This release is based on the branch-3.0 maintenance branch of Spark. We strongly recommend that all 3.0 users upgrade to this stable release.

Notable changes

  • [SPARK-34421]: Custom functions can’t be used in temporary views with CTEs
  • [SPARK-34545]: PySpark Python UDFs return inconsistent results when applying 2 UDFs with different return types to 2 columns together
  • [SPARK-34719]: Fail if the view query has duplicated column names
  • [SPARK-35463]: Skip checking checksum on a system that doesn’t have shasum
  • [SPARK-32924]: Web UI sort on duration is wrong
  • [SPARK-33482]: V2 Datasources that extend FileScan preclude exchange reuse
  • [SPARK-33504]: The application log in the Spark history server contains sensitive attributes such as passwords that should be redacted instead of shown in plain text
  • [SPARK-34424]: HiveOrcHadoopFsRelationSuite fails with seed 610710213676
  • [SPARK-34556]: Checking duplicate static partition columns doesn’t respect case sensitive conf
  • [SPARK-34596]: NewInstance.doGenCode should not throw malformed class name error
  • [SPARK-34763]: col(), $"" and df("name") should handle quoted column names properly
  • [SPARK-34794]: Nested higher-order functions broken in DSL
  • [SPARK-34798]: Fix incorrect join condition
  • [SPARK-34876]: Non-nullable aggregates can return NULL in a correlated subquery
  • [SPARK-34897]: Support reconcile schemas based on index after nested column pruning
  • [SPARK-34909]: conv() does not convert negative inputs to unsigned correctly
  • [SPARK-34922]: Use better CBO cost function
  • [SPARK-34963]: Nested column pruning fails to extract case-insensitive struct field from array
  • [SPARK-34970]: Redact map-type options in the output of explain()
  • [SPARK-35080]: Correlated subqueries with equality predicates can return wrong results
  • [SPARK-35096]: foreachBatch throws ArrayIndexOutOfBoundsException if schema is case-insensitive
  • [SPARK-35106]: HadoopMapReduceCommitProtocol performs bad rename when dynamic partition overwrite is used
  • [SPARK-35227]: Replace Bintray with the new repository service for the spark-packages resolver in SparkSubmit
  • [SPARK-35296]: Dataset.observe fails with an assertion
  • [SPARK-35482]: Case-sensitive block manager port key should be used in BasicExecutorFeatureStep
  • [SPARK-35493]: spark.blockManager.port does not work for driver pod
  • [SPARK-35659]: Avoid writing null to StateStore
  • [SPARK-35673]: Spark fails on unrecognized hint in subquery
  • [SPARK-35679]: Overflow on converting valid Timestamp to Microseconds
  • [SPARK-34697]: Allow DESCRIBE FUNCTION and SHOW FUNCTIONS to explain || (string concatenation operator)
  • [SPARK-34772]: RebaseDateTime loadRebaseRecords should use Spark classloader instead of context
  • [SPARK-35127]: When we switch between different stage-detail pages, the entry item in the newly-opened page may be blank
  • [SPARK-35168]: mapred.reduce.tasks should be shuffle.partitions not adaptive.coalescePartitions.initialPartitionNum
  • [SPARK-35566]: Fix number of output rows for StateStoreRestoreExec
  • [SPARK-35714]: Bug fix for deadlock during the executor shutdown
  • [SPARK-34534]: New protocol FetchShuffleBlocks in OneForOneBlockFetcher leads to data loss or correctness issues
  • [SPARK-34939]: Throw fetch failure exception when unable to deserialize broadcasted map statuses
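
As a rough illustration of the idea behind SPARK-34909 (converting negative inputs to unsigned), the sketch below reinterprets a negative integer as an unsigned 64-bit value and then renders it in another base. This is plain Python arithmetic, not Spark's implementation; the 64-bit width and the helper names to_unsigned_64 and to_base are assumptions made for the example.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_unsigned_64(value: int) -> int:
    """Reinterpret a signed 64-bit integer as unsigned (two's complement)."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Masking with 2**64 - 1 maps -1 to 2**64 - 1, -2 to 2**64 - 2, and so on.
    return value & 0xFFFFFFFFFFFFFFFF


def to_base(value: int, base: int) -> str:
    """Render a non-negative integer as a string in the given base (2..36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, rem = divmod(value, base)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


# -1 reinterpreted as unsigned 64-bit is all ones: sixteen hex F digits.
print(to_base(to_unsigned_64(-1), 16))
```

Non-negative inputs pass through to_unsigned_64 unchanged, so only negative values are affected by the reinterpretation.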

Dependency Changes

Although this is a maintenance release, we did upgrade some dependencies:

  • [SPARK-35210]: Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue

Known issues

  • [SPARK-34529]: spark.read.csv throws the exception "'lineSep' can contain only 1 character" when parsing Windows line endings (CR LF)
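
The error message in SPARK-34529 says the line separator may be only one character, while CR LF is two. One possible way to sidestep this, sketched below, is to normalize CR LF to LF before the data is parsed. This is an illustrative assumption, not an official workaround from the Spark project; the function name normalize_newlines and the sample data are hypothetical, and the sketch uses plain Python rather than Spark.

```python
def normalize_newlines(text: str) -> str:
    """Replace Windows line endings (CR LF) with a single LF character."""
    # Note: this also rewrites CR LF pairs inside quoted CSV fields.
    return text.replace("\r\n", "\n")


# Hypothetical CSV content with Windows line endings.
raw = "id,name\r\n1,alice\r\n2,bob\r\n"

normalized = normalize_newlines(raw)

# After normalization, a one-character separator is enough to split records.
rows = [line.split(",") for line in normalized.split("\n") if line]
print(rows)
```

The same replacement could be applied to a file on disk before handing it to a reader that accepts only a single-character separator.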

You can consult JIRA for the detailed changes.

We would like to acknowledge all community members for contributing patches to this release.

