Spark Release 3.0.2

Spark 3.0.2 is a maintenance release containing stability fixes. This release is based on the branch-3.0 maintenance branch of Spark. We strongly recommend that all 3.0 users upgrade to this stable release.

Notable changes

  • [SPARK-31511]: Make BytesToBytesMap iterator() thread-safe
  • [SPARK-32635]: When pyspark.sql.functions.lit() function is used with dataframe cache, it returns wrong result
  • [SPARK-32753]: Deduplicating and repartitioning the same column creates duplicate rows with AQE
  • [SPARK-32764]: Comparison -0.0 < 0.0 returns true
  • [SPARK-32840]: Invalid interval value can happen to be just adhesive with the unit
  • [SPARK-32908]: percentile_approx() returns incorrect results
  • [SPARK-33019]: Use spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 by default
  • [SPARK-33183]: Bug in optimizer rule EliminateSorts
  • [SPARK-33260]: SortExec produces incorrect results if sortOrder is a Stream
  • [SPARK-33290]: SPARK-33507 REFRESH TABLE should invalidate cache even though the table itself may not be cached
  • [SPARK-33358]: Spark SQL CLI command processing loop can’t exit when one command fails
  • [SPARK-33404]: “date_trunc” expression returns incorrect results
  • [SPARK-33435]: SPARK-33507 DSv2: REFRESH TABLE should invalidate caches
  • [SPARK-33591]: NULL is recognized as the “null” string in partition specs
  • [SPARK-33593]: Vector reader got incorrect data with binary partition value
  • [SPARK-33726]: Duplicate field names cause wrong answers during aggregation
  • [SPARK-33819]: SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be package private
  • [SPARK-33950]: ALTER TABLE .. DROP PARTITION doesn’t refresh cache
  • [SPARK-34011]: ALTER TABLE .. RENAME TO PARTITION doesn’t refresh cache
  • [SPARK-34027]: ALTER TABLE .. RECOVER PARTITIONS doesn’t refresh cache
  • [SPARK-34055]: ALTER TABLE .. ADD PARTITION doesn’t refresh cache
  • [SPARK-34187]: Use available offset range obtained during polling when checking offset validation
  • [SPARK-34212]: For parquet table, after changing the precision and scale of decimal type in hive, spark reads incorrect value
  • [SPARK-34213]: LOAD DATA doesn’t refresh v1 table cache
  • [SPARK-34229]: Avro should read decimal values with the file schema
  • [SPARK-34262]: ALTER TABLE .. SET LOCATION doesn’t refresh v1 table cache
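
As an illustration of the behavior behind SPARK-32764 in the list above: under IEEE 754 semantics, -0.0 and 0.0 compare as equal, so -0.0 < 0.0 is expected to be false. The Python sketch below is not Spark code, and both function names are hypothetical. The second function only shows how an ordering that also inspects the sign bit can yield the unexpected true; it is not meant to reproduce the actual cause in Spark.

```python
import math


def ieee_less_than(a: float, b: float) -> bool:
    """Ordinary IEEE 754 comparison: -0.0 and 0.0 are treated as equal."""
    return a < b


def sign_aware_less_than(a: float, b: float) -> bool:
    """Hypothetical ordering that places -0.0 before 0.0 by checking the sign bit."""
    if a == b == 0.0:
        # Both operands are zeros; break the tie using their signs.
        return math.copysign(1.0, a) < math.copysign(1.0, b)
    return a < b


print(ieee_less_than(-0.0, 0.0))        # False: the expected SQL result
print(sign_aware_less_than(-0.0, 0.0))  # True: the surprising result
```

A query that filters or joins on such a comparison would return different rows depending on which of the two orderings is applied.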

Dependency Changes

Although this is a maintenance release, we did still upgrade some dependencies. They are:

Known issues

You can consult JIRA for the detailed changes.

We would like to acknowledge all community members for contributing patches to this release.
