
Spark Release 2.4.8

Spark 2.4.8 is a maintenance release containing stability, correctness, and security fixes. This release is based on the branch-2.4 maintenance branch of Spark. We strongly recommend that all 2.4 users upgrade to this stable release.

Notable changes

  • [SPARK-21492]: Fix memory leak in SortMergeJoin
  • [SPARK-25271]: Creating parquet table with all the column null throws exception
  • [SPARK-26625]: spark.redaction.regex should include oauthToken
  • [SPARK-26645]: CSV infer schema bug infers decimal(9,-1)
  • [SPARK-27575]: Spark overwrites existing value of spark.yarn.dist.* instead of merging value
  • [SPARK-27872]: Driver and executors use a different service account breaking pull secrets
  • [SPARK-29574]: spark with user provided hadoop doesn’t work on kubernetes
  • [SPARK-30201]: HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT
  • [SPARK-32635]: When pyspark.sql.functions.lit() function is used with dataframe cache, it returns wrong result
  • [SPARK-32708]: Query optimization fails to reuse exchange with DataSourceV2
  • [SPARK-32715]: Broadcast block pieces may memory leak
  • [SPARK-32738]: thread safe endpoints may hang due to fatal error
  • [SPARK-32794]: Rare corner case error in micro-batch engine with some stateful queries + no-data-batches + V1 streaming sources
  • [SPARK-32815]: Fix LibSVM data source loading error on file paths with glob metacharacters
  • [SPARK-32836]: Fix DataStreamReaderWriterSuite to check writer options correctly
  • [SPARK-32872]: BytesToBytesMap at MAX_CAPACITY exceeds growth threshold
  • [SPARK-32900]: UnsafeExternalSorter.SpillableIterator cannot spill when there are NULLs in the input and radix sorting is used.
  • [SPARK-32901]: UnsafeExternalSorter may cause a SparkOutOfMemoryError to be thrown while spilling
  • [SPARK-32908]: percentile_approx() returns incorrect results
  • [SPARK-32999]: TreeNode.nodeName should not throw malformed class name error
  • [SPARK-33094]: ORC format does not propagate Hadoop config from DS options to underlying HDFS file system
  • [SPARK-33101]: LibSVM format does not propagate Hadoop config from DS options to underlying HDFS file system
  • [SPARK-33131]: Fix grouping sets with having clause can not resolve qualified col name
  • [SPARK-33136]: Handling nullability for complex types is broken during resolution of V2 write command
  • [SPARK-33183]: Bug in optimizer rule EliminateSorts
  • [SPARK-33230]: FileOutputWriter jobs have duplicate JobIDs if launched in same second
  • [SPARK-33268]: Fix bugs for casting data from/to PythonUserDefinedType
  • [SPARK-33277]: Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.
  • [SPARK-33292]: Make Literal ArrayBasedMapData string representation disambiguous
  • [SPARK-33338]: GROUP BY using literal map should not fail
  • [SPARK-33339]: Pyspark application will hang due to non Exception
  • [SPARK-33372]: Fix InSet bucket pruning
  • [SPARK-33472]: IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements
  • [SPARK-33588]: Partition spec in SHOW TABLE EXTENDED doesn’t respect spark.sql.caseSensitive
  • [SPARK-33593]: Vector reader got incorrect data with binary partition value
  • [SPARK-33726]: Duplicate field names causes wrong answers during aggregation
  • [SPARK-33733]: PullOutNondeterministic should check and collect deterministic field
  • [SPARK-33756]: BytesToBytesMap’s iterator hasNext method should be idempotent.
  • [SPARK-34125]: Make EventLoggingListener.codecMap thread-safe
  • [SPARK-34187]: Use available offset range obtained during polling when checking offset validation
  • [SPARK-34212]: For parquet table, after changing the precision and scale of decimal type in hive, spark reads incorrect value
  • [SPARK-34229]: Avro should read decimal values with the file schema
  • [SPARK-34260]: UnresolvedException when creating temp view twice
  • [SPARK-34273]: Do not reregister BlockManager when SparkContext is stopped
  • [SPARK-34318]: Dataset.colRegex should work with column names and qualifiers which contain newlines
  • [SPARK-34327]: Omit inlining passwords during build process.
  • [SPARK-34596]: NewInstance.doGenCode should not throw malformed class name error
  • [SPARK-34607]: NewInstance.resolved should not throw malformed class name error
  • [SPARK-34724]: Fix Interpreted evaluation by using getClass.getMethod instead of getDeclaredMethod
  • [SPARK-34726]: Fix collectToPython timeouts
  • [SPARK-34776]: Catalyst error on certain struct operation (Couldn’t find gen_alias)
  • [SPARK-34811]: Redact fs.s3a.access.key like secret and token
  • [SPARK-34855]: SparkContext - avoid using local lazy val
  • [SPARK-34876]: Non-nullable aggregates can return NULL in a correlated subquery
  • [SPARK-34909]: conv() does not convert negative inputs to unsigned correctly
  • [SPARK-34939]: Throw fetch failure exception when unable to deserialize broadcasted map statuses
  • [SPARK-34963]: Nested column pruning fails to extract case-insensitive struct field from array
  • [SPARK-35080]: Correlated subqueries with equality predicates can return wrong results
  • [SPARK-35278]: Invoke should find the method with correct number of parameters
  • [SPARK-35288]: StaticInvoke should find the method without exact argument classes match
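
Two of the fixes above (SPARK-26625 and SPARK-34811) concern which configuration keys spark.redaction.regex treats as sensitive. The idea of regex-based redaction can be sketched roughly as follows. This is a standalone illustration in plain Python, not Spark's implementation: the pattern, the placeholder string, the `redact` helper, and the sample keys and values are all assumptions made for the example, so check the spark.redaction.regex setting in your own deployment for the real value.

```python
import re

# Assumed pattern, for illustration only. It is not read from Spark.
ASSUMED_REDACTION_REGEX = re.compile(r"(?i)secret|password|token|access[.]key")

# Assumed placeholder shown instead of a sensitive value.
REDACTED = "*********(redacted)"


def redact(conf):
    """Return a copy of conf with values hidden for keys that match the pattern."""
    return {
        key: REDACTED if ASSUMED_REDACTION_REGEX.search(key) else value
        for key, value in conf.items()
    }


# Sample configuration with made-up values.
conf = {
    "spark.app.name": "demo",
    "spark.hadoop.fs.s3a.access.key": "EXAMPLE-ACCESS-KEY",
    "spark.kubernetes.authenticate.submission.oauthToken": "example-token",
}

for key, value in redact(conf).items():
    print(f"{key} = {value}")
```

In this sketch a key is redacted when the pattern matches anywhere in its name, ignoring case, which is why both the oauthToken key and the fs.s3a.access.key key are hidden while spark.app.name is left alone.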

Dependency Changes

Known issues

You can consult JIRA for the detailed changes.

We would like to acknowledge all community members for contributing patches to this release.

