Spark Release 3.1.2

Spark 3.1.2 is a maintenance release containing stability fixes. This release is based on the branch-3.1 maintenance branch of Spark. We strongly recommend that all 3.1 users upgrade to this stable release.
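
As a rough illustration of the upgrade recommendation above, the following sketch checks whether a given Spark version string belongs to the 3.1 line and predates 3.1.2. The helper names `parse_version` and `should_upgrade_to_3_1_2` are hypothetical and not part of Spark; the parsing assumes a version string that starts with `major.minor.patch`.

```python
import re


def parse_version(version: str) -> tuple:
    """Return (major, minor, patch) from a string such as '3.1.1' or '3.1.2-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def should_upgrade_to_3_1_2(version: str) -> bool:
    """True only for 3.1.x versions older than 3.1.2 (hypothetical helper)."""
    major, minor, patch = parse_version(version)
    return (major, minor) == (3, 1) and patch < 2
```

In a PySpark session, the version string could be taken from `spark.version` and passed to `should_upgrade_to_3_1_2`. Other release lines, such as 3.0.x, are deliberately reported as `False` because this release only targets 3.1 users.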

Notable changes

  • [SPARK-32924]: Web UI sort on duration is wrong
  • [SPARK-33482]: V2 Datasources that extend FileScan preclude exchange reuse
  • [SPARK-34361]: Dynamic allocation on K8s kills executors with running tasks
  • [SPARK-34392]: Invalid ID for offset-based ZoneId since Spark 3.0
  • [SPARK-34417]: DataFrameNaFunctions.fillMap(values: Seq[(String, Any)]) fails for column name having a dot
  • [SPARK-34436]: DPP support LIKE ANY/ALL
  • [SPARK-34473]: Avoid NPE in DataFrameReader.schema(StructType)
  • [SPARK-34482]: Correct the active SparkSession for streaming query
  • [SPARK-34490]: Table maybe resolved as a view if the table is dropped
  • [SPARK-34497]: JDBC connection provider is not removing kerberos credentials from JVM security context
  • [SPARK-34504]: Avoid unnecessary view resolving and remove the performCheck flag
  • [SPARK-34515]: Fix NPE if InSet contains null value during getPartitionsByFilter
  • [SPARK-34534]: New protocol FetchShuffleBlocks in OneForOneBlockFetcher lead to data loss or correctness
  • [SPARK-34543]: Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
  • [SPARK-34545]: Fix inconsistent results when applying 2 Python UDFs with different return type to 2 columns together
  • [SPARK-34547]: Resolve using child metadata attributes as fallback
  • [SPARK-34550]: Skip InSet null value during push filter to Hive metastore
  • [SPARK-34555]: Resolve metadata output from DataFrame
  • [SPARK-34556]: Checking duplicate static partition columns doesn’t respect case sensitive conf
  • [SPARK-34561]: Cannot drop/add columns from/to a dataset of v2 DESCRIBE TABLE
  • [SPARK-34567]: CreateTableAsSelect should have metrics update too
  • [SPARK-34577]: Cannot drop/add columns from/to a dataset of v2 DESCRIBE NAMESPACE
  • [SPARK-34584]: When insert into a partition table with a illegal partition value, DSV2 behavior different as others
  • [SPARK-34596]: NewInstance.doGenCode should not throw malformed class name error
  • [SPARK-34599]: INSERT INTO OVERWRITE doesn’t support partition columns containing dot for DSv2
  • [SPARK-34607]: NewInstance.resolved should not throw malformed class name error
  • [SPARK-34610]: Fix Python UDF used in GroupedAggPandasUDFTests
  • [SPARK-34613]: Fix view does not capture disable hint config
  • [SPARK-34630]: Add type hints of pyspark.version and pyspark.sql.Column.contains
  • [SPARK-34639]: Always remove unnecessary Alias in Analyzer.resolveExpression
  • [SPARK-34674]: Spark app on k8s doesn’t terminate without call to sparkContext.stop() method
  • [SPARK-34681]: Full outer shuffled hash join when building left side produces wrong result
  • [SPARK-34682]: Regression in “operating on canonicalized plan” check in CustomShuffleReaderExec
  • [SPARK-34697]: Allow DESCRIBE FUNCTION and SHOW FUNCTIONS explain about string concatenation operator
  • [SPARK-34713]: Group by CreateStruct with ExtractValue fails analysis
  • [SPARK-34714]: collect_list(struct()) fails when used with GROUP BY
  • [SPARK-34719]: Fail if the view query has duplicated column names
  • [SPARK-34723]: Correct parameter type for subexpression elimination under whole-stage
  • [SPARK-34724]: Fix Interpreted evaluation by using getClass.getMethod instead of getDeclaredMethod
  • [SPARK-34727]: Difference in results of casting float to timestamp
  • [SPARK-34731]: ConcurrentModificationException in EventLoggingListener when redacting properties
  • [SPARK-34737]: Discrepancy between TIMESTAMP_SECONDS and cast from float
  • [SPARK-34756]: Fix FileScan equality check
  • [SPARK-34763]: col(), $"<name>" and df("name") should handle quoted column names properly
  • [SPARK-34766]: Do not capture maven config for views
  • [SPARK-34768]: Respect the default input buffer size in Univocity
  • [SPARK-34770]: InMemoryCatalog.tableExists should not fail if database doesn’t exist
  • [SPARK-34772]: RebaseDateTime loadRebaseRecords should use Spark classloader instead of context
  • [SPARK-34776]: Catalyst error on certain struct operation (Couldn’t find gen_alias)
  • [SPARK-34790]: Fail in fetch shuffle blocks in batch when i/o encryption is enabled
  • [SPARK-34794]: Nested higher-order functions broken in DSL
  • [SPARK-34796]: Codegen compilation error for query with LIMIT operator and without AQE
  • [SPARK-34803]: Pass the raised ImportError if pandas or pyarrow fail to import
  • [SPARK-34811]: Redact fs.s3a.access.key like secret and token
  • [SPARK-34814]: LikeSimplification should handle NULL
  • [SPARK-34829]: Fix higher order function results
  • [SPARK-34833]: Apply right-padding correctly for correlated subqueries
  • [SPARK-34834]: There is a potential Netty memory leak in TransportResponseHandler
  • [SPARK-34840]: Fix cases of corruption in merged shuffle blocks that are pushed
  • [SPARK-34845]: computeAllMetrics may return partial metrics when some child metrics are missing
  • [SPARK-34876]: Non-nullable aggregates can return NULL in a correlated subquery
  • [SPARK-34897]: Support reconcile schemas based on index after nested column pruning
  • [SPARK-34909]: conv() does not convert negative inputs to unsigned correctly
  • [SPARK-34922]: Use better CBO cost function
  • [SPARK-34923]: Metadata output should not always be propagated
  • [SPARK-34926]: PartitionUtils.getPathFragment should handle null value
  • [SPARK-34939]: Throw fetch failure exception when unable to deserialize broadcasted map statuses
  • [SPARK-34948]: Add ownerReference to executor configmap to fix leakages
  • [SPARK-34949]: Executor.reportHeartBeat reregisters blockManager even when Executor is shutting down
  • [SPARK-34963]: Nested column pruning fails to extract case-insensitive struct field from array
  • [SPARK-34970]: Redact map-type options in the output of explain()
  • [SPARK-35014]: A foldable expression could not be replaced by an AttributeReference
  • [SPARK-35019]: Improve type hints on pyspark.sql.*
  • [SPARK-35045]: Add an internal option to control input buffer in univocity
  • [SPARK-35079]: Transform with udf gives incorrect result
  • [SPARK-35080]: Correlated subqueries with equality predicates can return wrong results
  • [SPARK-35087]: Fix some columns in table Aggregated Metrics by Executor of stage-detail page
  • [SPARK-35093]: AQE columnar mismatch on exchange reuse
  • [SPARK-35096]: foreachBatch throws ArrayIndexOutOfBoundsException if schema is case Insensitive
  • [SPARK-35106]: HadoopMapReduceCommitProtocol performs bad rename when dynamic partition overwrite is used
  • [SPARK-35117]: UI progress bar no longer highlights in progress tasks
  • [SPARK-35127]: When switching between different stage-detail pages, the entry in newly-opened page may be blank
  • [SPARK-35136]: Initial null value of LiveStage.info can lead to NPE
  • [SPARK-35142]: OneVsRest classifier uses incorrect data type for rawPrediction column
  • [SPARK-35168]: mapred.reduce.tasks should be shuffle.partitions not adaptive.coalescePartitions.initialPartitionNum
  • [SPARK-35213]: Corrupt DataFrame for certain withField patterns
  • [SPARK-35226]: JDBC datasources should accept refreshKrb5Config parameter
  • [SPARK-35227]: Replace Bintray with the new repository service for the spark-packages resolver in SparkSubmit
  • [SPARK-35244]: invoke should throw the original exception
  • [SPARK-35278]: Invoke should find the method with correct number of parameters
  • [SPARK-35288]: StaticInvoke should find the method without exact argument classes match
  • [SPARK-35359]: Insert data with char/varchar datatype will fail when data length exceed length limitation
  • [SPARK-35381]: Fix lambda variable name issues in nested DataFrame functions in R APIs
  • [SPARK-35382]: Fix lambda variable name issues in nested DataFrame functions in Python APIs
  • [SPARK-35411]: Essential information missing in TreeNode json string
  • [SPARK-35482]: case sensitive block manager port key should be used in BasicExecutorFeatureStep
  • [SPARK-35493]: spark.blockManager.port does not work for driver pod

Dependency Changes

Although this is a maintenance release, we did still upgrade some dependencies. They are:

  • [SPARK-34752]: Upgrade Jetty to 9.4.37 to fix CVE-2020-27223
  • [SPARK-34988]: Upgrade Jetty to 9.4.39 to fix CVE-2021-28165
  • [SPARK-35210]: Upgrade Jetty to 9.4.40 to fix ERR_CONNECTION_RESET issue

You can consult JIRA for the detailed changes.

We would like to acknowledge all community members for contributing patches to this release.

