Apache Spark 4.1.0 is the second release in the 4.x series. With significant contributions from the open-source community, this release resolves more than 1,800 Jira tickets from over 230 individual contributors.
This release continues the Spark 4.x momentum and focuses on higher-level data engineering, lower-latency streaming, faster and easier PySpark, and a more capable SQL surface.
This release adds Spark Declarative Pipelines (SDP), a new declarative framework in which you define datasets and queries, and Spark handles the execution graph, dependency ordering, parallelism, checkpoints, and retries.
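
To make the idea of "dependency ordering" concrete, here is a minimal plain-Python sketch of how a declarative framework could derive an execution order from dataset definitions alone. It does not use any Spark API and is not how SDP is implemented; the dataset names and the `execution_stages` helper are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each dataset maps to the datasets it reads from.
# These names are illustrative only and are not part of any Spark API.
pipeline = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "orders_by_customer": {"clean_orders", "raw_customers"},
}


def execution_stages(deps):
    """Group datasets into stages; datasets in one stage have no mutual dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if definitions are circular
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(pipeline)
for number, stage in enumerate(stages):
    print(number, stage)
```

In this toy model the user only declares what each dataset reads; the ordering, and which datasets could run in parallel within a stage, fall out of the graph.
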
This release adds the first official support for Structured Streaming Real-Time Mode (RTM), in which streaming queries run continuously with sub-second latency. For stateless tasks, latency can drop to single-digit milliseconds.
PySpark UDFs and Data Sources have been improved: new Arrow-native UDF and UDTF decorators enable efficient PyArrow execution without pandas conversion overhead, and Python Data Source filter pushdown reduces data movement.
Spark ML on Connect is GA for the Python client, with smarter model caching and memory management. Spark 4.1 also improves stability for large workloads with zstd-compressed protobuf plans, chunked Arrow result streaming, and enhanced support for large local relations.
SQL Scripting is GA and enabled by default, with improved error handling and cleaner declarations. VARIANT is GA, with shredding for faster reads on semi-structured data. The release also adds recursive CTE support and new approximate data sketches (KLL and Theta).
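
For readers unfamiliar with recursive CTEs, the sketch below shows what such a query computes. It runs on Python's built-in `sqlite3` module, not on Spark, purely so that it is self-contained; the query uses the standard `WITH RECURSIVE` form, and the exact syntax Spark accepts should be checked against the Spark SQL documentation. The table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical reporting hierarchy, stored in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports_to (employee TEXT, manager TEXT)")
conn.executemany(
    "INSERT INTO reports_to VALUES (?, ?)",
    [("bob", "alice"), ("carol", "bob"), ("dave", "carol"), ("erin", "alice")],
)

# The anchor member selects alice's direct reports; the recursive member
# repeatedly joins back to the CTE to walk one level further down.
query = """
WITH RECURSIVE chain(employee, depth) AS (
    SELECT employee, 1 FROM reports_to WHERE manager = 'alice'
    UNION ALL
    SELECT r.employee, c.depth + 1
    FROM reports_to AS r JOIN chain AS c ON r.manager = c.employee
)
SELECT employee, depth FROM chain ORDER BY depth, employee
"""
rows = conn.execute(query).fetchall()
for employee, depth in rows:
    print(employee, depth)
```

The result lists everyone below `alice` together with their distance from her, which a non-recursive query could only express with a fixed number of self-joins.
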
To download Apache Spark 4.1.0, visit the downloads page. For detailed changes, consult Jira. We have also curated a list of high-level changes here, grouped by major component.
| Library Name | Version Change |
|---|---|
| analyticsaccelerator-s3 | -> 1.3.0 (NEW) |
| annotations | 17.0.0 -> REMOVED |
| arpack | 3.0.3 -> 3.0.4 |
| arrow-compression | -> 18.3.0 (NEW) |
| arrow-format | 18.1.0 -> 18.3.0 |
| arrow-memory-core | 18.1.0 -> 18.3.0 |
| arrow-memory-netty | 18.1.0 -> 18.3.0 |
| arrow-memory-netty-buffer-patch | 18.1.0 -> 18.3.0 |
| arrow-vector | 18.1.0 -> 18.3.0 |
| avro | 1.12.0 -> 1.12.1 |
| avro-ipc | 1.12.0 -> 1.12.1 |
| avro-mapred | 1.12.0 -> 1.12.1 |
| bcprov-jdk18on | 1.80 -> REMOVED |
| blas | 3.0.3 -> 3.0.4 |
| bundle | 2.25.53 -> 2.29.52 |
| checker-qual | 3.43.0 -> REMOVED |
| commons-cli | 1.9.0 -> 1.10.0 |
| commons-codec | 1.17.2 -> 1.19.0 |
| commons-collections | 3.2.2 -> REMOVED |
| commons-collections4 | 4.4 -> 4.5.0 |
| commons-compress | 1.27.1 -> 1.28.0 |
| commons-io | 2.18.0 -> 2.21.0 |
| commons-lang3 | 3.17.0 -> 3.19.0 |
| commons-text | 1.13.0 -> 1.14.0 |
| curator-client | 5.7.1 -> 5.9.0 |
| curator-framework | 5.7.1 -> 5.9.0 |
| curator-recipes | 5.7.1 -> 5.9.0 |
| datasketches-java | 6.1.1 -> 6.2.0 |
| error_prone_annotations | 2.36.0 -> REMOVED |
| failureaccess | 1.0.2 -> 1.0.3 |
| flatbuffers-java | 24.3.25 -> 25.2.10 |
| gcs-connector | hadoop3-2.2.26 -> hadoop3-2.2.28 |
| guava | 33.4.0-jre -> 33.4.8-jre |
| hadoop-aliyun | 3.4.1 -> 3.4.2 |
| hadoop-annotations | 3.4.1 -> 3.4.2 |
| hadoop-aws | 3.4.1 -> 3.4.2 |
| hadoop-azure | 3.4.1 -> 3.4.2 |
| hadoop-azure-datalake | 3.4.1 -> 3.4.2 |
| hadoop-client-api | 3.4.1 -> 3.4.2 |
| hadoop-client-runtime | 3.4.1 -> 3.4.2 |
| hadoop-cloud-storage | 3.4.1 -> 3.4.2 |
| hadoop-huaweicloud | 3.4.1 -> 3.4.2 |
| hadoop-shaded-guava | 1.3.0 -> 1.4.0 |
| icu4j | 76.1 -> 77.1 |
| j2objc-annotations | 3.0.0 -> REMOVED |
| jackson-annotations | 2.18.2 -> 2.20 |
| jackson-core | 2.18.2 -> 2.20.0 |
| jackson-core-asl | 1.9.13 -> REMOVED |
| jackson-databind | 2.18.2 -> 2.20.0 |
| jackson-dataformat-cbor | 2.18.2 -> 2.20.0 |
| jackson-dataformat-yaml | 2.18.2 -> 2.20.0 |
| jackson-datatype-jsr310 | 2.18.2 -> 2.20.0 |
| jackson-mapper-asl | 1.9.13 -> REMOVED |
| jackson-module-scala | 2.18.2 -> 2.20.0 |
| java-diff-utils | 4.15 -> 4.16 |
| jcl-over-slf4j | 2.0.16 -> 2.0.17 |
| jetty-util | 11.0.24 -> 11.0.26 |
| jetty-util-ajax | 11.0.24 -> 11.0.26 |
| jline | 3.27.1 -> 3.29.0 |
| joda-time | 2.13.0 -> 2.14.0 |
| jodd-core | 3.5.2 -> REMOVED |
| jts-core | -> 1.20.0 (NEW) |
| jul-to-slf4j | 2.0.16 -> 2.0.17 |
| kubernetes-client | 7.1.0 -> 7.4.0 |
| kubernetes-client-api | 7.1.0 -> 7.4.0 |
| kubernetes-httpclient-vertx | 7.1.0 -> 7.4.0 |
| kubernetes-model-admissionregistration | 7.1.0 -> 7.4.0 |
| kubernetes-model-apiextensions | 7.1.0 -> 7.4.0 |
| kubernetes-model-apps | 7.1.0 -> 7.4.0 |
| kubernetes-model-autoscaling | 7.1.0 -> 7.4.0 |
| kubernetes-model-batch | 7.1.0 -> 7.4.0 |
| kubernetes-model-certificates | 7.1.0 -> 7.4.0 |
| kubernetes-model-common | 7.1.0 -> 7.4.0 |
| kubernetes-model-coordination | 7.1.0 -> 7.4.0 |
| kubernetes-model-core | 7.1.0 -> 7.4.0 |
| kubernetes-model-discovery | 7.1.0 -> 7.4.0 |
| kubernetes-model-events | 7.1.0 -> 7.4.0 |
| kubernetes-model-extensions | 7.1.0 -> 7.4.0 |
| kubernetes-model-flowcontrol | 7.1.0 -> 7.4.0 |
| kubernetes-model-gatewayapi | 7.1.0 -> 7.4.0 |
| kubernetes-model-metrics | 7.1.0 -> 7.4.0 |
| kubernetes-model-networking | 7.1.0 -> 7.4.0 |
| kubernetes-model-node | 7.1.0 -> 7.4.0 |
| kubernetes-model-policy | 7.1.0 -> 7.4.0 |
| kubernetes-model-rbac | 7.1.0 -> 7.4.0 |
| kubernetes-model-resource | 7.1.0 -> 7.4.0 |
| kubernetes-model-scheduling | 7.1.0 -> 7.4.0 |
| kubernetes-model-storageclass | 7.1.0 -> 7.4.0 |
| lapack | 3.0.3 -> 3.0.4 |
| listenablefuture | 9999.0-empty-to-avoid-conflict-with-guava -> REMOVED |
| metrics-core | 4.2.30 -> 4.2.37 |
| metrics-graphite | 4.2.30 -> 4.2.37 |
| metrics-jmx | 4.2.30 -> 4.2.37 |
| metrics-json | 4.2.30 -> 4.2.37 |
| metrics-jvm | 4.2.30 -> 4.2.37 |
| netty-all | 4.1.118.Final -> 4.2.7.Final |
| netty-buffer | 4.1.118.Final -> 4.2.7.Final |
| netty-codec | 4.1.118.Final -> 4.2.7.Final |
| netty-codec-base | -> 4.2.7.Final (NEW) |
| netty-codec-classes-quic | -> 4.2.7.Final (NEW) |
| netty-codec-compression | -> 4.2.7.Final (NEW) |
| netty-codec-dns | 4.1.118.Final -> 4.2.7.Final |
| netty-codec-http | 4.1.118.Final -> 4.2.7.Final |
| netty-codec-http2 | 4.1.118.Final -> 4.2.7.Final |
| netty-codec-http3 | -> 4.2.7.Final (NEW) |
| netty-codec-marshalling | -> 4.2.7.Final (NEW) |
| netty-codec-native-quic | -> 4.2.7.Final (NEW) |
| netty-codec-protobuf | -> 4.2.7.Final (NEW) |
| netty-codec-socks | 4.1.118.Final -> 4.2.7.Final |
| netty-common | 4.1.118.Final -> 4.2.7.Final |
| netty-handler | 4.1.118.Final -> 4.2.7.Final |
| netty-handler-proxy | 4.1.118.Final -> 4.2.7.Final |
| netty-resolver | 4.1.118.Final -> 4.2.7.Final |
| netty-resolver-dns | 4.1.118.Final -> 4.2.7.Final |
| netty-tcnative-boringssl-static | 2.0.70.Final -> 2.0.74.Final |
| netty-tcnative-classes | 2.0.70.Final -> 2.0.74.Final |
| netty-transport | 4.1.118.Final -> 4.2.7.Final |
| netty-transport-classes-epoll | 4.1.118.Final -> 4.2.7.Final |
| netty-transport-classes-io_uring | -> 4.2.7.Final (NEW) |
| netty-transport-classes-kqueue | 4.1.118.Final -> 4.2.7.Final |
| netty-transport-native-epoll | 4.1.118.Final -> 4.2.7.Final |
| netty-transport-native-io_uring | -> 4.2.7.Final (NEW) |
| netty-transport-native-kqueue | 4.1.118.Final -> 4.2.7.Final |
| netty-transport-native-unix-common | 4.1.118.Final -> 4.2.7.Final |
| objenesis | 3.3 -> 3.4 |
| orc-core | 2.1.3 -> 2.2.1 |
| orc-mapreduce | 2.1.3 -> 2.2.1 |
| orc-shims | 2.1.3 -> 2.2.1 |
| paranamer | 2.8 -> 2.8.3 |
| parquet-column | 1.15.2 -> 1.16.0 |
| parquet-common | 1.15.2 -> 1.16.0 |
| parquet-encoding | 1.15.2 -> 1.16.0 |
| parquet-format-structures | 1.15.2 -> 1.16.0 |
| parquet-hadoop | 1.15.2 -> 1.16.0 |
| parquet-jackson | 1.15.2 -> 1.16.0 |
| scala-collection-compat | 2.7.0 -> REMOVED |
| scala-compiler | 2.13.16 -> 2.13.17 |
| scala-library | 2.13.16 -> 2.13.17 |
| scala-reflect | 2.13.16 -> 2.13.17 |
| scala-xml | 2.3.0 -> 2.4.0 |
| slf4j-api | 2.0.16 -> 2.0.17 |
| snakeyaml | 2.3 -> 2.4 |
| snakeyaml-engine | 2.9 -> 2.10 |
| snappy-java | 1.1.10.7 -> 1.1.10.8 |
| vertx-auth-common | 4.5.12 -> 4.5.14 |
| vertx-core | 4.5.12 -> 4.5.14 |
| vertx-web-client | 4.5.12 -> 4.5.14 |
| vertx-web-common | 4.5.12 -> 4.5.14 |
| xbean-asm9-shaded | 4.26 -> 4.28 |
| zjsonpatch | 7.1.0 -> 7.4.0 |
| zookeeper | 3.9.3 -> 3.9.4 |
| zookeeper-jute | 3.9.3 -> 3.9.4 |
| zstd-jni | 1.5.6-9 -> 1.5.7-6 |
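
When auditing an upgrade, it can help to classify the rows of the table above programmatically. The following is a minimal sketch that assumes the table is available as Markdown text in exactly the format shown; the `classify` helper and the `ROW` pattern are hypothetical and not part of any Spark tooling. The sample rows are copied from the table.

```python
import re

# A few rows copied from the dependency table above.
rows = """\
| Library Name | Version Change |
|---|---|
| analyticsaccelerator-s3 | -> 1.3.0 (NEW) |
| annotations | 17.0.0 -> REMOVED |
| arrow-vector | 18.1.0 -> 18.3.0 |
| gcs-connector | hadoop3-2.2.26 -> hadoop3-2.2.28 |
"""

# name | old -> new |, where old is empty for newly added libraries.
ROW = re.compile(
    r"^\|\s*(?P<name>[^|]+?)\s*\|\s*(?P<old>[^|]*?)\s*->\s*(?P<new>[^|]+?)\s*\|$"
)


def classify(line):
    """Return (name, kind, old_version, new_version), or None for non-data lines."""
    match = ROW.match(line.strip())
    if match is None:
        return None  # header, separator, or text outside the table
    name, old, new = match["name"], match["old"], match["new"]
    if new.endswith("(NEW)"):
        return name, "added", None, new.removesuffix("(NEW)").strip()
    if new == "REMOVED":
        return name, "removed", old, None
    return name, "upgraded", old, new


changes = [c for c in map(classify, rows.splitlines()) if c is not None]
for change in changes:
    print(change)
```

Filtering `changes` by the second field then gives separate lists of added, removed, and upgraded libraries to compare against an application's own dependency tree.
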
Last but not least, this release would not have been possible without the following contributors: aakash-db (Aakash Japi), AbinayaJayaprakasam, ala (Ala Luszczak), aldenlau-db (Alden Lau), alekjarmov (Alek Jarmov), allisonwang-db (Allison Wang), amoghantarkar (Amogh Antarkar), andyl-db, AngersZhuuuu (Angerszhuuuu), AnishMahto, anishshri-db (Anish), anoopj (Anoop Johnson), antban (DS), anton5798 (Anton Lykov), aokolnychyi (Anton Okolnychyi), ashrithb (Ashrith Bandla), asl3 (Amanda Liu), atongpu, attilapiros (Attila Zsolt Piros), austinrwarner (Austin Warner), AveryQi115 (Avery), beliefer (Jiaan Geng), benrobby, bersprockets (Bruce Robbins), bjornjorgensen (Bjørn Jørgensen), bogao007 (Bo Gao), brkyvz (Burak Yavuz), calilisantos (Calili Santos), carlotran4 (Carlo Tran), cashmand (David Cashman), cboumalh (Chris Boumalhab), changgyoopark-db, chenhao-db, Chhida, chirag-s-db (Chirag Singh), cloud-fan (Wenchen Fan), cnauroth (Chris Nauroth), cookiedough77, craiuconstantintiberiu (Constantin-Tiberiu Craiu), cravani (Chiran Ravani), cty123 (cty), cxzl25, cyb70289 (Yibo Cai), davidm-db (David Milicevic), dengziming (dengziming), DenineLu (Deninelu), dillitz (Robert Dillitz), djspiewak (Daniel Spiewak), dongjoon-hyun (Dongjoon Hyun), drexler-sky, dtenedor (Daniel Tenedorio), dusantism-db (Dušan Tišma), dylanwong250, eason-yuchen-liu (Yuchen Liu), eejbyfeldt (Emil Ejbyfeldt), efaracci018, Emma-82, EnricoMi (Enrico Minack), EricGao888 (Eric Gao), ericm-db (Eric Marnadi), eschcam (Cameron), EugeneYushin (Eugen), fanyue-xia (Chloe Xia), fartzy (Mike Artz), fe2s (Oleksii Diagiliev), ForVic (Victor Sunderland), francesco-camaione (Francesco Camaione), fusheng9399 (fusheng), ganeshashree (Ganesha Shreedhara), gaogaotiantian (Tian Gao), gemelen (Denis Pyshev), gene-db (Gene Pang), gengliangwang (Gengliang Wang), gerashegalov (Gera Shegalov), gjxdxh (Lingkai Kong), grundprinzip (Martin Grund), haoyangeng-db, harshmotw-db (Harsh Motwani), HeartSaVioR (Jungtaek Lim), HendrikHuebner 
(Hendrik Hübner), heyihong (Yihong He), huangxiaopingRD (huangxiaoping), huanliwang-db (Huanli Wang), huaxingao (Huaxin Gao), hvanhovell (Herman van Hovell), HyukjinKwon (Hyukjin Kwon), ignitz (Yuri Niitsuma), ilicmarkodb (Marko Ilić), imarkowitz (Ian Markowitz), ishnagy (Ish Nagy), itholic (Haejoon Lee), ivoson (Tengfei Huang), jaceklaskowski (Jacek Laskowski), jackierwzhang, jackylee-ch (jackylee), james-willis (James Willis), jayantdb (Jayant Sharma), jerrypeng (Boyang Jerry Peng), JiaqiWang18 (Jacky Wang), jiateoh (Jason Teoh), JiexingLi, Jimvin (Jim Halfpenny), jingz-db (Jing Zhan), jinkachy (chenhongyu), jiwen624 (Eric Yang), jonathan-albrecht-ibm (Jonathan Albrecht), jonmio (Jon Mio), jonnycomes (Jonny Comes), jorenham (Joren Hammudoglu), JoshRosen (Josh Rosen), juliuszsompolski (Juliusz Sompolski), karuppayya (Karuppayya), kelvinjian-db (Kelvin Jiang), kepler62f, khakhlyuk (Alex Khakhlyuk), Kimahriman (Adam Binford), kirisakow (Kiril Isakov), ksbeyer, Last-remote11 (Sung Dong Kim), liuzqt (Ziqi Liu), liviazhu (Livia Zhu), liviazhu-db, longvu-db (Thang Long Vu), LucaCanali (Luca Canali), LuciferYang (YangJie), ManosGEM (Manolis Gemeliaris), manuzhang (Manu Zhang), max2718281 (Maxime Xu), MaxGekk (Maxim Gekk), mbrukman (Misha Brukman), micheal-o (Babatunde Micheal Okutubo), mihailoale-db (Mihailo Aleksic), mihailom-db, mihailotim-db (Mihailo Timotic), mikhailnik-db (Mikhail NIkoliukin), miland-db (Milan Dankovic), milastdbx (Milan Stefanovic), milosstojanovic (Milos Stojanovic), morvenhuang, mzhang (Matt Zhang), nagaarjun-p (Nagaarjun P), Ngone51 (wuyi), nija-at (Niranjan), niklasmohrin (Niklas Mohrin), nikola-jovicevic-db, Nishanth28, Pajaraja (Pavle Martinovic), pan3793 (Cheng Pan), panbingkun (panbingkun), pasar6987, PetarVasiljevic-DB, peter-toth (Peter Toth), petern48 (Peter Nguyen), peterpashkin, PHILO-HE, pjfanning (PJ Fanning), pranavdev022 (Pranav Dev), prathit06 (Prathit malik), qiyuandong-db (Qiyuan Dong), richardc-db, robreeves (Rob Reeves), 
RocMarshal (Yuepeng Pan), Rolfdv (Rolf de Vries), sandip-db (Sandip Agarwala), sarutak (Kousuke Saruta), SCHJonathan (Jonathan Chang), senthh, shardulm94 (Shardul Mahadik), shujingyang-db (Shujing Yang), sigmod (Yingyi Bu), siying (Siying Dong), srielau (Serge Rielau), sririshindra (Rishi), sryza (Sandy Ryza), stefankandic (Stefan Kandic), steveloughran (Steve Loughran), steven-aerts (Steven Aerts), stevomitric (Stevo Mitric), summaryzb (summaryzb), sunchao (Chao Sun), Surbhi-Vijay, szehon-ho (Szehon Ho), TeodorDjelic (Teodor Djelic), the-sakthi (Sakthi), thejdeep (Thejdeep Gudivada), timarmstrong (Tim Armstrong), tomscut (litao), TongWei1105 (TongWei), trsigg (Tynan Sigg), ueshin (Takuya UESHIN), uros-db (Uros Bojanic), uros7251brick, urosstan-db (Uros Stankovic), vanja-vujovic-db, vicennial (Venkata Sai Akhil Gudesa), viirya (Liang-Chi Hsieh), viktorluc-db (Viktor Lučić), VindhyaG, vinodkc (Vinod KC), vladimirg-db (Vladimir Golubev), vrmorusu (Vamshidhar Morusu), vrozov (Vlad Rozov), WangGuangxin, wangyum (Yuming Wang), wankunde (wankun), wayneguow (Wei Guo), wecharyu (Wechar Yu), WeichenXu123 (WeichenXu), wengh (Haoyu Weng), wForget (Zhen Wang), williamhyun (William Hyun), WweiL (Wei Liu), xi-db (Xi Lyu), xianzhe-databricks (Xianzhe Ma), xiaonanyang-db (Xiaonan Yang), xinrong-meng (Xinrong Meng), xu20160924 (John Xu), xupefei (Paddy Xu), xuyu-co, yaooqinn (Kent Yao), yeshengm (Yesheng Ma), yhuang-db (Yuchuan Huang), Yicong-Huang (Yicong Huang), yuexing (Yue), yumingxuanguo-db (Yumingxuan Guo), zecookiez (Zeyu Chen), zeruibao (Zerui Bao), zhengruifeng (Ruifeng Zheng), zhipengmao-db (Zhipeng Mao), zhixingheyi-tian, zhztheplayer (Hongze Zhang), zifeif2 (Zifei Feng), ZiyaZa (Ziya Mukhtarov), zml1206 (Mingliang Zhu)