To run the PySpark tests, first build Spark itself with either Maven or SBT. For example:
build/mvn -DskipTests clean package
build/sbt -Phive clean package
After that, you can run the PySpark test cases with python/run-tests.
Note that you may need to set the OBJC_DISABLE_INITIALIZE_FORK_SAFETY environment variable to YES if you are running the tests on macOS.
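As a minimal sketch of the note above, the variable can be exported only when the shell is running on macOS, where `uname -s` prints `Darwin`. The conditional is an assumption added for illustration; exporting the variable unconditionally on a Mac has the same effect.

```shell
# Export the fork-safety override only on macOS (uname -s prints "Darwin" there).
if [ "$(uname -s)" = "Darwin" ]; then
  export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
fi
```

Run python/run-tests from the same shell afterwards so that the test processes inherit the variable.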
Running Individual PySpark Tests
You can run a specific test with python/run-tests, for example:
python/run-tests --testnames pyspark.sql.tests.test_arrow
Please refer to Testing PySpark for more details.
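The --testnames option can also select a single test class by giving the module and the class name separated by a space. The sketch below assumes that a class named ArrowTests exists in pyspark.sql.tests.test_arrow in your checkout; the TESTNAME variable and the existence check are illustrative additions, not part of the run-tests script.

```shell
# Run from the root of a built Spark checkout.
# "module ClassName" must reach run-tests as ONE argument, hence the quoting.
# ArrowTests is an assumed class name; substitute one from your checkout.
TESTNAME='pyspark.sql.tests.test_arrow ArrowTests'

if [ -x python/run-tests ]; then
  python/run-tests --testnames "$TESTNAME"
else
  # Illustrative guard: report a wrong working directory instead of failing obscurely.
  echo "python/run-tests not found; run this from the Spark source root" >&2
fi
```

Without the quotes, the shell would pass the class name as a separate argument rather than as part of the test name.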
Running Tests using GitHub Actions
You can run the full PySpark tests by using GitHub Actions in your own forked GitHub repository with a few clicks. Please refer to Running tests in your forked repository using GitHub Actions for more details.
Running Tests for Spark Connect
Running Tests for Python Client
To test changes to the Protobuf definitions, for example, at
you should first regenerate the Python Protobuf client by running
Running PySpark Shell with Python Client
For a locally built Apache Spark:
bin/pyspark --remote "local[*]"
For an Apache Spark release:
bin/pyspark --remote "local[*]" --packages org.apache.spark:spark-connect_2.12:3.4.0
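The --packages value in the release command is a Maven coordinate whose suffix encodes the Scala version (2.12) and the Spark version (3.4.0). As a sketch, assuming the coordinate should match the release you downloaded, the two versions can be kept in variables; SCALA_VERSION, SPARK_VERSION, and CONNECT_PACKAGE are hypothetical names used only for illustration.

```shell
# Assumed values: adjust both to match the Spark release you are running.
SCALA_VERSION=2.12
SPARK_VERSION=3.4.0

# Maven coordinate of the Spark Connect package: group:artifact_scalaVersion:sparkVersion
CONNECT_PACKAGE="org.apache.spark:spark-connect_${SCALA_VERSION}:${SPARK_VERSION}"

if [ -x bin/pyspark ]; then
  # Starts an interactive PySpark shell; run from the root of the release directory.
  bin/pyspark --remote "local[*]" --packages "$CONNECT_PACKAGE"
else
  # Illustrative guard: report a wrong working directory instead of failing obscurely.
  echo "bin/pyspark not found; run this from the Spark release directory" >&2
fi
```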