Testing PySpark#

To run PySpark tests, you should first build Spark itself via Maven or SBT. For example,

build/mvn -DskipTests clean package
build/sbt -Phive clean package

After that, the PySpark test cases can be run using python/run-tests. For example,

python/run-tests --python-executables=python3
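
To limit a run to particular test modules, you can pass a comma-separated list via --modules; module names such as pyspark-sql are defined in dev/sparktestsupport/modules.py. For example,

python/run-tests --modules=pyspark-sql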

Note that you may need to set the OBJC_DISABLE_INITIALIZE_FORK_SAFETY environment variable to YES if you are running tests on macOS.
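
For example, in a POSIX shell,

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES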

Please see the guidance on how to build Spark, run tests for a module, or run individual tests.

Running Individual PySpark Tests#

You can run a specific test using python/run-tests, for example, as below:

python/run-tests --testnames pyspark.sql.tests.test_arrow
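
You can also narrow the run to a single test class or method by appending it to the module name; the class and method below (ArrowTests.test_createDataFrame_toggle) are illustrative and assume they exist in your checkout,

python/run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_createDataFrame_toggle'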

Please refer to Testing PySpark for more details.

Running Tests using GitHub Actions#

You can run the full PySpark tests by using GitHub Actions in your own forked GitHub repository with a few clicks. Please refer to Running tests in your forked repository using GitHub Actions for more details.

Running Tests for Spark Connect#

Running Tests for Python Client#

To test changes in the Protobuf definitions, for example, at spark/sql/connect/common/src/main/protobuf/spark/connect, you should first regenerate the Python Protobuf client by running dev/connect-gen-protos.sh.
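
For example, from the root of your Spark checkout,

dev/connect-gen-protos.sh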

Running PySpark Shell with Python Client#

The command below automatically starts a Spark Connect server locally and creates a Spark Connect client connected to that server.

bin/pyspark --remote "local[*]"
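
Once the shell starts, the predefined spark variable is a Spark Connect session, so you can sanity-check the connection with a simple query. A minimal check (output formatting may vary slightly across versions),

# `spark` here is the Spark Connect session created by the shell
spark.range(5).show()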