Package org.apache.spark.launcher

Library for launching Spark applications programmatically.

There are two ways to start applications with this library: as a child process, using SparkLauncher, or in-process, using InProcessLauncher.

The AbstractLauncher.startApplication(org.apache.spark.launcher.SparkAppHandle.Listener...) method can be used to start Spark and provide a handle to monitor and control the running application:

 
   import org.apache.spark.launcher.SparkAppHandle;
   import org.apache.spark.launcher.SparkLauncher;

   public class MyLauncher {
     public static void main(String[] args) throws Exception {
       SparkAppHandle handle = new SparkLauncher()
         .setAppResource("/my/app.jar")
         .setMainClass("my.spark.app.Main")
         .setMaster("local")
         .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
         .startApplication();
       // Use handle API to monitor / control application.
     }
   }
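The handle returned by startApplication can also report state transitions through a SparkAppHandle.Listener passed to that method. A minimal sketch of such a listener (the class name is illustrative):

```java
import org.apache.spark.launcher.SparkAppHandle;

// A minimal listener that logs application state transitions.
public class StateLogger implements SparkAppHandle.Listener {
  @Override
  public void stateChanged(SparkAppHandle handle) {
    System.out.println("State: " + handle.getState());
  }

  @Override
  public void infoChanged(SparkAppHandle handle) {
    System.out.println("App id: " + handle.getAppId());
  }
}
```

An instance would be passed as `startApplication(new StateLogger())`; the handle itself also exposes methods such as getState(), stop(), and kill() for direct control.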

Launching applications as a child process requires a full Spark installation. The installation directory can be provided explicitly through the launcher's configuration, or by setting the SPARK_HOME environment variable.
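For example, the installation directory can be set explicitly with setSparkHome instead of relying on the environment variable (the path below is hypothetical):

```java
import org.apache.spark.launcher.SparkLauncher;

public class MyLauncher {
  public static void main(String[] args) throws Exception {
    Process spark = new SparkLauncher()
      // Point the launcher at a Spark installation; overrides SPARK_HOME.
      .setSparkHome("/opt/spark")            // hypothetical installation directory
      .setAppResource("/my/app.jar")
      .setMainClass("my.spark.app.Main")
      .setMaster("local")
      .launch();
    spark.waitFor();
  }
}
```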

Launching applications in-process is only recommended in cluster mode, since Spark cannot run multiple client-mode applications concurrently in the same process. The in-process launcher requires the necessary Spark dependencies (such as spark-core and cluster manager-specific modules) to be present in the caller thread's class loader.
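An in-process launch follows the same builder pattern via InProcessLauncher; a sketch, assuming a YARN cluster and the required Spark modules on the class path:

```java
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;

public class MyInProcessLauncher {
  public static void main(String[] args) throws Exception {
    // spark-core and the cluster manager module must already be on the
    // caller's class path; no child process is spawned.
    SparkAppHandle handle = new InProcessLauncher()
      .setAppResource("/my/app.jar")
      .setMainClass("my.spark.app.Main")
      .setMaster("yarn")            // in-process launching is recommended for cluster mode
      .setDeployMode("cluster")
      .startApplication();
    // Use handle API to monitor / control application.
  }
}
```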

It's also possible to launch a raw child process, without the extra monitoring, using the SparkLauncher.launch() method:

 
   import org.apache.spark.launcher.SparkLauncher;

   public class MyLauncher {
     public static void main(String[] args) throws Exception {
       Process spark = new SparkLauncher()
         .setAppResource("/my/app.jar")
         .setMainClass("my.spark.app.Main")
         .setMaster("local")
         .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
         .launch();
       spark.waitFor();
     }
   }

This method requires the calling code to manage the child process manually, including its output streams (otherwise the child may block on a full pipe buffer and deadlock). It's recommended that SparkLauncher.startApplication(org.apache.spark.launcher.SparkAppHandle.Listener...) be used instead.
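If launch() is used anyway, one way to avoid the deadlock is to redirect the child's output and error streams before launching. A sketch (the log file names are hypothetical):

```java
import java.io.File;
import java.lang.ProcessBuilder.Redirect;
import org.apache.spark.launcher.SparkLauncher;

public class MyLauncher {
  public static void main(String[] args) throws Exception {
    Process spark = new SparkLauncher()
      .setAppResource("/my/app.jar")
      .setMainClass("my.spark.app.Main")
      .setMaster("local")
      // Redirect both streams to files so the child never blocks
      // waiting for the parent to drain its pipes.
      .redirectOutput(Redirect.appendTo(new File("spark-out.log")))  // hypothetical file
      .redirectError(Redirect.appendTo(new File("spark-err.log")))   // hypothetical file
      .launch();
    int exitCode = spark.waitFor();
    System.out.println("Spark exited with code " + exitCode);
  }
}
```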