Counts words in UTF-8 encoded text received from the network every second.
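As a rough illustration of the counting step, the plain-Python sketch below decodes one batch of UTF-8 lines, counts the words, and formats the result for appending to a file. It does not use Spark. The names count_words, format_counts, and append_counts, and the output format, are assumptions made for this sketch and are not taken from the example's source.

```python
from collections import Counter


def count_words(raw_lines):
    """Count whitespace-separated words across one batch of UTF-8 encoded lines."""
    counts = Counter()
    for raw in raw_lines:
        # Each item is a bytes object, as it would arrive from a socket.
        counts.update(raw.decode("utf-8").split())
    return counts


def format_counts(counts):
    """Render counts as one line, sorted by word (the format is an assumption)."""
    return ", ".join(f"{word}: {n}" for word, n in sorted(counts.items()))


def append_counts(output_file, counts):
    """Append one line of counts to output_file, creating the file if needed."""
    with open(output_file, "a", encoding="utf-8") as out:
        out.write(format_counts(counts) + "\n")
```

For a batch holding the lines "hello world" and "hello", format_counts returns "hello: 2, world: 1".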
Usage: RecoverableNetworkWordCount <master> <hostname> <port> <checkpoint-directory> <output-file>
<master> is the Spark master URL.
<hostname> and <port> describe the TCP server that Spark Streaming connects to in order to receive data.
<checkpoint-directory> is a directory in an HDFS-compatible file system where checkpoint data is saved.
<output-file> is the file to which the word counts are appended.
In local mode, <master> should be 'local[n]' with n > 1, so that at least one thread remains for processing after the receiver takes one.
<checkpoint-directory> and <output-file> must be absolute paths.
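The two constraints above can be checked before anything is launched. The following is a minimal sketch in plain Python, assuming a POSIX file system. validate_args is a hypothetical helper, not part of Spark or of the example, and it only inspects masters of the form 'local' or 'local[n]'; any other master URL is passed through unchecked.

```python
import os
import re


def validate_args(master, checkpoint_dir, output_file):
    """Return a list of problems; an empty list means the arguments look fine."""
    problems = []

    # 'local' is treated like 'local[1]': a single thread.
    match = re.fullmatch(r"local(?:\[(\d+)\])?", master)
    if match:
        threads = int(match.group(1) or 1)
        if threads < 2:
            problems.append("in local mode, <master> should be 'local[n]' with n > 1")

    # '~' is only expanded by the shell, so an unexpanded '~/out' is rejected here.
    for name, path in (("checkpoint-directory", checkpoint_dir),
                       ("output-file", output_file)):
        if not os.path.isabs(path):
            problems.append(f"<{name}> must be an absolute path: {path}")

    return problems
```

Note that a path such as ~/checkpoint/ typed on the command line is expanded by the shell to an absolute path before the program sees it.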
To run this on your local machine, you need to first run a Netcat server
$ nc -lk 9999
and run the example as
$ ./run-example org.apache.spark.streaming.examples.RecoverableNetworkWordCount \
      local[2] localhost 9999 ~/checkpoint/ ~/out
If the directory ~/checkpoint/ does not exist (e.g. when running for the first time), the example creates a new StreamingContext and prints "Creating new context" to the console. Otherwise, if checkpoint data exists in ~/checkpoint/, it recreates the StreamingContext from the checkpoint data.
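The decision described above follows a "get or create" pattern. The sketch below models only that decision in plain Python. It does not use Spark, which provides StreamingContext.getOrCreate for this purpose. get_or_create and its two callback parameters are hypothetical names, and treating "a non-empty directory" as "checkpoint data exists" is a simplifying assumption.

```python
import os


def get_or_create(checkpoint_dir, create_context, restore_context):
    """Restore from checkpoint data when present, otherwise build a fresh context.

    create_context takes no arguments; restore_context takes the checkpoint path.
    """
    # Assumption: any non-empty directory counts as holding checkpoint data.
    has_checkpoint = os.path.isdir(checkpoint_dir) and bool(os.listdir(checkpoint_dir))
    if has_checkpoint:
        return restore_context(checkpoint_dir)
    return create_context()
```

On a first run the directory is missing, so create_context is called; on a restart after a failure the directory holds data, so restore_context is called with the same path.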
To run this example in a local standalone cluster with automatic driver recovery,
$ ./spark-class org.apache.spark.deploy.Client -s launch <cluster-url> <path-to-examples-jar> \
      org.apache.spark.streaming.examples.RecoverableNetworkWordCount <cluster-url> \
      localhost 9999 ~/checkpoint ~/out
<path-to-examples-jar> would typically be <spark-dir>/examples/target/scala-XX/spark-examples....jar
Refer to the online documentation for more details.