public final class JavaKinesisWordCountASL
Java-friendly Kinesis Spark Streaming WordCount example
See http://spark.apache.org/docs/latest/streaming-kinesis.html for more details
on the Kinesis Spark Streaming integration.
This example spins up 1 Kinesis Worker (Spark Streaming Receiver) per shard
for the given stream.
It then starts pulling from the last checkpointed sequence number of the given
Valid endpoint urls: http://docs.aws.amazon.com/general/latest/gr/rande.html#ak_region
This code uses the DefaultAWSCredentialsProviderChain and searches for credentials
in the following order of precedence:
Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY
Java System Properties - aws.accessKeyId and aws.secretKey
Credential profiles file - default location (~/.aws/credentials) shared by all AWS SDKs
Instance profile credentials - delivered through the Amazon EC2 metadata service
Usage: JavaKinesisWordCountASL is the name of the Kinesis stream (ie. mySparkStream)
is the endpoint of the Kinesis service
$ export AWS_ACCESS_KEY_ID=
$ export AWS_SECRET_KEY=
$ $SPARK_HOME/bin/run-example \
org.apache.spark.examples.streaming.JavaKinesisWordCountASL mySparkStream \
Note that number of workers/threads should be 1 more than the number of receivers.
This leaves one thread available for actually processing the data.
There is a companion helper class called KinesisWordCountProducerASL which puts dummy data
onto the Kinesis stream.
Usage instructions for KinesisWordCountProducerASL are provided in the class definition.