org.apache.spark.api.java
Class JavaRDD<T>

Object
  extended by org.apache.spark.api.java.JavaRDD<T>
All Implemented Interfaces:
java.io.Serializable, JavaRDDLike<T,JavaRDD<T>>

public class JavaRDD<T>
extends Object

See Also:
Serialized Form

Constructor Summary
JavaRDD(RDD<T> rdd, scala.reflect.ClassTag<T> classTag)
           
 
Method Summary
 JavaRDD<T> cache()
          Persist this RDD with the default storage level (`MEMORY_ONLY`).
 scala.reflect.ClassTag<T> classTag()
           
 JavaRDD<T> coalesce(int numPartitions)
          Return a new RDD that is reduced into numPartitions partitions.
 JavaRDD<T> coalesce(int numPartitions, boolean shuffle)
          Return a new RDD that is reduced into numPartitions partitions.
 JavaRDD<T> distinct()
          Return a new RDD containing the distinct elements in this RDD.
 JavaRDD<T> distinct(int numPartitions)
          Return a new RDD containing the distinct elements in this RDD.
 JavaRDD<T> filter(Function<T,Boolean> f)
          Return a new RDD containing only the elements that satisfy a predicate.
static
<T> JavaRDD<T>
fromRDD(RDD<T> rdd, scala.reflect.ClassTag<T> evidence$1)
           
 JavaRDD<T> intersection(JavaRDD<T> other)
          Return the intersection of this RDD and another one.
 JavaRDD<T> persist(StorageLevel newLevel)
          Set this RDD's storage level to persist its values across operations after the first time it is computed.
 JavaRDD<T>[] randomSplit(double[] weights)
          Randomly splits this RDD with the provided weights.
 JavaRDD<T>[] randomSplit(double[] weights, long seed)
          Randomly splits this RDD with the provided weights.
 RDD<T> rdd()
           
 JavaRDD<T> repartition(int numPartitions)
          Return a new RDD that has exactly numPartitions partitions.
 JavaRDD<T> sample(boolean withReplacement, double fraction)
          Return a sampled subset of this RDD.
 JavaRDD<T> sample(boolean withReplacement, double fraction, long seed)
          Return a sampled subset of this RDD.
 JavaRDD<T> setName(String name)
          Assign a name to this RDD.
<S> JavaRDD<T>
sortBy(Function<T,S> f, boolean ascending, int numPartitions)
          Return this RDD sorted by the given key function.
 JavaRDD<T> subtract(JavaRDD<T> other)
          Return an RDD with the elements from this that are not in other.
 JavaRDD<T> subtract(JavaRDD<T> other, int numPartitions)
          Return an RDD with the elements from this that are not in other.
 JavaRDD<T> subtract(JavaRDD<T> other, Partitioner p)
          Return an RDD with the elements from this that are not in other.
static
<T> RDD<T>
toRDD(JavaRDD<T> rdd)
           
 String toString()
           
 JavaRDD<T> union(JavaRDD<T> other)
          Return the union of this RDD and another one.
 JavaRDD<T> unpersist()
          Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
 JavaRDD<T> unpersist(boolean blocking)
          Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
 JavaRDD<T> wrapRDD(RDD<T> rdd)
           
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.apache.spark.api.java.JavaRDDLike
aggregate, cartesian, checkpoint, collect, collectAsync, collectPartitions, context, count, countApprox, countApprox, countApproxDistinct, countAsync, countByValue, countByValueApprox, countByValueApprox, first, flatMap, flatMapToDouble, flatMapToPair, fold, foreach, foreachAsync, foreachPartition, foreachPartitionAsync, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, id, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitions, mapPartitionsToDouble, mapPartitionsToDouble, mapPartitionsToPair, mapPartitionsToPair, mapPartitionsWithIndex, mapToDouble, mapToPair, max, min, name, partitions, pipe, pipe, pipe, reduce, saveAsObjectFile, saveAsTextFile, saveAsTextFile, splits, take, takeAsync, takeOrdered, takeOrdered, takeSample, takeSample, toArray, toDebugString, toLocalIterator, top, top, treeAggregate, treeAggregate, treeReduce, treeReduce, zip, zipPartitions, zipWithIndex, zipWithUniqueId
 

Constructor Detail

JavaRDD

public JavaRDD(RDD<T> rdd,
               scala.reflect.ClassTag<T> classTag)
Method Detail

fromRDD

public static <T> JavaRDD<T> fromRDD(RDD<T> rdd,
                                     scala.reflect.ClassTag<T> evidence$1)

toRDD

public static <T> RDD<T> toRDD(JavaRDD<T> rdd)

rdd

public RDD<T> rdd()

classTag

public scala.reflect.ClassTag<T> classTag()

wrapRDD

public JavaRDD<T> wrapRDD(RDD<T> rdd)

cache

public JavaRDD<T> cache()
Persist this RDD with the default storage level (`MEMORY_ONLY`).


persist

public JavaRDD<T> persist(StorageLevel newLevel)
Set this RDD's storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet.

Parameters:
newLevel - the storage level to assign to this RDD
Returns:
(undocumented)

unpersist

public JavaRDD<T> unpersist()
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk. This method blocks until all blocks are deleted.

Returns:
(undocumented)

unpersist

public JavaRDD<T> unpersist(boolean blocking)
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

Parameters:
blocking - Whether to block until all blocks are deleted.
Returns:
(undocumented)

distinct

public JavaRDD<T> distinct()
Return a new RDD containing the distinct elements in this RDD.

Returns:
(undocumented)

distinct

public JavaRDD<T> distinct(int numPartitions)
Return a new RDD containing the distinct elements in this RDD.

Parameters:
numPartitions - the number of partitions in the resulting RDD
Returns:
(undocumented)

filter

public JavaRDD<T> filter(Function<T,Boolean> f)
Return a new RDD containing only the elements that satisfy a predicate.

Parameters:
f - the predicate; an element is kept when f returns true for it
Returns:
(undocumented)

coalesce

public JavaRDD<T> coalesce(int numPartitions)
Return a new RDD that is reduced into numPartitions partitions.

Parameters:
numPartitions - the target number of partitions
Returns:
(undocumented)

coalesce

public JavaRDD<T> coalesce(int numPartitions,
                           boolean shuffle)
Return a new RDD that is reduced into numPartitions partitions.

Parameters:
numPartitions - the target number of partitions
shuffle - whether to shuffle the data when redistributing it into the new partitions
Returns:
(undocumented)

repartition

public JavaRDD<T> repartition(int numPartitions)
Return a new RDD that has exactly numPartitions partitions.

Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.

If you are decreasing the number of partitions in this RDD, consider using coalesce, which can avoid performing a shuffle.

Parameters:
numPartitions - the number of partitions in the resulting RDD
Returns:
(undocumented)
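
The difference between coalesce without a shuffle and repartition can be sketched with a plain-Java model. This is not Spark code: partitions are modelled as a list of lists, and the names PartitionModel, coalesceNoShuffle and repartitionWithShuffle are hypothetical. The contiguous grouping and the round-robin redistribution are simplifying assumptions; Spark's actual placement of elements may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PartitionModel {
    /** Models coalesce(n, false): whole input partitions are merged, never split. */
    static <T> List<List<T>> coalesceNoShuffle(List<List<T>> parts, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");
        // Without a shuffle the partition count can shrink but not grow.
        int target = Math.min(n, parts.size());
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < target; i++) out.add(new ArrayList<>());
        for (int i = 0; i < parts.size(); i++) {
            // Assumption: neighbouring input partitions land in the same output group.
            int group = (int) ((long) i * target / parts.size());
            out.get(group).addAll(parts.get(i));
        }
        return out;
    }

    /** Models repartition(n): every element may move, so any count n >= 1 is reachable. */
    static <T> List<List<T>> repartitionWithShuffle(List<List<T>> parts, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) out.add(new ArrayList<>());
        int next = 0;
        for (List<T> part : parts) {
            for (T x : part) {
                out.get(next % n).add(x); // round-robin stands in for the real shuffle
                next++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5), Arrays.asList(6));
        System.out.println(coalesceNoShuffle(parts, 2));        // [[1, 2, 3], [4, 5, 6]]
        System.out.println(coalesceNoShuffle(parts, 8).size()); // 4
        System.out.println(repartitionWithShuffle(parts, 3));   // [[1, 4], [2, 5], [3, 6]]
    }
}
```

In the model, asking coalesceNoShuffle for more partitions than exist leaves the count unchanged, which is why increasing parallelism needs repartition (or coalesce with shuffle set to true).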

sample

public JavaRDD<T> sample(boolean withReplacement,
                         double fraction)
Return a sampled subset of this RDD.

Parameters:
withReplacement - whether elements can be sampled multiple times (replaced when sampled out)
fraction - expected size of the sample as a fraction of this RDD's size
  without replacement: probability that each element is chosen; fraction must be in [0, 1]
  with replacement: expected number of times each element is chosen; fraction must be >= 0
Returns:
(undocumented)

sample

public JavaRDD<T> sample(boolean withReplacement,
                         double fraction,
                         long seed)
Return a sampled subset of this RDD.

Parameters:
withReplacement - whether elements can be sampled multiple times (replaced when sampled out)
fraction - expected size of the sample as a fraction of this RDD's size
  without replacement: probability that each element is chosen; fraction must be in [0, 1]
  with replacement: expected number of times each element is chosen; fraction must be >= 0
seed - seed for the random number generator
Returns:
(undocumented)
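
The two meanings of fraction can be illustrated with a small plain-Java model. This is a sketch, not Spark's implementation: SampleModel and its methods are hypothetical names, an independent coin flip per element stands in for sampling without replacement, and a Poisson-distributed copy count (drawn with Knuth's method) stands in for sampling with replacement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SampleModel {
    /** Returns a sample of data; the same seed always yields the same sample. */
    static <T> List<T> sample(List<T> data, boolean withReplacement, double fraction, long seed) {
        if (fraction < 0 || (!withReplacement && fraction > 1)) {
            throw new IllegalArgumentException("fraction out of range: " + fraction);
        }
        Random rng = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T x : data) {
            // Without replacement: 0 or 1 copies, kept with probability `fraction`.
            // With replacement: a Poisson(fraction) number of copies.
            int copies = withReplacement
                ? poisson(rng, fraction)
                : (rng.nextDouble() < fraction ? 1 : 0);
            for (int i = 0; i < copies; i++) out.add(x);
        }
        return out;
    }

    /** Draws a Poisson-distributed count with the given mean (Knuth's method). */
    static int poisson(Random rng, double mean) {
        double limit = Math.exp(-mean);
        double product = rng.nextDouble();
        int count = 0;
        while (product > limit) {
            count++;
            product *= rng.nextDouble();
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) data.add(i);
        // Sizes vary around fraction * data.size(); they are not exact.
        System.out.println(sample(data, false, 0.1, 42L).size());
        System.out.println(sample(data, true, 2.0, 42L).size());
    }
}
```

Because each element is decided independently in this model, the sample size is only expected to be close to fraction times the input size, not equal to it.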

randomSplit

public JavaRDD<T>[] randomSplit(double[] weights)
Randomly splits this RDD with the provided weights.

Parameters:
weights - weights for the splits; they are normalized if they do not sum to 1

Returns:
split RDDs in an array

randomSplit

public JavaRDD<T>[] randomSplit(double[] weights,
                                long seed)
Randomly splits this RDD with the provided weights.

Parameters:
weights - weights for the splits; they are normalized if they do not sum to 1
seed - random seed

Returns:
split RDDs in an array
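
Weight normalization and the resulting split can be sketched in plain Java. This is a model under stated assumptions, not Spark's code: RandomSplitModel and its methods are hypothetical names, and the model assigns each element in a single pass by drawing one uniform number and finding the cumulative weight range it falls into.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class RandomSplitModel {
    /** Scales the weights so that they sum to 1. */
    static double[] normalize(double[] weights) {
        double sum = 0;
        for (double w : weights) sum += w;
        if (weights.length == 0 || sum <= 0) {
            throw new IllegalArgumentException("weights must be non-empty with a positive sum");
        }
        double[] out = new double[weights.length];
        for (int i = 0; i < weights.length; i++) out[i] = weights[i] / sum;
        return out;
    }

    /** Assigns every element to exactly one split, with probability given by its weight. */
    static <T> List<List<T>> randomSplit(List<T> data, double[] weights, long seed) {
        double[] p = normalize(weights);
        Random rng = new Random(seed);
        List<List<T>> splits = new ArrayList<>();
        for (int i = 0; i < p.length; i++) splits.add(new ArrayList<>());
        for (T x : data) {
            double u = rng.nextDouble(); // uniform in [0, 1)
            int k = 0;
            double upper = p[0];
            // Walk the cumulative weights until the range containing u is found.
            while (k < p.length - 1 && u >= upper) {
                k++;
                upper += p[k];
            }
            splits.get(k).add(x);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) data.add(i);
        // Weights {3, 1} normalize to {0.75, 0.25}; split sizes are approximate.
        List<List<Integer>> splits = randomSplit(data, new double[] {3.0, 1.0}, 11L);
        System.out.println(splits.get(0).size() + " / " + splits.get(1).size());
    }
}
```

A typical use of such a split is a train/test partition, for example weights of 0.8 and 0.2 with a fixed seed so that the split is reproducible.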

union

public JavaRDD<T> union(JavaRDD<T> other)
Return the union of this RDD and another one. Any identical elements will appear multiple times (use .distinct() to eliminate them).

Parameters:
other - the RDD to combine with this one
Returns:
(undocumented)
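
The duplicate-preserving behaviour of union, and the effect of following it with distinct, can be modelled with ordinary Java lists. This is an illustration only: UnionModel and its methods are hypothetical names, and the model keeps insertion order for readability, which an RDD does not promise after distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

final class UnionModel {
    /** Models union: a plain concatenation, so shared elements appear more than once. */
    static <T> List<T> union(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    /** Models distinct: each element is kept once. */
    static <T> List<T> distinct(List<T> data) {
        return new ArrayList<>(new LinkedHashSet<>(data));
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3);
        List<Integer> b = Arrays.asList(3, 4);
        System.out.println(union(a, b));           // [1, 2, 3, 3, 4]
        System.out.println(distinct(union(a, b))); // [1, 2, 3, 4]
    }
}
```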

intersection

public JavaRDD<T> intersection(JavaRDD<T> other)
Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.

Note that this method performs a shuffle internally.

Parameters:
other - the RDD to intersect with this one
Returns:
(undocumented)

subtract

public JavaRDD<T> subtract(JavaRDD<T> other)
Return an RDD with the elements from this that are not in other.

Uses this RDD's partitioner and number of partitions, because even if other is huge, the resulting RDD will be no larger than this one.

Parameters:
other - the RDD whose elements are removed from this one
Returns:
(undocumented)

subtract

public JavaRDD<T> subtract(JavaRDD<T> other,
                           int numPartitions)
Return an RDD with the elements from this that are not in other.

Parameters:
other - the RDD whose elements are removed from this one
numPartitions - the number of partitions in the resulting RDD
Returns:
(undocumented)

subtract

public JavaRDD<T> subtract(JavaRDD<T> other,
                           Partitioner p)
Return an RDD with the elements from this that are not in other.

Parameters:
other - the RDD whose elements are removed from this one
p - the partitioner to use for the resulting RDD
Returns:
(undocumented)
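
The element-level results of intersection and subtract, as described above, can be modelled with Java collections. This is a sketch, not Spark code: SetOpsModel and its methods are hypothetical names, and partitioning is ignored entirely. The model assumes that subtract keeps repeated elements of this RDD when they do not occur in other, while intersection removes duplicates as documented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SetOpsModel {
    /** Models intersection: elements present in both inputs, each reported once. */
    static <T> List<T> intersection(List<T> self, List<T> other) {
        Set<T> out = new LinkedHashSet<>(self);
        out.retainAll(new HashSet<>(other));
        return new ArrayList<>(out);
    }

    /** Models subtract: elements of self that do not occur in other (assumed to keep duplicates). */
    static <T> List<T> subtract(List<T> self, List<T> other) {
        Set<T> exclude = new HashSet<>(other);
        List<T> out = new ArrayList<>();
        for (T x : self) {
            if (!exclude.contains(x)) out.add(x);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> self = Arrays.asList(1, 1, 2, 3);
        System.out.println(intersection(self, Arrays.asList(1, 3, 3, 5))); // [1, 3]
        System.out.println(subtract(self, Arrays.asList(3, 5)));           // [1, 1, 2]
    }
}
```

In both operations the result is never larger than the first input, which matches the reasoning given for reusing this RDD's partitioning in the one-argument subtract.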

toString

public String toString()
Overrides:
toString in class Object

setName

public JavaRDD<T> setName(String name)
Assign a name to this RDD.


sortBy

public <S> JavaRDD<T> sortBy(Function<T,S> f,
                             boolean ascending,
                             int numPartitions)
Return this RDD sorted by the given key function.

Parameters:
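f - the function that computes the sort key for each element
ascending - whether to sort in ascending order (true) or descending order (false)
numPartitions - the number of partitions in the resulting RDD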
Returns:
(undocumented)
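
Sorting by a key function can be sketched with a plain-Java model. This is not Spark code: SortByModel is a hypothetical name, java.util.function.Function stands in for Spark's Function interface, keys are assumed to be Comparable, and the numPartitions argument is left out because the model has no partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

final class SortByModel {
    /** Returns a sorted copy of data, ordered by the key that f computes for each element. */
    static <T, S extends Comparable<S>> List<T> sortBy(List<T> data, Function<T, S> f, boolean ascending) {
        Comparator<T> byKey = Comparator.comparing(f);
        List<T> out = new ArrayList<>(data); // the input is left untouched
        out.sort(ascending ? byKey : byKey.reversed());
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana");
        System.out.println(sortBy(words, String::length, true));  // [fig, pear, banana]
        System.out.println(sortBy(words, String::length, false)); // [banana, pear, fig]
    }
}
```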