public class JavaRDD<T> extends Object implements JavaRDDLike<T,JavaRDD<T>>
Constructor and Description |
---|
JavaRDD(RDD<T> rdd,
scala.reflect.ClassTag<T> classTag) |
Modifier and Type | Method and Description |
---|---|
JavaRDD<T> |
cache()
Persist this RDD with the default storage level (`MEMORY_ONLY`).
|
scala.reflect.ClassTag<T> |
classTag() |
JavaRDD<T> |
coalesce(int numPartitions)
Return a new RDD that is reduced into
numPartitions partitions. |
JavaRDD<T> |
coalesce(int numPartitions,
boolean shuffle)
Return a new RDD that is reduced into
numPartitions partitions. |
JavaRDD<T> |
distinct()
Return a new RDD containing the distinct elements in this RDD.
|
JavaRDD<T> |
distinct(int numPartitions)
Return a new RDD containing the distinct elements in this RDD.
|
JavaRDD<T> |
filter(Function<T,Boolean> f)
Return a new RDD containing only the elements that satisfy a predicate.
|
static <T> JavaRDD<T> |
fromRDD(RDD<T> rdd,
scala.reflect.ClassTag<T> evidence$1) |
JavaRDD<T> |
intersection(JavaRDD<T> other)
Return the intersection of this RDD and another one.
|
JavaRDD<T> |
persist(StorageLevel newLevel)
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
|
JavaRDD<T>[] |
randomSplit(double[] weights)
Randomly splits this RDD with the provided weights.
|
JavaRDD<T>[] |
randomSplit(double[] weights,
long seed)
Randomly splits this RDD with the provided weights.
|
RDD<T> |
rdd() |
JavaRDD<T> |
repartition(int numPartitions)
Return a new RDD that has exactly numPartitions partitions.
|
JavaRDD<T> |
sample(boolean withReplacement,
double fraction)
Return a sampled subset of this RDD.
|
JavaRDD<T> |
sample(boolean withReplacement,
double fraction,
long seed)
Return a sampled subset of this RDD.
|
JavaRDD<T> |
setName(String name)
Assign a name to this RDD
|
<S> JavaRDD<T> |
sortBy(Function<T,S> f,
boolean ascending,
int numPartitions)
Return this RDD sorted by the given key function.
|
JavaRDD<T> |
subtract(JavaRDD<T> other)
Return an RDD with the elements from
this that are not in other . |
JavaRDD<T> |
subtract(JavaRDD<T> other,
int numPartitions)
Return an RDD with the elements from
this that are not in other . |
JavaRDD<T> |
subtract(JavaRDD<T> other,
Partitioner p)
Return an RDD with the elements from
this that are not in other . |
static <T> RDD<T> |
toRDD(JavaRDD<T> rdd) |
String |
toString() |
JavaRDD<T> |
union(JavaRDD<T> other)
Return the union of this RDD and another one.
|
JavaRDD<T> |
unpersist()
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
|
JavaRDD<T> |
unpersist(boolean blocking)
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
|
JavaRDD<T> |
wrapRDD(RDD<T> rdd) |
aggregate, cartesian, checkpoint, collect, collectPartitions, context, count, countApprox, countApprox, countApproxDistinct, countByValue, countByValueApprox, countByValueApprox, first, flatMap, flatMapToDouble, flatMapToPair, fold, foreach, foreachPartition, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, id, isCheckpointed, iterator, keyBy, map, mapPartitions, mapPartitions, mapPartitionsToDouble, mapPartitionsToDouble, mapPartitionsToPair, mapPartitionsToPair, mapPartitionsWithIndex, mapToDouble, mapToPair, max, min, name, partitions, pipe, pipe, pipe, reduce, saveAsObjectFile, saveAsTextFile, saveAsTextFile, splits, take, takeOrdered, takeOrdered, takeSample, takeSample, toArray, toDebugString, toLocalIterator, top, top, zip, zipPartitions, zipWithIndex, zipWithUniqueId
public scala.reflect.ClassTag<T> classTag()
classTag
in interface JavaRDDLike<T,JavaRDD<T>>
public JavaRDD<T> wrapRDD(RDD<T> rdd)
wrapRDD
in interface JavaRDDLike<T,JavaRDD<T>>
public JavaRDD<T> persist(StorageLevel newLevel)
public JavaRDD<T> unpersist()
public JavaRDD<T> unpersist(boolean blocking)
blocking
- Whether to block until all blocks are deleted.public JavaRDD<T> distinct()
public JavaRDD<T> distinct(int numPartitions)
public JavaRDD<T> filter(Function<T,Boolean> f)
public JavaRDD<T> coalesce(int numPartitions)
numPartitions
partitions.public JavaRDD<T> coalesce(int numPartitions, boolean shuffle)
numPartitions
partitions.public JavaRDD<T> repartition(int numPartitions)
Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
If you are decreasing the number of partitions in this RDD, consider using coalesce
,
which can avoid performing a shuffle.
public JavaRDD<T> sample(boolean withReplacement, double fraction)
public JavaRDD<T> sample(boolean withReplacement, double fraction, long seed)
public JavaRDD<T>[] randomSplit(double[] weights)
weights
- weights for splits, will be normalized if they don't sum to 1
public JavaRDD<T>[] randomSplit(double[] weights, long seed)
weights
- weights for splits, will be normalized if they don't sum to 1seed
- random seed
public JavaRDD<T> union(JavaRDD<T> other)
.distinct()
to eliminate them).public JavaRDD<T> intersection(JavaRDD<T> other)
Note that this method performs a shuffle internally.
public JavaRDD<T> subtract(JavaRDD<T> other)
this
that are not in other
.
Uses this
partitioner/partition size, because even if other
is huge, the resulting
RDD will be <= us.
public JavaRDD<T> subtract(JavaRDD<T> other, int numPartitions)
this
that are not in other
.public JavaRDD<T> subtract(JavaRDD<T> other, Partitioner p)
this
that are not in other
.public String toString()
toString
in class Object