public class JavaRDD<T>
extends Object
| Constructor | Description |
|---|---|
| `JavaRDD(RDD<T> rdd, scala.reflect.ClassTag<T> classTag)` | |
| Modifier and Type | Method | Description |
|---|---|---|
| `JavaRDD<T>` | `cache()` | Persist this RDD with the default storage level (`MEMORY_ONLY`). |
| `scala.reflect.ClassTag<T>` | `classTag()` | |
| `JavaRDD<T>` | `coalesce(int numPartitions)` | Return a new RDD that is reduced into `numPartitions` partitions. |
| `JavaRDD<T>` | `coalesce(int numPartitions, boolean shuffle)` | Return a new RDD that is reduced into `numPartitions` partitions. |
| `JavaRDD<T>` | `distinct()` | Return a new RDD containing the distinct elements in this RDD. |
| `JavaRDD<T>` | `distinct(int numPartitions)` | Return a new RDD containing the distinct elements in this RDD. |
| `JavaRDD<T>` | `filter(Function<T,Boolean> f)` | Return a new RDD containing only the elements that satisfy a predicate. |
| `static <T> JavaRDD<T>` | `fromRDD(RDD<T> rdd, scala.reflect.ClassTag<T> evidence$1)` | |
| `ResourceProfile` | `getResourceProfile()` | Get the `ResourceProfile` specified with this RDD, or `null` if none was specified. |
| `JavaRDD<T>` | `intersection(JavaRDD<T> other)` | Return the intersection of this RDD and another one. |
| `JavaRDD<T>` | `persist(StorageLevel newLevel)` | Set this RDD's storage level to persist its values across operations after the first time it is computed. |
| `JavaRDD<T>[]` | `randomSplit(double[] weights)` | Randomly splits this RDD with the provided weights. |
| `JavaRDD<T>[]` | `randomSplit(double[] weights, long seed)` | Randomly splits this RDD with the provided weights. |
| `RDD<T>` | `rdd()` | |
| `JavaRDD<T>` | `repartition(int numPartitions)` | Return a new RDD that has exactly `numPartitions` partitions. |
| `JavaRDD<T>` | `sample(boolean withReplacement, double fraction)` | Return a sampled subset of this RDD with a random seed. |
| `JavaRDD<T>` | `sample(boolean withReplacement, double fraction, long seed)` | Return a sampled subset of this RDD, with a user-supplied seed. |
| `JavaRDD<T>` | `setName(String name)` | Assign a name to this RDD. |
| `<S> JavaRDD<T>` | `sortBy(Function<T,S> f, boolean ascending, int numPartitions)` | Return this RDD sorted by the given key function. |
| `JavaRDD<T>` | `subtract(JavaRDD<T> other)` | Return an RDD with the elements from this RDD that are not in `other`. |
| `JavaRDD<T>` | `subtract(JavaRDD<T> other, int numPartitions)` | Return an RDD with the elements from this RDD that are not in `other`. |
| `JavaRDD<T>` | `subtract(JavaRDD<T> other, Partitioner p)` | Return an RDD with the elements from this RDD that are not in `other`. |
| `static <T> RDD<T>` | `toRDD(JavaRDD<T> rdd)` | |
| `String` | `toString()` | |
| `JavaRDD<T>` | `union(JavaRDD<T> other)` | Return the union of this RDD and another one. |
| `JavaRDD<T>` | `unpersist()` | Mark the RDD as non-persistent, and remove all blocks for it from memory and disk. |
| `JavaRDD<T>` | `unpersist(boolean blocking)` | Mark the RDD as non-persistent, and remove all blocks for it from memory and disk. |
| `JavaRDD<T>` | `withResources(ResourceProfile rp)` | Specify a `ResourceProfile` to use when calculating this RDD. |
| `JavaRDD<T>` | `wrapRDD(RDD<T> rdd)` | |
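The summary distinguishes `coalesce`, which reduces the number of partitions, from `repartition`, which always yields exactly `numPartitions` partitions by shuffling. The sketch below models that difference on plain Java lists, not on RDDs. `PartitionModel` and its methods are hypothetical names used only for illustration and are not Spark API.

In the model, `coalesce` merges whole adjacent partitions without moving individual elements. As with Spark's `coalesce` without a shuffle, a request for more partitions than currently exist keeps the current count. `repartition` is modeled as a deterministic round-robin deal of every element into exactly `numPartitions` partitions. Spark's real placement logic differs in detail.

For example, with input partitions `[[1, 2], [3], [4, 5], [6]]`:
- `coalesce(parts, 2)` gives `[[1, 2, 3], [4, 5, 6]]`.
- `repartition(parts, 3)` gives `[[1, 4], [2, 5], [3, 6]]`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical local model: partitions are List<List<T>>, not RDD partitions.
// This is NOT Spark's implementation; it only illustrates the partition-count contracts.
final class PartitionModel {

    private PartitionModel() {}

    // Models coalesce(numPartitions) without a shuffle: whole input partitions are
    // merged into adjacent groups, so no element moves on its own.
    static <T> List<List<T>> coalesce(List<List<T>> parts, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Without a shuffle the count cannot grow: asking for more keeps the current count.
        int target = Math.min(numPartitions, parts.size());
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < target; i++) {
            out.add(new ArrayList<>());
        }
        for (int i = 0; i < parts.size(); i++) {
            // Input partition i lands, unsplit, in output partition floor(i * target / size).
            int dest = (int) ((long) i * target / parts.size());
            out.get(dest).addAll(parts.get(i));
        }
        return out;
    }

    // Models repartition(numPartitions): a shuffle that deals every element round-robin
    // into exactly numPartitions partitions, so the count can go up or down.
    static <T> List<List<T>> repartition(List<List<T>> parts, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            out.add(new ArrayList<>());
        }
        int next = 0;
        for (List<T> part : parts) {
            for (T x : part) {
                out.get(next).add(x);
                next = (next + 1) % numPartitions;
            }
        }
        return out;
    }
}
```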
Methods inherited from interface JavaRDDLike:
aggregate, cartesian, checkpoint, collect, collectAsync, collectPartitions, context, count, countApprox, countApprox, countApproxDistinct, countAsync, countByValue, countByValueApprox, countByValueApprox, first, flatMap, flatMapToDouble, flatMapToPair, fold, foreach, foreachAsync, foreachPartition, foreachPartitionAsync, getCheckpointFile, getNumPartitions, getStorageLevel, glom, groupBy, groupBy, id, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitions, mapPartitionsToDouble, mapPartitionsToDouble, mapPartitionsToPair, mapPartitionsToPair, mapPartitionsWithIndex, mapToDouble, mapToPair, max, min, name, partitioner, partitions, pipe, pipe, pipe, pipe, pipe, reduce, saveAsObjectFile, saveAsTextFile, saveAsTextFile, take, takeAsync, takeOrdered, takeOrdered, takeSample, takeSample, toDebugString, toLocalIterator, top, top, treeAggregate, treeAggregate, treeAggregate, treeReduce, treeReduce, zip, zipPartitions, zipWithIndex, zipWithUniqueId

public scala.reflect.ClassTag<T> classTag()

public JavaRDD<T> cache()
Persist this RDD with the default storage level (MEMORY_ONLY).

public JavaRDD<T> persist(StorageLevel newLevel)
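Set this RDD's storage level to persist its values across operations after the first time it is computed.
Parameters:
newLevel - (undocumented)

public JavaRDD<T> withResources(ResourceProfile rp)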
Specify a ResourceProfile to use when calculating this RDD.
Parameters:
rp - (undocumented)

public ResourceProfile getResourceProfile()
Get the ResourceProfile specified with this RDD, or null if none was specified.

public JavaRDD<T> unpersist()
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

public JavaRDD<T> unpersist(boolean blocking)
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
Parameters:
blocking - Whether to block until all blocks are deleted.

public JavaRDD<T> distinct()
Return a new RDD containing the distinct elements in this RDD.

public JavaRDD<T> distinct(int numPartitions)
Return a new RDD containing the distinct elements in this RDD.
Parameters:
numPartitions - (undocumented)

public JavaRDD<T> filter(Function<T,Boolean> f)
Return a new RDD containing only the elements that satisfy a predicate.
Parameters:
f - (undocumented)

public JavaRDD<T> coalesce(int numPartitions)
Return a new RDD that is reduced into numPartitions partitions.
Parameters:
numPartitions - (undocumented)

public JavaRDD<T> coalesce(int numPartitions, boolean shuffle)
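Return a new RDD that is reduced into numPartitions partitions.
Parameters:
numPartitions - (undocumented)
shuffle - (undocumented)

public JavaRDD<T> repartition(int numPartitions)
Return a new RDD that has exactly numPartitions partitions.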
Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
If you are decreasing the number of partitions in this RDD, consider using coalesce,
which can avoid performing a shuffle.
Parameters:
numPartitions - (undocumented)

public JavaRDD<T> sample(boolean withReplacement, double fraction)
Return a sampled subset of this RDD with a random seed.
Parameters:
withReplacement - can elements be sampled multiple times (replaced when sampled out)
fraction - expected size of the sample as a fraction of this RDD's size
without replacement: probability that each element is chosen; fraction must be in [0, 1]
with replacement: expected number of times each element is chosen; fraction must be greater than or equal to 0
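Note: This is NOT guaranteed to provide exactly the fraction of the count of the given RDD.

public JavaRDD<T> sample(boolean withReplacement, double fraction, long seed)
Return a sampled subset of this RDD, with a user-supplied seed.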
Parameters:
withReplacement - can elements be sampled multiple times (replaced when sampled out)
fraction - expected size of the sample as a fraction of this RDD's size
without replacement: probability that each element is chosen; fraction must be in [0, 1]
with replacement: expected number of times each element is chosen; fraction must be greater than or equal to 0
seed - seed for the random number generator
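Note: This is NOT guaranteed to provide exactly the fraction of the count of the given RDD.

public JavaRDD<T>[] randomSplit(double[] weights)
Randomly splits this RDD with the provided weights.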
Parameters:
weights - weights for splits; they will be normalized if they don't sum to 1

public JavaRDD<T>[] randomSplit(double[] weights, long seed)
Randomly splits this RDD with the provided weights.
Parameters:
weights - weights for splits; they will be normalized if they don't sum to 1
seed - random seed
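The parameter descriptions for `sample` and `randomSplit` above imply simple per-element rules:
- Without replacement, each element is kept independently with probability `fraction` (a Bernoulli trial).
- With replacement, each element is emitted a random number of times whose mean is `fraction`. This sketch models that count with a Poisson draw.
- `randomSplit` normalizes its weights and then sends every element to exactly one split.

The sketch below applies these rules to a local `java.util.List`. `SamplingModel` and its methods are hypothetical names, and `java.util.Random` stands in for Spark's own samplers, so the output for a given seed will not match Spark's. Because every element is sampled independently, the sample size varies around `fraction` times the input size rather than hitting it exactly. For example, `normalize(new double[] {2, 6, 2})` yields `{0.2, 0.6, 0.2}`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical local model of sample/randomSplit on a java.util.List.
// java.util.Random stands in for Spark's samplers, so results differ from Spark's for the same seed.
final class SamplingModel {

    private SamplingModel() {}

    // sample(withReplacement, fraction, seed):
    //   without replacement -> keep each element with probability fraction (Bernoulli), fraction in [0, 1]
    //   with replacement    -> emit each element k times, k ~ Poisson(fraction), fraction >= 0
    static <T> List<T> sample(List<T> data, boolean withReplacement, double fraction, long seed) {
        if (fraction < 0 || (!withReplacement && fraction > 1)) {
            throw new IllegalArgumentException("fraction out of range: " + fraction);
        }
        Random rng = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T x : data) {
            if (withReplacement) {
                int copies = poisson(rng, fraction);
                for (int i = 0; i < copies; i++) {
                    out.add(x);
                }
            } else if (rng.nextDouble() < fraction) {
                out.add(x);
            }
        }
        return out;
    }

    // Knuth's multiplication method for a Poisson(mean) draw; adequate for small means.
    private static int poisson(Random rng, double mean) {
        double limit = Math.exp(-mean);
        double product = 1.0;
        int k = 0;
        do {
            k++;
            product *= rng.nextDouble();
        } while (product > limit);
        return k - 1;
    }

    // Scales non-negative weights so they sum to 1.
    static double[] normalize(double[] weights) {
        double sum = 0;
        for (double w : weights) {
            if (w < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            sum += w;
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("weights must have a positive sum");
        }
        double[] normalized = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            normalized[i] = weights[i] / sum;
        }
        return normalized;
    }

    // randomSplit(weights, seed): one uniform draw per element picks the split whose
    // cumulative-weight interval contains it, so each element lands in exactly one split.
    static <T> List<List<T>> randomSplit(List<T> data, double[] weights, long seed) {
        double[] normalized = normalize(weights);
        double[] upper = new double[normalized.length];
        double acc = 0;
        for (int i = 0; i < normalized.length; i++) {
            acc += normalized[i];
            upper[i] = acc;
        }
        List<List<T>> splits = new ArrayList<>();
        for (int i = 0; i < normalized.length; i++) {
            splits.add(new ArrayList<>());
        }
        Random rng = new Random(seed);
        for (T x : data) {
            double u = rng.nextDouble();
            int i = 0;
            // The last split also absorbs any floating-point shortfall in the final bound.
            while (i < upper.length - 1 && u >= upper[i]) {
                i++;
            }
            splits.get(i).add(x);
        }
        return splits;
    }
}
```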
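public JavaRDD<T> union(JavaRDD<T> other)
Return the union of this RDD and another one. Any identical elements will appear multiple times (use .distinct() to eliminate them).
Parameters:
other - (undocumented)

public JavaRDD<T> intersection(JavaRDD<T> other)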
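Return the intersection of this RDD and another one.
Parameters:
other - (undocumented)

public JavaRDD<T> subtract(JavaRDD<T> other)
Return an RDD with the elements from this RDD that are not in other.
This uses this RDD's partitioner and number of partitions because, even if other is huge, the resulting RDD can be no larger than this one.
Parameters: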
other - (undocumented)

public JavaRDD<T> subtract(JavaRDD<T> other, int numPartitions)
Return an RDD with the elements from this RDD that are not in other.
Parameters:
other - (undocumented)
numPartitions - (undocumented)

public JavaRDD<T> subtract(JavaRDD<T> other, Partitioner p)
Return an RDD with the elements from this RDD that are not in other.
Parameters:
other - (undocumented)
p - (undocumented)

public String toString()
Overrides:
toString in class Object
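`distinct`, `union`, `intersection` and `subtract` differ mainly in how they treat duplicate elements:
- `union` concatenates, so identical elements appear multiple times, and `distinct` is needed to drop them.
- `intersection` reports each common element once, even if the inputs contained duplicates.
- `subtract` removes every occurrence of an element that appears anywhere in `other` and keeps other repeats, so its result is never larger than this RDD.

The sketch below models these rules on local lists. `SetOpsModel` and its methods are hypothetical names, not Spark API. The first-seen output order is an artifact of the model, since an RDD makes no ordering promise for these operations.

For example, with this = `[1, 1, 2, 3]` and other = `[2, 2, 4]`:
- union gives `[1, 1, 2, 3, 2, 2, 4]`.
- intersection gives `[2]`.
- subtract gives `[1, 1, 3]`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical local model of JavaRDD's set-like operations on java.util.List.
// Output order (first-seen) is a property of this model only; RDDs promise no order here.
final class SetOpsModel {

    private SetOpsModel() {}

    // union: concatenation, so identical elements appear multiple times.
    static <T> List<T> union(List<T> self, List<T> other) {
        List<T> out = new ArrayList<>(self);
        out.addAll(other);
        return out;
    }

    // distinct: keep one copy of each element.
    static <T> List<T> distinct(List<T> self) {
        return new ArrayList<>(new LinkedHashSet<>(self));
    }

    // intersection: elements present in both inputs, each reported once.
    static <T> List<T> intersection(List<T> self, List<T> other) {
        Set<T> inOther = new HashSet<>(other);
        Set<T> out = new LinkedHashSet<>();
        for (T x : self) {
            if (inOther.contains(x)) {
                out.add(x);
            }
        }
        return new ArrayList<>(out);
    }

    // subtract: drop every occurrence of an element that appears anywhere in other;
    // repeats of elements absent from other survive, so the result never exceeds self.
    static <T> List<T> subtract(List<T> self, List<T> other) {
        Set<T> inOther = new HashSet<>(other);
        List<T> out = new ArrayList<>();
        for (T x : self) {
            if (!inOther.contains(x)) {
                out.add(x);
            }
        }
        return out;
    }
}
```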