JavaRDD (Spark 3.5.3 JavaDoc)

Object
- org.apache.spark.api.java.JavaRDD<T>

All Implemented Interfaces:

java.io.Serializable, JavaRDDLike<T,JavaRDD<T>>
```
public class JavaRDD<T>
extends Object
```
See Also:

Serialized Form

Constructor Summary

Constructors
Constructor and Description

JavaRDD(RDD<T> rdd, scala.reflect.ClassTag<T> classTag)

Constructors
Constructor and Description
`JavaRDD(RDD<T> rdd, scala.reflect.ClassTag<T> classTag)`

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`JavaRDD<T>`	`cache()` Persist this RDD with the default storage level (`MEMORY_ONLY`).
`scala.reflect.ClassTag<T>`	`classTag()`
`JavaRDD<T>`	`coalesce(int numPartitions)` Return a new RDD that is reduced into `numPartitions` partitions.
`JavaRDD<T>`	`coalesce(int numPartitions, boolean shuffle)` Return a new RDD that is reduced into `numPartitions` partitions.
`JavaRDD<T>`	`distinct()` Return a new RDD containing the distinct elements in this RDD.
`JavaRDD<T>`	`distinct(int numPartitions)` Return a new RDD containing the distinct elements in this RDD.
`JavaRDD<T>`	`filter(Function<T,Boolean> f)` Return a new RDD containing only the elements that satisfy a predicate.
`static <T> JavaRDD<T>`	`fromRDD(RDD<T> rdd, scala.reflect.ClassTag<T> evidence$1)`
`ResourceProfile`	`getResourceProfile()` Get the ResourceProfile specified with this RDD or None if it wasn't specified.
`JavaRDD<T>`	`intersection(JavaRDD<T> other)` Return the intersection of this RDD and another one.
`JavaRDD<T>`	`persist(StorageLevel newLevel)` Set this RDD's storage level to persist its values across operations after the first time it is computed.
`JavaRDD<T>[]`	`randomSplit(double[] weights)` Randomly splits this RDD with the provided weights.
`JavaRDD<T>[]`	`randomSplit(double[] weights, long seed)` Randomly splits this RDD with the provided weights.
`RDD<T>`	`rdd()`
`JavaRDD<T>`	`repartition(int numPartitions)` Return a new RDD that has exactly numPartitions partitions.
`JavaRDD<T>`	`sample(boolean withReplacement, double fraction)` Return a sampled subset of this RDD with a random seed.
`JavaRDD<T>`	`sample(boolean withReplacement, double fraction, long seed)` Return a sampled subset of this RDD, with a user-supplied seed.
`JavaRDD<T>`	`setName(String name)` Assign a name to this RDD
`<S> JavaRDD<T>`	`sortBy(Function<T,S> f, boolean ascending, int numPartitions)` Return this RDD sorted by the given key function.
`JavaRDD<T>`	`subtract(JavaRDD<T> other)` Return an RDD with the elements from `this` that are not in `other`.
`JavaRDD<T>`	`subtract(JavaRDD<T> other, int numPartitions)` Return an RDD with the elements from `this` that are not in `other`.
`JavaRDD<T>`	`subtract(JavaRDD<T> other, Partitioner p)` Return an RDD with the elements from `this` that are not in `other`.
`static <T> RDD<T>`	`toRDD(JavaRDD<T> rdd)`
`String`	`toString()`
`JavaRDD<T>`	`union(JavaRDD<T> other)` Return the union of this RDD and another one.
`JavaRDD<T>`	`unpersist()` Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
`JavaRDD<T>`	`unpersist(boolean blocking)` Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
`JavaRDD<T>`	`withResources(ResourceProfile rp)` Specify a ResourceProfile to use when calculating this RDD.
`JavaRDD<T>`	`wrapRDD(RDD<T> rdd)`

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.spark.api.java.JavaRDDLike
aggregate, cartesian, checkpoint, collect, collectAsync, collectPartitions, context, count, countApprox, countApprox, countApproxDistinct, countAsync, countByValue, countByValueApprox, countByValueApprox, first, flatMap, flatMapToDouble, flatMapToPair, fold, foreach, foreachAsync, foreachPartition, foreachPartitionAsync, getCheckpointFile, getNumPartitions, getStorageLevel, glom, groupBy, groupBy, id, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitions, mapPartitionsToDouble, mapPartitionsToDouble, mapPartitionsToPair, mapPartitionsToPair, mapPartitionsWithIndex, mapToDouble, mapToPair, max, min, name, partitioner, partitions, pipe, pipe, pipe, pipe, pipe, reduce, saveAsObjectFile, saveAsTextFile, saveAsTextFile, take, takeAsync, takeOrdered, takeOrdered, takeSample, takeSample, toDebugString, toLocalIterator, top, top, treeAggregate, treeAggregate, treeAggregate, treeReduce, treeReduce, zip, zipPartitions, zipWithIndex, zipWithUniqueId

- Constructor Detail
  - JavaRDD
```
public JavaRDD(RDD<T> rdd,
               scala.reflect.ClassTag<T> classTag)
```
- Method Detail
  - fromRDD
```
public static <T> JavaRDD<T> fromRDD(RDD<T> rdd,
                                     scala.reflect.ClassTag<T> evidence$1)
```
  - toRDD
```
public static <T> RDD<T> toRDD(JavaRDD<T> rdd)
```
  - rdd
```
public RDD<T> rdd()
```
  - classTag
```
public scala.reflect.ClassTag<T> classTag()
```
  - wrapRDD
```
public JavaRDD<T> wrapRDD(RDD<T> rdd)
```
  - cache
```
public JavaRDD<T> cache()
```
    Persist this RDD with the default storage level (MEMORY_ONLY).
    
    Returns:
    
    (undocumented)
  - persist
```
public JavaRDD<T> persist(StorageLevel newLevel)
```
    Set this RDD's storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet..
    
    Parameters:
    
    newLevel - (undocumented)
    
    Returns:
    
    (undocumented)
  - withResources
```
public JavaRDD<T> withResources(ResourceProfile rp)
```
    Specify a ResourceProfile to use when calculating this RDD. This is only supported on certain cluster managers and currently requires dynamic allocation to be enabled. It will result in new executors with the resources specified being acquired to calculate the RDD.
    
    Parameters:
    
    rp - (undocumented)
    
    Returns:
    
    (undocumented)
  - getResourceProfile
```
public ResourceProfile getResourceProfile()
```
    Get the ResourceProfile specified with this RDD or None if it wasn't specified.
    
    Returns:
    
    the user specified ResourceProfile or null if none was specified
  - unpersist
```
public JavaRDD<T> unpersist()
```
    Mark the RDD as non-persistent, and remove all blocks for it from memory and disk. This method blocks until all blocks are deleted.
    
    Returns:
    
    (undocumented)
  - unpersist
```
public JavaRDD<T> unpersist(boolean blocking)
```
    Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
    
    Parameters:
    
    blocking - Whether to block until all blocks are deleted.
    
    Returns:
    
    (undocumented)
  - distinct
```
public JavaRDD<T> distinct()
```
    Return a new RDD containing the distinct elements in this RDD.
    
    Returns:
    
    (undocumented)
  - distinct
```
public JavaRDD<T> distinct(int numPartitions)
```
    Return a new RDD containing the distinct elements in this RDD.
    
    Parameters:
    
    numPartitions - (undocumented)
    
    Returns:
    
    (undocumented)
  - filter
```
public JavaRDD<T> filter(Function<T,Boolean> f)
```
    Return a new RDD containing only the elements that satisfy a predicate.
    
    Parameters:
    
    f - (undocumented)
    
    Returns:
    
    (undocumented)
  - coalesce
```
public JavaRDD<T> coalesce(int numPartitions)
```
    Return a new RDD that is reduced into numPartitions partitions.
    
    Parameters:
    
    numPartitions - (undocumented)
    
    Returns:
    
    (undocumented)
  - coalesce
```
public JavaRDD<T> coalesce(int numPartitions,
                           boolean shuffle)
```
    Return a new RDD that is reduced into numPartitions partitions.
    
    Parameters:
    
    numPartitions - (undocumented)
    
    shuffle - (undocumented)
    
    Returns:
    
    (undocumented)
  - repartition
```
public JavaRDD<T> repartition(int numPartitions)
```
    Return a new RDD that has exactly numPartitions partitions.
    Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
    If you are decreasing the number of partitions in this RDD, consider using coalesce, which can avoid performing a shuffle.
    
    Parameters:
    
    numPartitions - (undocumented)
    
    Returns:
    
    (undocumented)
  - sample
```
public JavaRDD<T> sample(boolean withReplacement,
                         double fraction)
```
    Return a sampled subset of this RDD with a random seed.
    
    Parameters:
    
    withReplacement - can elements be sampled multiple times (replaced when sampled out)
    
    fraction - expected size of the sample as a fraction of this RDD's size without replacement: probability that each element is chosen; fraction must be [0, 1] with replacement: expected number of times each element is chosen; fraction must be greater than or equal to 0
    
    Returns:
    
    (undocumented)
    
    Note:
    
    This is NOT guaranteed to provide exactly the fraction of the count of the given RDD.
  - sample
```
public JavaRDD<T> sample(boolean withReplacement,
                         double fraction,
                         long seed)
```
    Return a sampled subset of this RDD, with a user-supplied seed.
    
    Parameters:
    
    withReplacement - can elements be sampled multiple times (replaced when sampled out)
    
    fraction - expected size of the sample as a fraction of this RDD's size without replacement: probability that each element is chosen; fraction must be [0, 1] with replacement: expected number of times each element is chosen; fraction must be greater than or equal to 0
    
    seed - seed for the random number generator
    
    Returns:
    
    (undocumented)
    
    Note:
    
    This is NOT guaranteed to provide exactly the fraction of the count of the given RDD.
  - randomSplit
```
public JavaRDD<T>[] randomSplit(double[] weights)
```
    Randomly splits this RDD with the provided weights.
    
    Parameters:
    
    weights - weights for splits, will be normalized if they don't sum to 1
    
    Returns:
    
    split RDDs in an array
  - randomSplit
```
public JavaRDD<T>[] randomSplit(double[] weights,
                                long seed)
```
    Randomly splits this RDD with the provided weights.
    
    Parameters:
    
    weights - weights for splits, will be normalized if they don't sum to 1
    
    seed - random seed
    
    Returns:
    
    split RDDs in an array
  - union
```
public JavaRDD<T> union(JavaRDD<T> other)
```
    Return the union of this RDD and another one. Any identical elements will appear multiple times (use .distinct() to eliminate them).
    
    Parameters:
    
    other - (undocumented)
    
    Returns:
    
    (undocumented)
  - intersection
```
public JavaRDD<T> intersection(JavaRDD<T> other)
```
    Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.
    
    Parameters:
    
    other - (undocumented)
    
    Returns:
    
    (undocumented)
    
    Note:
    
    This method performs a shuffle internally.
  - subtract
```
public JavaRDD<T> subtract(JavaRDD<T> other)
```
    Return an RDD with the elements from this that are not in other.
    Uses this partitioner/partition size, because even if other is huge, the resulting RDD will be less than or equal to us.
    
    Parameters:
    
    other - (undocumented)
    
    Returns:
    
    (undocumented)
  - subtract
```
public JavaRDD<T> subtract(JavaRDD<T> other,
                           int numPartitions)
```
    Return an RDD with the elements from this that are not in other.
    
    Parameters:
    
    other - (undocumented)
    
    numPartitions - (undocumented)
    
    Returns:
    
    (undocumented)
  - subtract
```
public JavaRDD<T> subtract(JavaRDD<T> other,
                           Partitioner p)
```
    Return an RDD with the elements from this that are not in other.
    
    Parameters:
    
    other - (undocumented)
    
    p - (undocumented)
    
    Returns:
    
    (undocumented)
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class Object
  - setName
```
public JavaRDD<T> setName(String name)
```
    Assign a name to this RDD
  - sortBy
```
public <S> JavaRDD<T> sortBy(Function<T,S> f,
                             boolean ascending,
                             int numPartitions)
```
    Return this RDD sorted by the given key function.
    
    Parameters:
    
    f - (undocumented)
    
    ascending - (undocumented)
    
    numPartitions - (undocumented)
    
    Returns:
    
    (undocumented)

Class JavaRDD<T>

Constructor Summary

Method Summary

Methods inherited from class Object

Methods inherited from interface org.apache.spark.api.java.JavaRDDLike

Constructor Detail

JavaRDD

Method Detail

fromRDD

toRDD

rdd

classTag

wrapRDD

cache

persist

withResources

getResourceProfile

unpersist

unpersist

distinct

distinct

filter

coalesce

coalesce

repartition

sample

sample

randomSplit

randomSplit

union

intersection

subtract

subtract

subtract

toString

setName

sortBy