public interface JavaRDDLike<T,This extends JavaRDDLike<T,This>>
extends scala.Serializable
Modifier and Type | Method and Description |
---|---|
<U> U |
aggregate(U zeroValue,
Function2<U,T,U> seqOp,
Function2<U,U,U> combOp)
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
|
<U> JavaPairRDD<T,U> |
cartesian(JavaRDDLike<U,?> other)
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of
elements (a, b) where a is in
this and b is in other . |
void |
checkpoint()
Mark this RDD for checkpointing.
|
scala.reflect.ClassTag<T> |
classTag() |
java.util.List<T> |
collect()
Return an array that contains all of the elements in this RDD.
|
JavaFutureAction<java.util.List<T>> |
collectAsync()
The asynchronous version of
collect , which returns a future for
retrieving an array containing all of the elements in this RDD. |
java.util.List<T>[] |
collectPartitions(int[] partitionIds)
Return an array that contains all of the elements in a specific partition of this RDD.
|
SparkContext |
context()
The
SparkContext that this RDD was created on. |
long |
count()
Return the number of elements in the RDD.
|
PartialResult<BoundedDouble> |
countApprox(long timeout)
:: Experimental ::
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
|
PartialResult<BoundedDouble> |
countApprox(long timeout,
double confidence)
:: Experimental ::
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
|
long |
countApproxDistinct(double relativeSD)
Return approximate number of distinct elements in the RDD.
|
JavaFutureAction<Long> |
countAsync()
The asynchronous version of
count , which returns a
future for counting the number of elements in this RDD. |
java.util.Map<T,Long> |
countByValue()
Return the count of each unique value in this RDD as a map of (value, count) pairs.
|
PartialResult<java.util.Map<T,BoundedDouble>> |
countByValueApprox(long timeout)
(Experimental) Approximate version of countByValue().
|
PartialResult<java.util.Map<T,BoundedDouble>> |
countByValueApprox(long timeout,
double confidence)
(Experimental) Approximate version of countByValue().
|
T |
first()
Return the first element in this RDD.
|
<U> JavaRDD<U> |
flatMap(FlatMapFunction<T,U> f)
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
|
JavaDoubleRDD |
flatMapToDouble(DoubleFlatMapFunction<T> f)
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
|
<K2,V2> JavaPairRDD<K2,V2> |
flatMapToPair(PairFlatMapFunction<T,K2,V2> f)
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
|
T |
fold(T zeroValue,
Function2<T,T,T> f)
Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative function and a neutral "zero value".
|
void |
foreach(VoidFunction<T> f)
Applies a function f to all elements of this RDD.
|
JavaFutureAction<Void> |
foreachAsync(VoidFunction<T> f)
The asynchronous version of the
foreach action, which
applies a function f to all the elements of this RDD. |
void |
foreachPartition(VoidFunction<java.util.Iterator<T>> f)
Applies a function f to each partition of this RDD.
|
JavaFutureAction<Void> |
foreachPartitionAsync(VoidFunction<java.util.Iterator<T>> f)
The asynchronous version of the
foreachPartition action, which
applies a function f to each partition of this RDD. |
com.google.common.base.Optional<String> |
getCheckpointFile()
Gets the name of the file to which this RDD was checkpointed
|
StorageLevel |
getStorageLevel()
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
|
JavaRDD<java.util.List<T>> |
glom()
Return an RDD created by coalescing all elements within each partition into an array.
|
<U> JavaPairRDD<U,Iterable<T>> |
groupBy(Function<T,U> f)
Return an RDD of grouped elements.
|
<U> JavaPairRDD<U,Iterable<T>> |
groupBy(Function<T,U> f,
int numPartitions)
Return an RDD of grouped elements.
|
int |
id()
A unique ID for this RDD (within its SparkContext).
|
boolean |
isCheckpointed()
Return whether this RDD has been checkpointed or not
|
java.util.Iterator<T> |
iterator(Partition split,
TaskContext taskContext)
Internal method to this RDD; will read from cache if applicable, or otherwise compute it.
|
<U> JavaPairRDD<U,T> |
keyBy(Function<T,U> f)
Creates tuples of the elements in this RDD by applying
f . |
<R> JavaRDD<R> |
map(Function<T,R> f)
Return a new RDD by applying a function to all elements of this RDD.
|
<U> JavaRDD<U> |
mapPartitions(FlatMapFunction<java.util.Iterator<T>,U> f)
Return a new RDD by applying a function to each partition of this RDD.
|
<U> JavaRDD<U> |
mapPartitions(FlatMapFunction<java.util.Iterator<T>,U> f,
boolean preservesPartitioning)
Return a new RDD by applying a function to each partition of this RDD.
|
JavaDoubleRDD |
mapPartitionsToDouble(DoubleFlatMapFunction<java.util.Iterator<T>> f)
Return a new RDD by applying a function to each partition of this RDD.
|
JavaDoubleRDD |
mapPartitionsToDouble(DoubleFlatMapFunction<java.util.Iterator<T>> f,
boolean preservesPartitioning)
Return a new RDD by applying a function to each partition of this RDD.
|
<K2,V2> JavaPairRDD<K2,V2> |
mapPartitionsToPair(PairFlatMapFunction<java.util.Iterator<T>,K2,V2> f)
Return a new RDD by applying a function to each partition of this RDD.
|
<K2,V2> JavaPairRDD<K2,V2> |
mapPartitionsToPair(PairFlatMapFunction<java.util.Iterator<T>,K2,V2> f,
boolean preservesPartitioning)
Return a new RDD by applying a function to each partition of this RDD.
|
<R> JavaRDD<R> |
mapPartitionsWithIndex(Function2<Integer,java.util.Iterator<T>,java.util.Iterator<R>> f,
boolean preservesPartitioning)
Return a new RDD by applying a function to each partition of this RDD, while tracking the index
of the original partition.
|
<R> JavaDoubleRDD |
mapToDouble(DoubleFunction<T> f)
Return a new RDD by applying a function to all elements of this RDD.
|
<K2,V2> JavaPairRDD<K2,V2> |
mapToPair(PairFunction<T,K2,V2> f)
Return a new RDD by applying a function to all elements of this RDD.
|
T |
max(java.util.Comparator<T> comp)
Returns the maximum element from this RDD as defined by the specified
Comparator[T].
|
T |
min(java.util.Comparator<T> comp)
Returns the minimum element from this RDD as defined by the specified
Comparator[T].
|
String |
name() |
java.util.List<Partition> |
partitions()
Set of partitions in this RDD.
|
JavaRDD<String> |
pipe(java.util.List<String> command)
Return an RDD created by piping elements to a forked external process.
|
JavaRDD<String> |
pipe(java.util.List<String> command,
java.util.Map<String,String> env)
Return an RDD created by piping elements to a forked external process.
|
JavaRDD<String> |
pipe(String command)
Return an RDD created by piping elements to a forked external process.
|
RDD<T> |
rdd() |
T |
reduce(Function2<T,T,T> f)
Reduces the elements of this RDD using the specified commutative and associative binary
operator.
|
void |
saveAsObjectFile(String path)
Save this RDD as a SequenceFile of serialized objects.
|
void |
saveAsTextFile(String path)
Save this RDD as a text file, using string representations of elements.
|
void |
saveAsTextFile(String path,
Class<? extends org.apache.hadoop.io.compress.CompressionCodec> codec)
Save this RDD as a compressed text file, using string representations of elements.
|
java.util.List<Partition> |
splits() |
java.util.List<T> |
take(int num)
Take the first num elements of the RDD.
|
JavaFutureAction<java.util.List<T>> |
takeAsync(int num)
The asynchronous version of the
take action, which returns a
future for retrieving the first num elements of this RDD. |
java.util.List<T> |
takeOrdered(int num)
Returns the first k (smallest) elements from this RDD using the
natural ordering for T while maintain the order.
|
java.util.List<T> |
takeOrdered(int num,
java.util.Comparator<T> comp)
Returns the first k (smallest) elements from this RDD as defined by
the specified Comparator[T] and maintains the order.
|
java.util.List<T> |
takeSample(boolean withReplacement,
int num) |
java.util.List<T> |
takeSample(boolean withReplacement,
int num,
long seed) |
java.util.List<T> |
toArray()
Deprecated.
As of Spark 1.0.0, toArray() is deprecated, use
collect() instead |
String |
toDebugString()
A description of this RDD and its recursive dependencies for debugging.
|
java.util.Iterator<T> |
toLocalIterator()
Return an iterator that contains all of the elements in this RDD.
|
java.util.List<T> |
top(int num)
Returns the top k (largest) elements from this RDD using the
natural ordering for T.
|
java.util.List<T> |
top(int num,
java.util.Comparator<T> comp)
Returns the top k (largest) elements from this RDD as defined by
the specified Comparator[T].
|
This |
wrapRDD(RDD<T> rdd) |
<U> JavaPairRDD<T,U> |
zip(JavaRDDLike<U,?> other)
Zips this RDD with another one, returning key-value pairs with the first element in each RDD,
second element in each RDD, etc.
|
<U,V> JavaRDD<V> |
zipPartitions(JavaRDDLike<U,?> other,
FlatMapFunction2<java.util.Iterator<T>,java.util.Iterator<U>,V> f)
Zip this RDD's partitions with one (or more) RDD(s) and return a new RDD by
applying a function to the zipped partitions.
|
JavaPairRDD<T,Long> |
zipWithIndex()
Zips this RDD with its element indices.
|
JavaPairRDD<T,Long> |
zipWithUniqueId()
Zips this RDD with generated unique Long ids.
|
scala.reflect.ClassTag<T> classTag()
java.util.List<Partition> splits()
java.util.List<Partition> partitions()
SparkContext context()
SparkContext
that this RDD was created on.int id()
StorageLevel getStorageLevel()
java.util.Iterator<T> iterator(Partition split, TaskContext taskContext)
<R> JavaRDD<R> map(Function<T,R> f)
<R> JavaRDD<R> mapPartitionsWithIndex(Function2<Integer,java.util.Iterator<T>,java.util.Iterator<R>> f, boolean preservesPartitioning)
<R> JavaDoubleRDD mapToDouble(DoubleFunction<T> f)
<K2,V2> JavaPairRDD<K2,V2> mapToPair(PairFunction<T,K2,V2> f)
<U> JavaRDD<U> flatMap(FlatMapFunction<T,U> f)
JavaDoubleRDD flatMapToDouble(DoubleFlatMapFunction<T> f)
<K2,V2> JavaPairRDD<K2,V2> flatMapToPair(PairFlatMapFunction<T,K2,V2> f)
<U> JavaRDD<U> mapPartitions(FlatMapFunction<java.util.Iterator<T>,U> f)
<U> JavaRDD<U> mapPartitions(FlatMapFunction<java.util.Iterator<T>,U> f, boolean preservesPartitioning)
JavaDoubleRDD mapPartitionsToDouble(DoubleFlatMapFunction<java.util.Iterator<T>> f)
<K2,V2> JavaPairRDD<K2,V2> mapPartitionsToPair(PairFlatMapFunction<java.util.Iterator<T>,K2,V2> f)
JavaDoubleRDD mapPartitionsToDouble(DoubleFlatMapFunction<java.util.Iterator<T>> f, boolean preservesPartitioning)
<K2,V2> JavaPairRDD<K2,V2> mapPartitionsToPair(PairFlatMapFunction<java.util.Iterator<T>,K2,V2> f, boolean preservesPartitioning)
void foreachPartition(VoidFunction<java.util.Iterator<T>> f)
JavaRDD<java.util.List<T>> glom()
<U> JavaPairRDD<T,U> cartesian(JavaRDDLike<U,?> other)
this
and b is in other
.<U> JavaPairRDD<U,Iterable<T>> groupBy(Function<T,U> f)
<U> JavaPairRDD<U,Iterable<T>> groupBy(Function<T,U> f, int numPartitions)
JavaRDD<String> pipe(String command)
JavaRDD<String> pipe(java.util.List<String> command)
JavaRDD<String> pipe(java.util.List<String> command, java.util.Map<String,String> env)
<U> JavaPairRDD<T,U> zip(JavaRDDLike<U,?> other)
<U,V> JavaRDD<V> zipPartitions(JavaRDDLike<U,?> other, FlatMapFunction2<java.util.Iterator<T>,java.util.Iterator<U>,V> f)
JavaPairRDD<T,Long> zipWithUniqueId()
RDD.zipWithIndex()
.JavaPairRDD<T,Long> zipWithIndex()
void foreach(VoidFunction<T> f)
java.util.List<T> collect()
java.util.Iterator<T> toLocalIterator()
The iterator will consume as much memory as the largest partition in this RDD.
java.util.List<T> toArray()
collect()
insteadjava.util.List<T>[] collectPartitions(int[] partitionIds)
T reduce(Function2<T,T,T> f)
T fold(T zeroValue, Function2<T,T,T> f)
<U> U aggregate(U zeroValue, Function2<U,T,U> seqOp, Function2<U,U,U> combOp)
long count()
PartialResult<BoundedDouble> countApprox(long timeout, double confidence)
PartialResult<BoundedDouble> countApprox(long timeout)
java.util.Map<T,Long> countByValue()
PartialResult<java.util.Map<T,BoundedDouble>> countByValueApprox(long timeout, double confidence)
PartialResult<java.util.Map<T,BoundedDouble>> countByValueApprox(long timeout)
java.util.List<T> take(int num)
java.util.List<T> takeSample(boolean withReplacement, int num)
java.util.List<T> takeSample(boolean withReplacement, int num, long seed)
T first()
void saveAsTextFile(String path)
void saveAsTextFile(String path, Class<? extends org.apache.hadoop.io.compress.CompressionCodec> codec)
void saveAsObjectFile(String path)
<U> JavaPairRDD<U,T> keyBy(Function<T,U> f)
f
.void checkpoint()
boolean isCheckpointed()
com.google.common.base.Optional<String> getCheckpointFile()
String toDebugString()
java.util.List<T> top(int num, java.util.Comparator<T> comp)
num
- k, the number of top elements to returncomp
- the comparator that defines the orderjava.util.List<T> top(int num)
num
- k, the number of top elements to returnjava.util.List<T> takeOrdered(int num, java.util.Comparator<T> comp)
num
- k, the number of elements to returncomp
- the comparator that defines the orderT max(java.util.Comparator<T> comp)
comp
- the comparator that defines orderingT min(java.util.Comparator<T> comp)
comp
- the comparator that defines orderingjava.util.List<T> takeOrdered(int num)
num
- k, the number of top elements to returnlong countApproxDistinct(double relativeSD)
The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available here.
relativeSD
- Relative accuracy. Smaller values create counters that require more space.
It must be greater than 0.000017.String name()
JavaFutureAction<Long> countAsync()
count
, which returns a
future for counting the number of elements in this RDD.JavaFutureAction<java.util.List<T>> collectAsync()
collect
, which returns a future for
retrieving an array containing all of the elements in this RDD.JavaFutureAction<java.util.List<T>> takeAsync(int num)
take
action, which returns a
future for retrieving the first num
elements of this RDD.JavaFutureAction<Void> foreachAsync(VoidFunction<T> f)
foreach
action, which
applies a function f to all the elements of this RDD.JavaFutureAction<Void> foreachPartitionAsync(VoidFunction<java.util.Iterator<T>> f)
foreachPartition
action, which
applies a function f to each partition of this RDD.