public class JavaSchemaRDD extends Object implements JavaRDDLike<Row,JavaRDD<Row>>
Row
objects that is returned as the result of a Spark SQL query. In addition to
standard RDD operations, a JavaSchemaRDD can also be registered as a table in the JavaSQLContext
that was used to create. Registering a JavaSchemaRDD allows its contents to be queried in
future SQL statement.
Constructor and Description |
---|
JavaSchemaRDD(SQLContext sqlContext,
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan logicalPlan) |
Modifier and Type | Method and Description |
---|---|
SchemaRDD |
baseSchemaRDD() |
JavaSchemaRDD |
cache()
Persist this RDD with the default storage level (`MEMORY_ONLY`).
|
scala.reflect.ClassTag<Row> |
classTag() |
JavaSchemaRDD |
coalesce(int numPartitions,
boolean shuffle)
Return a new RDD that is reduced into
numPartitions partitions. |
JavaSchemaRDD |
distinct()
Return a new RDD containing the distinct elements in this RDD.
|
JavaSchemaRDD |
distinct(int numPartitions)
Return a new RDD containing the distinct elements in this RDD.
|
JavaSchemaRDD |
filter(Function<Row,Boolean> f)
Return a new RDD containing only the elements that satisfy a predicate.
|
JavaSchemaRDD |
intersection(JavaSchemaRDD other)
Return the intersection of this RDD and another one.
|
JavaSchemaRDD |
intersection(JavaSchemaRDD other,
int numPartitions)
Return the intersection of this RDD and another one.
|
JavaSchemaRDD |
intersection(JavaSchemaRDD other,
Partitioner partitioner)
Return the intersection of this RDD and another one.
|
JavaSchemaRDD |
persist()
Persist this RDD with the default storage level (`MEMORY_ONLY`).
|
JavaSchemaRDD |
persist(StorageLevel newLevel)
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
|
RDD<Row> |
rdd() |
JavaSchemaRDD |
repartition(int numPartitions)
Return a new RDD that has exactly
numPartitions partitions. |
JavaSchemaRDD |
setName(String name)
Assign a name to this RDD
|
SQLContext |
sqlContext() |
JavaSchemaRDD |
subtract(JavaSchemaRDD other)
Return an RDD with the elements from
this that are not in other . |
JavaSchemaRDD |
subtract(JavaSchemaRDD other,
int numPartitions)
Return an RDD with the elements from
this that are not in other . |
JavaSchemaRDD |
subtract(JavaSchemaRDD other,
Partitioner p)
Return an RDD with the elements from
this that are not in other . |
String |
toString() |
JavaSchemaRDD |
unpersist(boolean blocking)
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
|
JavaRDD<Row> |
wrapRDD(RDD<Row> rdd) |
aggregate, cartesian, checkpoint, collect, collectPartitions, context, count, countApprox, countApprox, countApproxDistinct, countByValue, countByValueApprox, countByValueApprox, first, flatMap, flatMapToDouble, flatMapToPair, fold, foreach, foreachPartition, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, id, isCheckpointed, iterator, keyBy, map, mapPartitions, mapPartitions, mapPartitionsToDouble, mapPartitionsToDouble, mapPartitionsToPair, mapPartitionsToPair, mapPartitionsWithIndex, mapToDouble, mapToPair, max, min, name, pipe, pipe, pipe, reduce, saveAsObjectFile, saveAsTextFile, saveAsTextFile, splits, take, takeOrdered, takeOrdered, takeSample, takeSample, toArray, toDebugString, toLocalIterator, top, top, zip, zipPartitions, zipWithIndex, zipWithUniqueId
public JavaSchemaRDD(SQLContext sqlContext, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan logicalPlan)
public SQLContext sqlContext()
public SchemaRDD baseSchemaRDD()
public scala.reflect.ClassTag<Row> classTag()
classTag
in interface JavaRDDLike<Row,JavaRDD<Row>>
public JavaRDD<Row> wrapRDD(RDD<Row> rdd)
wrapRDD
in interface JavaRDDLike<Row,JavaRDD<Row>>
public String toString()
toString
in class Object
public JavaSchemaRDD cache()
public JavaSchemaRDD persist()
public JavaSchemaRDD persist(StorageLevel newLevel)
public JavaSchemaRDD unpersist(boolean blocking)
blocking
- Whether to block until all blocks are deleted.public JavaSchemaRDD setName(String name)
public JavaSchemaRDD coalesce(int numPartitions, boolean shuffle)
numPartitions
partitions.public JavaSchemaRDD distinct()
public JavaSchemaRDD distinct(int numPartitions)
public JavaSchemaRDD filter(Function<Row,Boolean> f)
public JavaSchemaRDD intersection(JavaSchemaRDD other)
Note that this method performs a shuffle internally.
public JavaSchemaRDD intersection(JavaSchemaRDD other, Partitioner partitioner)
Note that this method performs a shuffle internally.
partitioner
- Partitioner to use for the resulting RDDpublic JavaSchemaRDD intersection(JavaSchemaRDD other, int numPartitions)
Note that this method performs a shuffle internally.
numPartitions
- How many partitions to use in the resulting RDDpublic JavaSchemaRDD repartition(int numPartitions)
numPartitions
partitions.
Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
If you are decreasing the number of partitions in this RDD, consider using coalesce
,
which can avoid performing a shuffle.
public JavaSchemaRDD subtract(JavaSchemaRDD other)
this
that are not in other
.
Uses this
partitioner/partition size, because even if other
is huge, the resulting
RDD will be <= us.
public JavaSchemaRDD subtract(JavaSchemaRDD other, int numPartitions)
this
that are not in other
.public JavaSchemaRDD subtract(JavaSchemaRDD other, Partitioner p)
this
that are not in other
.