public class JavaSchemaRDD extends Object implements JavaRDDLike<Row,JavaRDD<Row>>, SchemaRDDLike
Row
objects that is returned as the result of a Spark SQL query. In addition to
standard RDD operations, a JavaSchemaRDD can also be registered as a table in the JavaSQLContext
that was used to create. Registering a JavaSchemaRDD allows its contents to be queried in
future SQL statement.
Constructor and Description |
---|
JavaSchemaRDD(SQLContext sqlContext,
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan baseLogicalPlan) |
Modifier and Type | Method and Description |
---|---|
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan |
baseLogicalPlan() |
SchemaRDD |
baseSchemaRDD() |
JavaSchemaRDD |
cache()
Persist this RDD with the default storage level (`MEMORY_ONLY`).
|
scala.reflect.ClassTag<Row> |
classTag() |
JavaSchemaRDD |
coalesce(int numPartitions,
boolean shuffle)
Return a new RDD that is reduced into
numPartitions partitions. |
java.util.List<Row> |
collect()
Return an array that contains all of the elements in this RDD.
|
long |
count()
Return the number of elements in the RDD.
|
JavaSchemaRDD |
distinct()
Return a new RDD containing the distinct elements in this RDD.
|
JavaSchemaRDD |
distinct(int numPartitions)
Return a new RDD containing the distinct elements in this RDD.
|
JavaSchemaRDD |
filter(Function<Row,Boolean> f)
Return a new RDD containing only the elements that satisfy a predicate.
|
JavaSchemaRDD |
intersection(JavaSchemaRDD other)
Return the intersection of this RDD and another one.
|
JavaSchemaRDD |
intersection(JavaSchemaRDD other,
int numPartitions)
Return the intersection of this RDD and another one.
|
JavaSchemaRDD |
intersection(JavaSchemaRDD other,
Partitioner partitioner)
Return the intersection of this RDD and another one.
|
JavaSchemaRDD |
persist()
Persist this RDD with the default storage level (`MEMORY_ONLY`).
|
JavaSchemaRDD |
persist(StorageLevel newLevel)
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
|
RDD<Row> |
rdd() |
JavaSchemaRDD |
repartition(int numPartitions)
Return a new RDD that has exactly
numPartitions partitions. |
StructType |
schema()
Returns the schema of this JavaSchemaRDD (represented by a StructType).
|
SchemaRDD |
schemaRDD()
Returns the underlying Scala SchemaRDD.
|
JavaSchemaRDD |
setName(String name)
Assign a name to this RDD
|
SQLContext |
sqlContext() |
JavaSchemaRDD |
subtract(JavaSchemaRDD other)
Return an RDD with the elements from
this that are not in other . |
JavaSchemaRDD |
subtract(JavaSchemaRDD other,
int numPartitions)
Return an RDD with the elements from
this that are not in other . |
JavaSchemaRDD |
subtract(JavaSchemaRDD other,
Partitioner p)
Return an RDD with the elements from
this that are not in other . |
java.util.List<Row> |
take(int num)
Take the first num elements of the RDD.
|
JavaRDD<String> |
toJSON()
Returns a new RDD with each row transformed to a JSON string.
|
String |
toString() |
JavaSchemaRDD |
unpersist(boolean blocking)
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
|
JavaRDD<Row> |
wrapRDD(RDD<Row> rdd) |
aggregate, cartesian, checkpoint, collectAsync, collectPartitions, context, countApprox, countApprox, countApproxDistinct, countAsync, countByValue, countByValueApprox, countByValueApprox, first, flatMap, flatMapToDouble, flatMapToPair, fold, foreach, foreachAsync, foreachPartition, foreachPartitionAsync, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, id, isCheckpointed, iterator, keyBy, map, mapPartitions, mapPartitions, mapPartitionsToDouble, mapPartitionsToDouble, mapPartitionsToPair, mapPartitionsToPair, mapPartitionsWithIndex, mapToDouble, mapToPair, max, min, name, partitions, pipe, pipe, pipe, reduce, saveAsObjectFile, saveAsTextFile, saveAsTextFile, splits, takeAsync, takeOrdered, takeOrdered, takeSample, takeSample, toArray, toDebugString, toLocalIterator, top, top, zip, zipPartitions, zipWithIndex, zipWithUniqueId
insertInto, insertInto, logicalPlan, printSchema, queryExecution, registerAsTable, registerTempTable, saveAsParquetFile, saveAsTable, schemaString
public JavaSchemaRDD(SQLContext sqlContext, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan baseLogicalPlan)
public SQLContext sqlContext()
sqlContext
in interface SchemaRDDLike
public org.apache.spark.sql.catalyst.plans.logical.LogicalPlan baseLogicalPlan()
baseLogicalPlan
in interface SchemaRDDLike
public SchemaRDD baseSchemaRDD()
baseSchemaRDD
in interface SchemaRDDLike
public SchemaRDD schemaRDD()
public scala.reflect.ClassTag<Row> classTag()
classTag
in interface JavaRDDLike<Row,JavaRDD<Row>>
public JavaRDD<Row> wrapRDD(RDD<Row> rdd)
wrapRDD
in interface JavaRDDLike<Row,JavaRDD<Row>>
public String toString()
toString
in interface SchemaRDDLike
toString
in class Object
public StructType schema()
public JavaSchemaRDD cache()
public JavaSchemaRDD persist()
public JavaSchemaRDD persist(StorageLevel newLevel)
public JavaSchemaRDD unpersist(boolean blocking)
blocking
- Whether to block until all blocks are deleted.public JavaSchemaRDD setName(String name)
public java.util.List<Row> collect()
JavaRDDLike
collect
in interface JavaRDDLike<Row,JavaRDD<Row>>
public long count()
JavaRDDLike
count
in interface JavaRDDLike<Row,JavaRDD<Row>>
public java.util.List<Row> take(int num)
JavaRDDLike
take
in interface JavaRDDLike<Row,JavaRDD<Row>>
public JavaRDD<String> toJSON()
public JavaSchemaRDD coalesce(int numPartitions, boolean shuffle)
numPartitions
partitions.public JavaSchemaRDD distinct()
public JavaSchemaRDD distinct(int numPartitions)
public JavaSchemaRDD filter(Function<Row,Boolean> f)
public JavaSchemaRDD intersection(JavaSchemaRDD other)
Note that this method performs a shuffle internally.
public JavaSchemaRDD intersection(JavaSchemaRDD other, Partitioner partitioner)
Note that this method performs a shuffle internally.
partitioner
- Partitioner to use for the resulting RDDpublic JavaSchemaRDD intersection(JavaSchemaRDD other, int numPartitions)
Note that this method performs a shuffle internally.
numPartitions
- How many partitions to use in the resulting RDDpublic JavaSchemaRDD repartition(int numPartitions)
numPartitions
partitions.
Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
If you are decreasing the number of partitions in this RDD, consider using coalesce
,
which can avoid performing a shuffle.
public JavaSchemaRDD subtract(JavaSchemaRDD other)
this
that are not in other
.
Uses this
partitioner/partition size, because even if other
is huge, the resulting
RDD will be <= us.
public JavaSchemaRDD subtract(JavaSchemaRDD other, int numPartitions)
this
that are not in other
.public JavaSchemaRDD subtract(JavaSchemaRDD other, Partitioner p)
this
that are not in other
.