- Accumulable<R,T> - Class in org.apache.spark
-
A data type that can be accumulated, i.e., has a commutative and associative "add" operation,
but where the result type, R, may be different from the element type being added, T.
- Accumulable(R, AccumulableParam<R, T>, Option<String>) - Constructor for class org.apache.spark.Accumulable
-
- Accumulable(R, AccumulableParam<R, T>) - Constructor for class org.apache.spark.Accumulable
-
- accumulable(T, AccumulableParam<T, R>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulable shared variable of the given type, to which tasks can "add" values with add.
- accumulable(T, String, AccumulableParam<T, R>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulable shared variable of the given type, to which tasks can "add" values with add.
- accumulable(T, AccumulableParam<T, R>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulable shared variable, to which tasks can add values with +=.
- accumulable(T, String, AccumulableParam<T, R>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulable shared variable, with a name for display in the Spark UI.
- accumulableCollection(R, Function1<R, Growable<T>>, ClassTag<R>) - Method in class org.apache.spark.SparkContext
-
Create an accumulator from a "mutable collection" type.
- AccumulableInfo - Class in org.apache.spark.scheduler
-
:: DeveloperApi ::
Information about an Accumulable modified during a task or stage.
- AccumulableInfo(long, String, Option<String>, String) - Constructor for class org.apache.spark.scheduler.AccumulableInfo
-
- AccumulableParam<R,T> - Interface in org.apache.spark
-
Helper object defining how to accumulate values of a particular type.
- accumulables() - Method in class org.apache.spark.scheduler.StageInfo
-
Terminal values of accumulables updated during this stage.
- accumulables() - Method in class org.apache.spark.scheduler.TaskInfo
-
Intermediate updates to accumulables during this task.
- Accumulator<T> - Class in org.apache.spark
-
A simpler value of Accumulable where the result type being accumulated is the same as the type of the elements being merged.
- Accumulator(T, AccumulatorParam<T>, Option<String>) - Constructor for class org.apache.spark.Accumulator
-
- Accumulator(T, AccumulatorParam<T>) - Constructor for class org.apache.spark.Accumulator
-
- accumulator(int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
- accumulator(int, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
- accumulator(double) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator double variable, which tasks can "add" values to using the add method.
- accumulator(double, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator double variable, which tasks can "add" values to using the add method.
- accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator variable of a given type, which tasks can "add" values to using the add method.
- accumulator(T, String, AccumulatorParam<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator variable of a given type, which tasks can "add" values to using the add method.
- accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulator variable of a given type, which tasks can "add" values to using the += method.
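A minimal usage sketch (an existing SparkContext sc is assumed; the sample data and the accumulator name are illustrative):

    // Named Int accumulator; tasks add to it with +=, the driver reads .value
    val errorCount = sc.accumulator(0, "errorCount")
    sc.parallelize(Seq("ok", "fail", "ok", "fail"))
      .foreach(record => if (record == "fail") errorCount += 1)
    println(errorCount.value)  // 2, readable only on the driver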
- accumulator(T, String, AccumulatorParam<T>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulator variable of a given type, with a name for display in the Spark UI.
- AccumulatorParam<T> - Interface in org.apache.spark
-
A simpler version of AccumulableParam where the only data type you can add in is the same type as the accumulated value.
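A sketch of a custom AccumulatorParam (the Set[String] element type and the names are illustrative; only zero and addInPlace need to be supplied, since the added type equals the accumulated type):

    import org.apache.spark.AccumulatorParam

    object StringSetParam extends AccumulatorParam[Set[String]] {
      def zero(initialValue: Set[String]): Set[String] = Set.empty[String]
      def addInPlace(s1: Set[String], s2: Set[String]): Set[String] = s1 ++ s2
    }

    val seen = sc.accumulator(Set.empty[String])(StringSetParam)
    sc.parallelize(Seq("a", "b", "a")).foreach(x => seen += Set(x))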
- active() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- activeStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- actor() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- ActorHelper - Interface in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
A receiver trait to be mixed in with your Actor to gain access to the API for pushing received data into Spark Streaming for processing.
- actorStream(Props, String, StorageLevel, SupervisorStrategy) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String, StorageLevel, SupervisorStrategy, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- ActorSupervisorStrategy - Class in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
A helper with a set of defaults for the supervisor strategy.
- ActorSupervisorStrategy() - Constructor for class org.apache.spark.streaming.receiver.ActorSupervisorStrategy
-
- actorSystem() - Method in class org.apache.spark.SparkEnv
-
- add(T) - Method in class org.apache.spark.Accumulable
-
Add more data to this accumulator / accumulable
- add(Vector) - Method in class org.apache.spark.mllib.feature.IDF.DocumentFrequencyAggregator
-
Adds a new document.
- add(Vector) - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
Add a new sample to this summarizer, and update the statistical summary.
- add(Vector) - Method in class org.apache.spark.util.Vector
-
- addAccumulator(R, T) - Method in interface org.apache.spark.AccumulableParam
-
Add additional data to the accumulator value.
- addAccumulator(T, T) - Method in interface org.apache.spark.AccumulatorParam
-
- addedFiles() - Method in class org.apache.spark.SparkContext
-
- addedJars() - Method in class org.apache.spark.SparkContext
-
- addFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Add a file to be downloaded with this Spark job on every node.
- addFile(String) - Method in class org.apache.spark.SparkContext
-
Add a file to be downloaded with this Spark job on every node.
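A sketch of shipping a side file to every node and resolving its per-node local path inside tasks (the file path is illustrative; sc is an existing SparkContext):

    import org.apache.spark.SparkFiles

    sc.addFile("/path/to/lookup.txt")
    sc.parallelize(1 to 4).foreach { _ =>
      val localPath = SparkFiles.get("lookup.txt")  // local copy on the executor's node
      // ... open and use localPath ...
    }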
- addInPlace(R, R) - Method in interface org.apache.spark.AccumulableParam
-
Merge two accumulated values together.
- addInPlace(double, double) - Method in class org.apache.spark.SparkContext.DoubleAccumulatorParam$
-
- addInPlace(float, float) - Method in class org.apache.spark.SparkContext.FloatAccumulatorParam$
-
- addInPlace(int, int) - Method in class org.apache.spark.SparkContext.IntAccumulatorParam$
-
- addInPlace(long, long) - Method in class org.apache.spark.SparkContext.LongAccumulatorParam$
-
- addInPlace(Vector) - Method in class org.apache.spark.util.Vector
-
- addInPlace(Vector, Vector) - Method in class org.apache.spark.util.Vector.VectorAccumParam$
-
- addJar(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
- addJar(String) - Method in class org.apache.spark.SparkContext
-
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
- addLocalConfiguration(String, int, int, int, JobConf) - Static method in class org.apache.spark.rdd.HadoopRDD
-
Add Hadoop configuration specific to a single partition and attempt.
- addOnCompleteCallback(Function0<BoxedUnit>) - Method in class org.apache.spark.TaskContext
-
Add a callback function to be executed on task completion.
- addSparkListener(SparkListener) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Register a listener to receive up-calls from events that happen during execution.
- addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
- addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.StreamingContext
-
- addTaskCompletionListener(TaskCompletionListener) - Method in class org.apache.spark.TaskContext
-
Add a (Java friendly) listener to be executed on task completion.
- addTaskCompletionListener(Function1<TaskContext, BoxedUnit>) - Method in class org.apache.spark.TaskContext
-
Add a listener in the form of a Scala closure to be executed on task completion.
- aggregate(U, Function2<U, T, U>, Function2<U, U, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
- aggregate(U, Function2<U, T, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
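A sketch of aggregate computing a (sum, count) pair in one pass (the data and functions are illustrative):

    val nums = sc.parallelize(Seq(1, 2, 3, 4))
    val (sum, count) = nums.aggregate((0, 0))(
      (acc, x) => (acc._1 + x, acc._2 + 1),   // seqOp: fold a value into a partition's accumulator
      (a, b)   => (a._1 + b._1, a._2 + b._2)  // combOp: merge accumulators across partitions
    )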
- Aggregate - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
Groups input data by groupingExpressions and computes the aggregateExpressions for each group.
- Aggregate(boolean, Seq<Expression>, Seq<NamedExpression>, SparkPlan) - Constructor for class org.apache.spark.sql.execution.Aggregate
-
- aggregate() - Method in class org.apache.spark.sql.execution.Aggregate.ComputedAggregate
-
- aggregate(Seq<Expression>) - Method in class org.apache.spark.sql.SchemaRDD
-
Performs an aggregation over all Rows in this RDD.
- Aggregate.ComputedAggregate - Class in org.apache.spark.sql.execution
-
An aggregate that needs to be computed for each row in a group.
- Aggregate.ComputedAggregate(AggregateExpression, AggregateExpression, AttributeReference) - Constructor for class org.apache.spark.sql.execution.Aggregate.ComputedAggregate
-
- Aggregate.ComputedAggregate$ - Class in org.apache.spark.sql.execution
-
- Aggregate.ComputedAggregate$() - Constructor for class org.apache.spark.sql.execution.Aggregate.ComputedAggregate$
-
- aggregateByKey(U, Partitioner, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, int, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Partitioner, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, int, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
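A sketch of a per-key (sum, count) with aggregateByKey (the data is illustrative; the import brings in the pair-RDD implicits and is not needed in the shell):

    import org.apache.spark.SparkContext._

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    val sumCountByKey = pairs.aggregateByKey((0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),   // fold a value into the per-key accumulator
      (a, b)   => (a._1 + b._1, a._2 + b._2)  // merge per-partition accumulators
    )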
- AggregateEvaluation - Class in org.apache.spark.sql.execution
-
- AggregateEvaluation(Seq<Attribute>, Seq<Expression>, Seq<Expression>, Expression) - Constructor for class org.apache.spark.sql.execution.AggregateEvaluation
-
- aggregateExpressions() - Method in class org.apache.spark.sql.execution.Aggregate
-
- aggregateExpressions() - Method in class org.apache.spark.sql.execution.GeneratedAggregate
-
- Aggregator<K,V,C> - Class in org.apache.spark
-
:: DeveloperApi ::
A set of functions used to aggregate data.
- Aggregator(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Constructor for class org.apache.spark.Aggregator
-
- aggregator() - Method in class org.apache.spark.ShuffleDependency
-
- Algo - Class in org.apache.spark.mllib.tree.configuration
-
:: Experimental ::
Enum to select the algorithm for the decision tree
- Algo() - Constructor for class org.apache.spark.mllib.tree.configuration.Algo
-
- algo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- algo() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- algorithm() - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
- ALL_COMPRESSION_CODECS() - Method in interface org.apache.spark.io.CompressionCodec
-
- AlphaComponent - Annotation Type in org.apache.spark.annotation
-
A new component of Spark which may have unstable APIs.
- alreadyPlanned() - Method in class org.apache.spark.sql.execution.SparkLogicalPlan
-
- ALS - Class in org.apache.spark.mllib.recommendation
-
Alternating Least Squares matrix factorization.
- ALS() - Constructor for class org.apache.spark.mllib.recommendation.ALS
-
Constructs an ALS instance with default parameters: {numBlocks: -1, rank: 10, iterations: 10,
lambda: 0.01, implicitPrefs: false, alpha: 1.0}.
- ALS.BlockStats - Class in org.apache.spark.mllib.recommendation
-
:: DeveloperApi ::
Statistics of a block in ALS computation.
- ALS.BlockStats(String, int, long, long, long, long) - Constructor for class org.apache.spark.mllib.recommendation.ALS.BlockStats
-
- ALS.BlockStats$ - Class in org.apache.spark.mllib.recommendation
-
- ALS.BlockStats$() - Constructor for class org.apache.spark.mllib.recommendation.ALS.BlockStats$
-
- analyze(String) - Method in class org.apache.spark.sql.hive.HiveContext
-
Analyzes the given table in the current database to generate statistics, which will be
used in query optimizations.
- analyzeBlocks(RDD<Rating>, int, int) - Static method in class org.apache.spark.mllib.recommendation.ALS
-
:: DeveloperApi ::
Given an RDD of ratings, number of user blocks, and number of product blocks, computes the
statistics of each block in ALS computation.
- analyzed() - Method in class org.apache.spark.sql.hive.test.TestHiveContext.QueryExecution
-
- AnalyzeTable - Class in org.apache.spark.sql.hive.execution
-
:: DeveloperApi ::
Analyzes the given table in the current database to generate statistics, which will be
used in query optimizations.
- AnalyzeTable(String) - Constructor for class org.apache.spark.sql.hive.execution.AnalyzeTable
-
- ANY() - Static method in class org.apache.spark.scheduler.TaskLocality
-
- appendBias(Vector) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Returns a new vector with 1.0 (bias) appended to the input vector.
- apply(int) - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- apply(int, int) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Gets the (i, j)-th element.
- apply(int) - Method in interface org.apache.spark.mllib.linalg.Vector
-
Gets the value of the ith element.
- apply(long, String, Option<String>, String) - Static method in class org.apache.spark.scheduler.AccumulableInfo
-
- apply(long, String, String) - Static method in class org.apache.spark.scheduler.AccumulableInfo
-
- apply(String) - Static method in class org.apache.spark.storage.BlockId
-
Converts a BlockId "name" String back into a BlockId.
- apply(String, String, int, int) - Static method in class org.apache.spark.storage.BlockManagerId
-
- apply(ObjectInput) - Static method in class org.apache.spark.storage.BlockManagerId
-
- apply(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object without setting useOffHeap.
- apply(boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object.
- apply(int, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object from its integer representation.
- apply(ObjectInput) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Read StorageLevel object from ObjectInput stream.
- apply(long) - Static method in class org.apache.spark.streaming.Milliseconds
-
- apply(long) - Static method in class org.apache.spark.streaming.Minutes
-
- apply(long) - Static method in class org.apache.spark.streaming.Seconds
-
- apply(TraversableOnce<Object>) - Static method in class org.apache.spark.util.StatCounter
-
Build a StatCounter from a list of values.
- apply(Seq<Object>) - Static method in class org.apache.spark.util.StatCounter
-
Build a StatCounter from a list of values passed as variable-length arguments.
- apply(int) - Method in class org.apache.spark.util.Vector
-
- applySchema(JavaRDD<?>, Class<?>) - Method in class org.apache.spark.sql.api.java.JavaSQLContext
-
Applies a schema to an RDD of Java Beans.
- applySchema(JavaRDD<Row>, StructType) - Method in class org.apache.spark.sql.api.java.JavaSQLContext
-
:: DeveloperApi ::
Creates a JavaSchemaRDD from an RDD containing Rows by applying a schema to this RDD.
- applySchema(RDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
:: DeveloperApi ::
Creates a SchemaRDD from an RDD containing Rows by applying a schema to this RDD.
- appName() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- appName() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- appName() - Method in class org.apache.spark.SparkContext
-
- ApproxHist() - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
-
- areaUnderPR() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Computes the area under the precision-recall curve.
- areaUnderROC() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Computes the area under the receiver operating characteristic (ROC) curve.
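A sketch of computing both areas from an RDD of (score, label) pairs (the scores and labels below are illustrative):

    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

    val scoreAndLabels = sc.parallelize(Seq((0.9, 1.0), (0.7, 1.0), (0.4, 0.0), (0.2, 0.0)))
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    println(metrics.areaUnderROC())  // area under the ROC curve
    println(metrics.areaUnderPR())   // area under the precision-recall curve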
- ArrayType - Class in org.apache.spark.sql.api.java
-
The data type representing Lists.
- as(Symbol) - Method in class org.apache.spark.sql.SchemaRDD
-
Applies a qualifier to the attributes of this relation.
- asIterator() - Method in class org.apache.spark.serializer.DeserializationStream
-
Read the elements of this stream through an iterator.
- asRDDId() - Method in class org.apache.spark.storage.BlockId
-
- AsyncRDDActions<T> - Class in org.apache.spark.rdd
-
:: Experimental ::
A set of asynchronous RDD actions available through an implicit conversion.
- AsyncRDDActions(RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.AsyncRDDActions
-
- attempt() - Method in class org.apache.spark.scheduler.TaskInfo
-
- attemptId() - Method in class org.apache.spark.scheduler.StageInfo
-
- attemptId() - Method in class org.apache.spark.TaskContext
-
- attributes() - Method in class org.apache.spark.sql.hive.execution.HiveTableScan
-
- attributes() - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
- awaitTermination() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Wait for the execution to stop.
- awaitTermination(long) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Wait for the execution to stop.
- awaitTermination() - Method in class org.apache.spark.streaming.StreamingContext
-
Wait for the execution to stop.
- awaitTermination(long) - Method in class org.apache.spark.streaming.StreamingContext
-
Wait for the execution to stop.
- cache() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.api.java.JavaRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.rdd.RDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
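A sketch of caching an RDD that feeds more than one action, so it is computed only once (the input path is illustrative):

    val lines = sc.textFile("hdfs:///data/input.txt").cache()
    val total  = lines.count()                               // first action materializes and caches
    val errors = lines.filter(_.contains("ERROR")).count()   // reuses the cached partitions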
- cache() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- cache() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- cache() - Method in class org.apache.spark.streaming.dstream.DStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- CacheCommand - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- CacheCommand(String, boolean, SQLContext) - Constructor for class org.apache.spark.sql.execution.CacheCommand
-
- cacheManager() - Method in class org.apache.spark.SparkEnv
-
- cacheTable(String) - Method in class org.apache.spark.sql.SQLContext
-
Caches the specified table in-memory.
- cacheTables() - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
-
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
-
:: DeveloperApi ::
variance calculation
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
-
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
-
:: DeveloperApi ::
variance calculation
- calculate(double[], double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
-
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
-
:: DeveloperApi ::
information calculation for regression
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
-
:: DeveloperApi ::
information calculation for multiclass classification
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
-
:: DeveloperApi ::
variance calculation
- call(T) - Method in interface org.apache.spark.api.java.function.DoubleFlatMapFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.DoubleFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.FlatMapFunction
-
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.FlatMapFunction2
-
- call(T1) - Method in interface org.apache.spark.api.java.function.Function
-
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.Function2
-
- call(T1, T2, T3) - Method in interface org.apache.spark.api.java.function.Function3
-
- call(T) - Method in interface org.apache.spark.api.java.function.PairFlatMapFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.PairFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.VoidFunction
-
- call(T1) - Method in interface org.apache.spark.sql.api.java.UDF1
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10) - Method in interface org.apache.spark.sql.api.java.UDF10
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11) - Method in interface org.apache.spark.sql.api.java.UDF11
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12) - Method in interface org.apache.spark.sql.api.java.UDF12
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13) - Method in interface org.apache.spark.sql.api.java.UDF13
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14) - Method in interface org.apache.spark.sql.api.java.UDF14
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15) - Method in interface org.apache.spark.sql.api.java.UDF15
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16) - Method in interface org.apache.spark.sql.api.java.UDF16
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17) - Method in interface org.apache.spark.sql.api.java.UDF17
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18) - Method in interface org.apache.spark.sql.api.java.UDF18
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19) - Method in interface org.apache.spark.sql.api.java.UDF19
-
- call(T1, T2) - Method in interface org.apache.spark.sql.api.java.UDF2
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20) - Method in interface org.apache.spark.sql.api.java.UDF20
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21) - Method in interface org.apache.spark.sql.api.java.UDF21
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22) - Method in interface org.apache.spark.sql.api.java.UDF22
-
- call(T1, T2, T3) - Method in interface org.apache.spark.sql.api.java.UDF3
-
- call(T1, T2, T3, T4) - Method in interface org.apache.spark.sql.api.java.UDF4
-
- call(T1, T2, T3, T4, T5) - Method in interface org.apache.spark.sql.api.java.UDF5
-
- call(T1, T2, T3, T4, T5, T6) - Method in interface org.apache.spark.sql.api.java.UDF6
-
- call(T1, T2, T3, T4, T5, T6, T7) - Method in interface org.apache.spark.sql.api.java.UDF7
-
- call(T1, T2, T3, T4, T5, T6, T7, T8) - Method in interface org.apache.spark.sql.api.java.UDF8
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9) - Method in interface org.apache.spark.sql.api.java.UDF9
-
- cancel() - Method in class org.apache.spark.ComplexFutureAction
-
- cancel() - Method in interface org.apache.spark.FutureAction
-
Cancels the execution of this action.
- cancel() - Method in class org.apache.spark.SimpleFutureAction
-
- cancelAllJobs() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Cancel all jobs that have been scheduled or are running.
- cancelAllJobs() - Method in class org.apache.spark.SparkContext
-
Cancel all jobs that have been scheduled or are running.
- cancelJobGroup(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Cancel active jobs for the specified group.
- cancelJobGroup(String) - Method in class org.apache.spark.SparkContext
-
Cancel active jobs for the specified group.
- cancelled() - Method in class org.apache.spark.ComplexFutureAction
-
Returns whether the promise has been cancelled.
- canEqual(Object) - Method in class org.apache.spark.sql.api.java.Row
-
- canEqual(Object) - Method in class org.apache.spark.util.MutablePair
-
- cartesian(JavaRDDLike<U, ?>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other.
- cartesian(RDD<U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other.
- CartesianProduct - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- CartesianProduct(SparkPlan, SparkPlan) - Constructor for class org.apache.spark.sql.execution.CartesianProduct
-
- Categorical() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
-
- categoricalFeaturesInfo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- categories() - Method in class org.apache.spark.mllib.tree.model.Split
-
- category() - Method in class org.apache.spark.mllib.recommendation.ALS.BlockStats
-
- checkpoint() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Mark this RDD for checkpointing.
- checkpoint() - Method in class org.apache.spark.rdd.HadoopRDD
-
- checkpoint() - Method in class org.apache.spark.rdd.RDD
-
Mark this RDD for checkpointing.
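A sketch of checkpointing to truncate a long lineage (the checkpoint directory is illustrative; it must be set before checkpoint() is called, and the actual save happens when an action runs):

    sc.setCheckpointDir("hdfs:///checkpoints")
    val data = sc.parallelize(1 to 1000).map(_ * 2)
    data.checkpoint()   // mark the RDD for checkpointing
    data.count()        // action triggers the actual save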
- checkpoint(Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Enable periodic checkpointing of RDDs of this DStream.
- checkpoint(String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Sets the context to periodically checkpoint the DStream operations for master
fault-tolerance.
- checkpoint(Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Enable periodic checkpointing of RDDs of this DStream
- checkpoint(String) - Method in class org.apache.spark.streaming.StreamingContext
-
Set the context to periodically checkpoint the DStream operations for driver
fault-tolerance.
- checkpointData() - Method in class org.apache.spark.rdd.RDD
-
- checkpointData() - Method in class org.apache.spark.streaming.dstream.DStream
-
- checkpointDir() - Method in class org.apache.spark.SparkContext
-
- checkpointDir() - Method in class org.apache.spark.streaming.StreamingContext
-
- checkpointDuration() - Method in class org.apache.spark.streaming.dstream.DStream
-
- checkpointDuration() - Method in class org.apache.spark.streaming.StreamingContext
-
- child() - Method in class org.apache.spark.sql.execution.Aggregate
-
- child() - Method in class org.apache.spark.sql.execution.BatchPythonEvaluation
-
- child() - Method in class org.apache.spark.sql.execution.DescribeCommand
-
- child() - Method in class org.apache.spark.sql.execution.Distinct
-
- child() - Method in class org.apache.spark.sql.execution.EvaluatePython
-
- child() - Method in class org.apache.spark.sql.execution.Exchange
-
- child() - Method in class org.apache.spark.sql.execution.Filter
-
- child() - Method in class org.apache.spark.sql.execution.Generate
-
- child() - Method in class org.apache.spark.sql.execution.GeneratedAggregate
-
- child() - Method in class org.apache.spark.sql.execution.Limit
-
- child() - Method in class org.apache.spark.sql.execution.OutputFaker
-
- child() - Method in class org.apache.spark.sql.execution.Project
-
- child() - Method in class org.apache.spark.sql.execution.Sample
-
- child() - Method in class org.apache.spark.sql.execution.Sort
-
- child() - Method in class org.apache.spark.sql.execution.TakeOrdered
-
- child() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- child() - Method in class org.apache.spark.sql.hive.execution.ScriptTransformation
-
- child() - Method in class org.apache.spark.sql.parquet.InsertIntoParquetTable
-
- children() - Method in class org.apache.spark.sql.execution.BatchPythonEvaluation
-
- children() - Method in class org.apache.spark.sql.execution.OutputFaker
-
- children() - Method in class org.apache.spark.sql.execution.SparkLogicalPlan
-
- children() - Method in class org.apache.spark.sql.execution.Union
-
- chiSqTest(Vector, Vector) - Static method in class org.apache.spark.mllib.stat.Statistics
-
:: Experimental ::
Conduct Pearson's chi-squared goodness of fit test of the observed data against the
expected distribution.
- chiSqTest(Vector) - Static method in class org.apache.spark.mllib.stat.Statistics
-
:: Experimental ::
Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform
distribution, with each category having an expected frequency of 1 / observed.size.
- chiSqTest(Matrix) - Static method in class org.apache.spark.mllib.stat.Statistics
-
:: Experimental ::
Conduct Pearson's independence test on the input contingency matrix, which cannot contain
negative entries or columns or rows that sum up to 0.
- chiSqTest(RDD<LabeledPoint>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
:: Experimental ::
Conduct Pearson's independence test for every feature against the label across the input RDD.
- ChiSqTestResult - Class in org.apache.spark.mllib.stat.test
-
:: Experimental ::
Object containing the test results for the chi-squared hypothesis test.
- Classification() - Static method in class org.apache.spark.mllib.tree.configuration.Algo
-
- ClassificationModel - Interface in org.apache.spark.mllib.classification
-
:: Experimental ::
Represents a classification model that predicts to which of a set of categories an example
belongs.
- className() - Method in class org.apache.spark.ExceptionFailure
-
- classpathEntries() - Method in class org.apache.spark.ui.env.EnvironmentListener
-
- classTag() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- classTag() - Method in class org.apache.spark.api.java.JavaPairRDD
-
- classTag() - Method in class org.apache.spark.api.java.JavaRDD
-
- classTag() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- classTag() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
- classTag() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaInputDStream
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-
- cleaner() - Method in class org.apache.spark.SparkContext
-
- clearCallSite() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Pass-through to SparkContext.setCallSite.
- clearCallSite() - Method in class org.apache.spark.SparkContext
-
Support function for API backtraces.
- clearDependencies() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- clearDependencies() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- clearDependencies() - Method in class org.apache.spark.rdd.UnionRDD
-
- clearFiles() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the job's list of files added by addFile
so that they do not get downloaded to
any new nodes.
- clearFiles() - Method in class org.apache.spark.SparkContext
-
Clear the job's list of files added by addFile
so that they do not get downloaded to
any new nodes.
- clearJars() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the job's list of JARs added by addJar
so that they do not get downloaded to
any new nodes.
- clearJars() - Method in class org.apache.spark.SparkContext
-
Clear the job's list of JARs added by addJar
so that they do not get downloaded to
any new nodes.
- clearJobGroup() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the current thread's job group ID and its description.
- clearJobGroup() - Method in class org.apache.spark.SparkContext
-
Clear the current thread's job group ID and its description.
- clearThreshold() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
:: Experimental ::
Clears the threshold so that predict
will output raw prediction scores.
- clearThreshold() - Method in class org.apache.spark.mllib.classification.SVMModel
-
:: Experimental ::
Clears the threshold so that predict
will output raw prediction scores.
- clone() - Method in class org.apache.spark.SparkConf
-
Copy this object
- clone() - Method in class org.apache.spark.storage.StorageLevel
-
- clone() - Method in class org.apache.spark.util.random.BernoulliSampler
-
- clone() - Method in class org.apache.spark.util.random.PoissonSampler
-
- clone() - Method in interface org.apache.spark.util.random.RandomSampler
-
- cloneComplement() - Method in class org.apache.spark.util.random.BernoulliSampler
-
Return a sampler that is the complement of the range specified of the current sampler.
- close() - Method in class org.apache.spark.serializer.DeserializationStream
-
- close() - Method in class org.apache.spark.serializer.SerializationStream
-
- closureSerializer() - Method in class org.apache.spark.SparkEnv
-
- clusterCenters() - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
- coalesce(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD that is reduced into numPartitions
partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD that is reduced into numPartitions
partitions.
- coalesce(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD that is reduced into numPartitions
partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD that is reduced into numPartitions
partitions.
- coalesce(int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD that is reduced into numPartitions
partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD that is reduced into numPartitions
partitions.
- coalesce(int, boolean, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD that is reduced into numPartitions
partitions.
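A sketch of shrinking the number of partitions after a selective filter (the path is illustrative; shuffle defaults to false, giving a narrow, no-shuffle coalesce):

    val filtered   = sc.textFile("hdfs:///data/big.txt").filter(_.nonEmpty)
    val compacted  = filtered.coalesce(4)                    // narrow dependency, no shuffle
    val rebalanced = filtered.coalesce(16, shuffle = true)   // full shuffle to rebalance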
- coalesce(int, boolean) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Return a new RDD that is reduced into numPartitions
partitions.
- coalesce(int, boolean, Ordering<Row>) - Method in class org.apache.spark.sql.SchemaRDD
-
- codegenEnabled() - Method in class org.apache.spark.sql.execution.SparkPlan
-
- cogroup(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- cogroup(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
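A sketch of cogroup on two pair RDDs (the keys and values are illustrative; the import brings in the pair-RDD implicits and is not needed in the shell):

    import org.apache.spark.SparkContext._

    val left  = sc.parallelize(Seq((1, "a"), (2, "b")))
    val right = sc.parallelize(Seq((1, "x"), (1, "y")))
    // key 1 -> (Iterable("a"), Iterable("x", "y")); key 2 -> (Iterable("b"), Iterable())
    val grouped = left.cogroup(right)
    grouped.collect().foreach { case (k, (ls, rs)) => println(s"$k -> ${ls.toList} / ${rs.toList}") }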
- cogroup(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this
DStream and other
DStream.
- cogroup(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this
DStream and other
DStream.
- cogroup(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this
DStream and other
DStream.
- cogroup(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this
DStream and other
DStream.
- cogroup(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this
DStream and other
DStream.
- cogroup(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this
DStream and other
DStream.
- CoGroupedRDD<K> - Class in org.apache.spark.rdd
-
:: DeveloperApi ::
An RDD that cogroups its parents.
- CoGroupedRDD(Seq<RDD<? extends Product2<K, ?>>>, Partitioner) - Constructor for class org.apache.spark.rdd.CoGroupedRDD
-
- collect() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an array that contains all of the elements in this RDD.
- collect() - Method in class org.apache.spark.rdd.RDD
-
Return an array that contains all of the elements in this RDD.
- collect(PartialFunction<T, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD that contains all matching values by applying f.
- collect() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
- collect() - Method in class org.apache.spark.sql.SchemaRDD
-
- collectAsMap() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return the key-value pairs in this RDD to the master as a Map.
- collectAsMap() - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return the key-value pairs in this RDD to the master as a Map.
- collectAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Returns a future for retrieving all elements of this RDD.
- collectPartitions(int[]) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an array that contains all of the elements in a specific partition of this RDD.
- colStats(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
:: Experimental ::
Computes column-wise summary statistics for the input RDD[Vector].
- columnPruningPred() - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Generic function to combine the elements for each key using a custom set of aggregation
functions.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Simplified version of combineByKey that hash-partitions the output RDD.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Simplified version of combineByKey that hash-partitions the resulting RDD using the existing
partitioner/parallelism level.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Generic function to combine the elements for each key using a custom set of aggregation
functions.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Simplified version of combineByKey that hash-partitions the output RDD.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Simplified version of combineByKey that hash-partitions the resulting RDD using the
existing partitioner/parallelism level.
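A sketch of a per-key average with combineByKey; the three functions create, update and merge the per-key (sum, count) combiners (names and data are illustrative; the import brings in the pair-RDD implicits and is not needed in the shell):

    import org.apache.spark.SparkContext._

    val scores = sc.parallelize(Seq(("a", 10.0), ("a", 20.0), ("b", 5.0)))
    val avgByKey = scores.combineByKey(
      (v: Double) => (v, 1),                                              // createCombiner
      (acc: (Double, Int), v: Double) => (acc._1 + v, acc._2 + 1),        // mergeValue
      (a: (Double, Int), b: (Double, Int)) => (a._1 + b._1, a._2 + b._2)  // mergeCombiners
    ).mapValues { case (sum, count) => sum / count }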
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Combine elements of each key in DStream's RDDs using custom function.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Combine elements of each key in DStream's RDDs using custom function.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, ClassTag<C>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Combine elements of each key in DStream's RDDs using custom functions.
- combineCombinersByKey(Iterator<Product2<K, C>>) - Method in class org.apache.spark.Aggregator
-
- combineCombinersByKey(Iterator<Product2<K, C>>, TaskContext) - Method in class org.apache.spark.Aggregator
-
- combineValuesByKey(Iterator<Product2<K, V>>) - Method in class org.apache.spark.Aggregator
-
- combineValuesByKey(Iterator<Product2<K, V>>, TaskContext) - Method in class org.apache.spark.Aggregator
-
- Command - Interface in org.apache.spark.sql.execution
-
- commands() - Method in class org.apache.spark.sql.hive.test.TestHiveContext.TestTable
-
- compare(RDDInfo) - Method in class org.apache.spark.storage.RDDInfo
-
- completedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- completionTime() - Method in class org.apache.spark.scheduler.StageInfo
-
Time when all tasks in the stage completed or when the stage was cancelled.
- ComplexFutureAction<T> - Class in org.apache.spark
-
:: Experimental ::
A FutureAction for actions that could trigger multiple Spark jobs.
- ComplexFutureAction() - Constructor for class org.apache.spark.ComplexFutureAction
-
- compressedInputStream(InputStream) - Method in interface org.apache.spark.io.CompressionCodec
-
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.LZ4CompressionCodec
-
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
-
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
-
- compressedOutputStream(OutputStream) - Method in interface org.apache.spark.io.CompressionCodec
-
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.LZ4CompressionCodec
-
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
-
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
-
- CompressionCodec - Interface in org.apache.spark.io
-
:: DeveloperApi ::
CompressionCodec allows customization of which compression implementation is used in block storage.
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
-
Compute the gradient and loss given the features of a single data point.
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
-
Compute the gradient and loss given the features of a single data point,
add the gradient to a provided vector to avoid creating new objects, and return loss.
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.L1Updater
-
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
-
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.LogisticGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LogisticGradient
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SimpleUpdater
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SquaredL2Updater
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.Updater
-
Compute an updated value for weights given the gradient, stepSize, iteration number and
regularization parameter.
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.HadoopRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.JdbcRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.PartitionPruningRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.RDD
-
:: DeveloperApi ::
Implemented by subclasses to compute a given partition.
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.ShuffledRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.UnionRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.sql.SchemaRDD
-
- compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Generate an RDD for the given duration
- compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Method that generates an RDD for the given Duration.
- compute(Time) - Method in class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- compute(Time) - Method in class org.apache.spark.streaming.dstream.DStream
-
Method that generates an RDD for the given time.
- compute(Time) - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
Asks the ReceiverInputTracker for received data blocks and generates RDDs with them.
- computeColumnSummaryStatistics() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes column-wise summary statistics.
- computeCost(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Return the K-means cost (sum of squared distances of points to their nearest center) for this
model on the given data.
- computeCovariance() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the covariance matrix, treating each row as an observation.
- computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Computes the Gramian matrix A^T A.
- computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the Gramian matrix A^T A.
- computePreferredLocations(Seq<InputFormatInfo>) - Static method in class org.apache.spark.scheduler.InputFormatInfo
-
Computes the preferred locations based on input(s) and returns a location-to-block map.
- computePrincipalComponents(int) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the top k principal components.
- computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Computes the singular value decomposition of this matrix.
- computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the singular value decomposition of this matrix.
- condition() - Method in class org.apache.spark.sql.execution.BroadcastNestedLoopJoin
-
- condition() - Method in class org.apache.spark.sql.execution.Filter
-
- condition() - Method in class org.apache.spark.sql.execution.HashOuterJoin
-
- condition() - Method in class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
- conditionEvaluator() - Method in class org.apache.spark.sql.execution.Filter
-
- conf() - Method in class org.apache.spark.SparkContext
-
- conf() - Method in class org.apache.spark.SparkEnv
-
- conf() - Method in class org.apache.spark.streaming.StreamingContext
-
- confidence() - Method in class org.apache.spark.partial.BoundedDouble
-
- configuration() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- CONFIGURATION_INSTANTIATION_LOCK() - Static method in class org.apache.spark.rdd.HadoopRDD
-
Constructing Configuration objects is not thread-safe; use this lock to serialize.
- confusionMatrix() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns the confusion matrix: predicted classes are in columns, ordered by ascending class label, as in "labels".
- connectionManager() - Method in class org.apache.spark.SparkEnv
-
- ConstantInputDStream<T> - Class in org.apache.spark.streaming.dstream
-
An input stream that always returns the same RDD on each timestep.
- ConstantInputDStream(StreamingContext, RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- contains(String) - Method in class org.apache.spark.SparkConf
-
Does the configuration contain a given parameter?
- containsBlock(BlockId) - Method in class org.apache.spark.storage.StorageStatus
-
Return whether the given block is stored in this block manager in O(1) time.
- containsCachedMetadata(String) - Static method in class org.apache.spark.rdd.HadoopRDD
-
- context() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- context() - Method in class org.apache.spark.InterruptibleIterator
-
- context() - Method in class org.apache.spark.rdd.RDD
-
- context() - Method in class org.apache.spark.sql.hive.execution.HiveTableScan
-
- context() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- context() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return the StreamingContext associated with this DStream
- Continuous() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
-
- CoordinateMatrix - Class in org.apache.spark.mllib.linalg.distributed
-
:: Experimental ::
Represents a matrix in coordinate format.
- CoordinateMatrix(RDD<MatrixEntry>, long, long) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
- CoordinateMatrix(RDD<MatrixEntry>) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
Alternative constructor leaving matrix dimensions to be determined automatically.
- copy() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- copy() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- copy() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Makes a deep copy of this vector.
- copy() - Method in class org.apache.spark.mllib.random.PoissonGenerator
-
- copy() - Method in interface org.apache.spark.mllib.random.RandomDataGenerator
-
Returns a copy of the RandomDataGenerator with a new instance of the underlying RNG object, where applicable, to allow non-locking concurrent usage.
- copy() - Method in class org.apache.spark.mllib.random.StandardNormalGenerator
-
- copy() - Method in class org.apache.spark.mllib.random.UniformGenerator
-
- copy() - Method in class org.apache.spark.util.StatCounter
-
Clone this StatCounter
- corr(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
:: Experimental ::
Compute the Pearson correlation matrix for the input RDD of Vectors.
- corr(RDD<Vector>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
-
:: Experimental ::
Compute the correlation matrix for the input RDD of Vectors using the specified method.
- corr(RDD<Object>, RDD<Object>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
:: Experimental ::
Compute the Pearson correlation for the input RDDs.
- corr(RDD<Object>, RDD<Object>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
-
:: Experimental ::
Compute the correlation for the input RDDs using the specified method.
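A short sketch of the Statistics.corr variants above (vecRDD, seriesX and seriesY are assumed, pre-existing inputs):

    import org.apache.spark.mllib.stat.Statistics

    // vecRDD: RDD[Vector]; seriesX, seriesY: RDD[Double] are assumed to exist
    val pearsonMatrix  = Statistics.corr(vecRDD)               // Pearson is the default method
    val spearmanMatrix = Statistics.corr(vecRDD, "spearman")   // explicitly named method
    val pairwise       = Statistics.corr(seriesX, seriesY)     // Pearson correlation of two series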
- count() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the number of elements in the RDD.
- count() - Method in class org.apache.spark.mllib.recommendation.ALS.BlockStats
-
- count() - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
- count() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Sample size.
- count() - Method in class org.apache.spark.rdd.RDD
-
Return the number of elements in the RDD.
- count() - Method in class org.apache.spark.sql.SchemaRDD
-
:: Experimental ::
Return the number of elements in the RDD.
- count() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by counting each RDD
of this DStream.
- count() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by counting each RDD
of this DStream.
- count() - Method in class org.apache.spark.util.StatCounter
-
- countApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
:: Experimental ::
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
:: Experimental ::
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApprox(long, double) - Method in class org.apache.spark.rdd.RDD
-
:: Experimental ::
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApproxDistinct(double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return approximate number of distinct elements in the RDD.
- countApproxDistinct(int, int) - Method in class org.apache.spark.rdd.RDD
-
:: Experimental ::
Return approximate number of distinct elements in the RDD.
- countApproxDistinct(double) - Method in class org.apache.spark.rdd.RDD
-
Return approximate number of distinct elements in the RDD.
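As a rough usage sketch of the relative-accuracy overload above (rdd is any existing RDD):

    // A smaller relativeSD gives a more accurate (but more expensive) estimate
    val approxDistinct = rdd.countApproxDistinct(relativeSD = 0.05)
    val exactDistinct  = rdd.distinct().count()  // exact count, for comparison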
- countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(int, int, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
:: Experimental ::
- countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Returns a future for counting the number of elements in the RDD.
- countByKey() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Count the number of elements for each key, and return the result to the master as a Map.
- countByKey() - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Count the number of elements for each key, and return the result to the master as a Map.
- countByKeyApprox(long) - Method in class org.apache.spark.api.java.JavaPairRDD
-
:: Experimental ::
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByKeyApprox(long, double) - Method in class org.apache.spark.api.java.JavaPairRDD
-
:: Experimental ::
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByKeyApprox(long, double) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
:: Experimental ::
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByValue() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the count of each unique value in this RDD as a map of (value, count) pairs.
- countByValue(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return the count of each unique value in this RDD as a map of (value, count) pairs.
- countByValue() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValue(int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValue(int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValueAndWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueAndWindow(Duration, Duration, int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueAndWindow(Duration, Duration, int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
(Experimental) Approximate version of countByValue().
- countByValueApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
(Experimental) Approximate version of countByValue().
- countByValueApprox(long, double, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
:: Experimental ::
Approximate version of countByValue().
- countByWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by counting the number
of elements in a window over this DStream.
- countByWindow(Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by counting the number
of elements in a sliding window over this DStream.
- create(boolean, boolean, boolean, int) - Static method in class org.apache.spark.api.java.StorageLevels
-
Deprecated.
- create(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.api.java.StorageLevels
-
Create a new StorageLevel object.
- create(RDD<T>, Function1<Object, Object>) - Static method in class org.apache.spark.rdd.PartitionPruningRDD
-
Create a PartitionPruningRDD.
- create(Object...) - Static method in class org.apache.spark.sql.api.java.Row
-
Creates a Row with the given values.
- create(Seq<Object>) - Static method in class org.apache.spark.sql.api.java.Row
-
Creates a Row with the given values.
- create() - Method in interface org.apache.spark.streaming.api.java.JavaStreamingContextFactory
-
- createArrayType(DataType) - Static method in class org.apache.spark.sql.api.java.DataType
-
Creates an ArrayType by specifying the data type of elements (elementType).
- createArrayType(DataType, boolean) - Static method in class org.apache.spark.sql.api.java.DataType
-
Creates an ArrayType by specifying the data type of elements (elementType) and whether the array contains null values (containsNull).
- createCodec(SparkConf) - Method in interface org.apache.spark.io.CompressionCodec
-
- createCodec(SparkConf, String) - Method in interface org.apache.spark.io.CompressionCodec
-
- createCombiner() - Method in class org.apache.spark.Aggregator
-
- createMapType(DataType, DataType) - Static method in class org.apache.spark.sql.api.java.DataType
-
Creates a MapType by specifying the data type of keys (keyType) and values (valueType).
- createMapType(DataType, DataType, boolean) - Static method in class org.apache.spark.sql.api.java.DataType
-
Creates a MapType by specifying the data type of keys (keyType), the data type of values (valueType), and whether values contain any null value (valueContainsNull).
- createParquetFile(Class<?>, String, boolean, Configuration) - Method in class org.apache.spark.sql.api.java.JavaSQLContext
-
:: Experimental ::
Creates an empty parquet file with the schema of class beanClass, which can be registered as a table.
- createParquetFile(String, boolean, Configuration, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
-
:: Experimental ::
Creates an empty parquet file with the schema of class A, which can be registered as a table.
- createPollingStream(StreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(StreamingContext, Seq<InetSocketAddress>, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(StreamingContext, Seq<InetSocketAddress>, StorageLevel, int, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, String, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, InetSocketAddress[], StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, InetSocketAddress[], StorageLevel, int, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createSchemaRDD(RDD<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
-
Creates a SchemaRDD from an RDD of case classes.
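A minimal Scala sketch of createSchemaRDD with a case class (the file path and field names are made up for illustration; sc is an existing SparkContext):

    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Int)

    val sqlContext = new SQLContext(sc)
    val people = sc.textFile("people.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))

    val schemaRDD = sqlContext.createSchemaRDD(people)
    schemaRDD.printSchema()  // schema is inferred from the case class fields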
- createStream(StreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Create an input stream from a Flume source.
- createStream(StreamingContext, String, int, StorageLevel, boolean) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Create an input stream from a Flume source.
- createStream(JavaStreamingContext, String, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream from a Flume source.
- createStream(JavaStreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream from a Flume source.
- createStream(JavaStreamingContext, String, int, StorageLevel, boolean) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream from a Flume source.
- createStream(StreamingContext, String, String, Map<String, Object>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from a Kafka Broker.
- createStream(StreamingContext, Map<String, String>, Map<String, Object>, StorageLevel, ClassTag<K>, ClassTag<V>, ClassTag<U>, ClassTag<T>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from a Kafka Broker.
- createStream(JavaStreamingContext, String, String, Map<String, Integer>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from a Kafka Broker.
- createStream(JavaStreamingContext, String, String, Map<String, Integer>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from a Kafka Broker.
- createStream(JavaStreamingContext, Class<K>, Class<V>, Class<U>, Class<T>, Map<String, String>, Map<String, Integer>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from a Kafka Broker.
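A hedged sketch of the Scala createStream variant above (the ZooKeeper quorum, consumer group, and topic map are placeholder values; ssc is an existing StreamingContext):

    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaStream = KafkaUtils.createStream(
      ssc,
      "zk-host:2181",             // ZooKeeper quorum (placeholder)
      "example-consumer-group",   // consumer group id (placeholder)
      Map("events" -> 2))         // topic -> number of receiver threads

    val lines = kafkaStream.map(_._2)  // values are the message payloads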
- createStream(StreamingContext, String, String, Duration, InitialPositionInStream, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an InputDStream that pulls messages from a Kinesis stream.
- createStream(JavaStreamingContext, String, String, Duration, InitialPositionInStream, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create a Java-friendly InputDStream that pulls messages from a Kinesis stream.
- createStream(StreamingContext, String, String, StorageLevel) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(JavaStreamingContext, String, String) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(JavaStreamingContext, String, String, StorageLevel) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(StreamingContext, Option<Authorization>, Seq<String>, StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, String[]) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, String[], StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, Authorization) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext, Authorization, String[]) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext, Authorization, String[], StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(StreamingContext, String, Subscribe, Function1<Seq<ByteString>, Iterator<T>>, StorageLevel, SupervisorStrategy, ClassTag<T>) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a ZeroMQ publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>, StorageLevel, SupervisorStrategy) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a ZeroMQ publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>, StorageLevel) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a ZeroMQ publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a ZeroMQ publisher.
- createStructField(String, DataType, boolean) - Static method in class org.apache.spark.sql.api.java.DataType
-
Creates a StructField by specifying the name (name), the data type (dataType), and whether values of this field can be null (nullable).
- createStructType(List<StructField>) - Static method in class org.apache.spark.sql.api.java.DataType
-
Creates a StructType with the given list of StructFields (fields).
- createStructType(StructField[]) - Static method in class org.apache.spark.sql.api.java.DataType
-
Creates a StructType with the given StructField array (fields).
- createTable(String, boolean, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.hive.HiveContext
-
Creates a table using the schema of the given class.
- creationSite() - Method in class org.apache.spark.rdd.RDD
-
User code that created this RDD (e.g.
- failed() - Method in class org.apache.spark.scheduler.TaskInfo
-
- failedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- failureReason() - Method in class org.apache.spark.scheduler.StageInfo
-
If the stage failed, the reason why.
- FAIR() - Static method in class org.apache.spark.scheduler.SchedulingMode
-
- FakeParquetSerDe - Class in org.apache.spark.sql.hive.parquet
-
A placeholder that allows SparkSQL users to create metastore tables that are stored as
parquet files.
- FakeParquetSerDe() - Constructor for class org.apache.spark.sql.hive.parquet.FakeParquetSerDe
-
- falsePositiveRate(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns false positive rate for a given label (category)
- feature() - Method in class org.apache.spark.mllib.tree.model.Split
-
- features() - Method in class org.apache.spark.mllib.regression.LabeledPoint
-
- FeatureType - Class in org.apache.spark.mllib.tree.configuration
-
:: Experimental ::
Enum to describe whether a feature is "continuous" or "categorical"
- FeatureType() - Constructor for class org.apache.spark.mllib.tree.configuration.FeatureType
-
- featureType() - Method in class org.apache.spark.mllib.tree.model.Split
-
- FetchFailed - Class in org.apache.spark
-
:: DeveloperApi ::
Task failed to fetch shuffle data from a remote node.
- FetchFailed(BlockManagerId, int, int, int) - Constructor for class org.apache.spark.FetchFailed
-
- field() - Method in class org.apache.spark.storage.BroadcastBlockId
-
- FIFO() - Static method in class org.apache.spark.scheduler.SchedulingMode
-
- files() - Method in class org.apache.spark.SparkContext
-
- fileStream(String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, Function1<Path, Object>, boolean, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- filter(Function<Double, Boolean>) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function<T, Boolean>) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function1<T, Object>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function<Row, Boolean>) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- Filter - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- Filter(Expression, SparkPlan) - Constructor for class org.apache.spark.sql.execution.Filter
-
- filter(Function1<Row, Object>) - Method in class org.apache.spark.sql.SchemaRDD
-
- filter(Function<T, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filter(Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filter(Function1<T, Object>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filterWith(Function1<Object, A>, Function2<T, A, Object>) - Method in class org.apache.spark.rdd.RDD
-
Filters this RDD with p, where p takes an additional parameter of type A.
- findSynonyms(String, int) - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
Find synonyms of a word
- findSynonyms(Vector, int) - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
Find synonyms of the vector representation of a word
- finished() - Method in class org.apache.spark.scheduler.TaskInfo
-
- finishTime() - Method in class org.apache.spark.scheduler.TaskInfo
-
The time when the task has completed successfully (including the time to remotely fetch
results, if necessary).
- first() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- first() - Method in class org.apache.spark.api.java.JavaPairRDD
-
- first() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the first element in this RDD.
- first() - Method in class org.apache.spark.rdd.RDD
-
Return the first element in this RDD.
- fit(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.IDF
-
Computes the inverse document frequency.
- fit(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.feature.IDF
-
Computes the inverse document frequency.
- fit(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.StandardScaler
-
Computes the mean and variance and stores them as a model to be used for later scaling.
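A brief sketch of fitting and applying a StandardScaler (data is an assumed, pre-existing RDD[Vector]):

    import org.apache.spark.mllib.feature.StandardScaler

    // data: RDD[Vector] is assumed to exist
    val scalerModel = new StandardScaler(withMean = true, withStd = true).fit(data)
    val scaled = data.map(v => scalerModel.transform(v))  // center and scale each vector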
- fit(RDD<S>) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Computes the vector representation of each word in vocabulary.
- fit(JavaRDD<S>) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Computes the vector representation of each word in vocabulary (Java version).
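A minimal sketch of training Word2Vec and querying the resulting model (sentences is an assumed RDD[Seq[String]] of tokenized text; "spark" is an illustrative query word):

    import org.apache.spark.mllib.feature.Word2Vec

    // sentences: RDD[Seq[String]] is assumed to exist
    val model = new Word2Vec().fit(sentences)

    // Top 5 words closest to "spark" in the learned vector space
    val synonyms = model.findSynonyms("spark", 5)
    synonyms.foreach { case (word, similarity) => println(s"$word  $similarity") }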
- flatMap(FlatMapFunction<T, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMap(Function1<T, TraversableOnce<U>>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMap(FlatMapFunction<T, U>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
- flatMap(Function1<T, Traversable<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
- FlatMapFunction<T,R> - Interface in org.apache.spark.api.java.function
-
A function that returns zero or more output records from each input record.
- FlatMapFunction2<T1,T2,R> - Interface in org.apache.spark.api.java.function
-
A function that takes two inputs and returns zero or more output records.
- flatMapToDouble(DoubleFlatMapFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMapToPair(PairFlatMapFunction<T, K2, V2>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMapToPair(PairFlatMapFunction<T, K2, V2>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
- flatMapValues(Function<V, Iterable<U>>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Pass each value in the key-value pair RDD through a flatMap function without changing the
keys; this also retains the original RDD's partitioning.
- flatMapValues(Function1<V, TraversableOnce<U>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Pass each value in the key-value pair RDD through a flatMap function without changing the
keys; this also retains the original RDD's partitioning.
- flatMapValues(Function<V, Iterable<U>>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying a flatMap function to the values of each key-value pair in 'this' DStream without changing the keys.
- flatMapValues(Function1<V, TraversableOnce<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying a flatMap function to the values of each key-value pair in 'this' DStream without changing the keys.
- flatMapWith(Function1<Object, A>, boolean, Function2<T, A, Seq<U>>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
FlatMaps f over this RDD, where f takes an additional parameter of type A.
- floatToFloatWritable(float) - Static method in class org.apache.spark.SparkContext
-
- FloatType - Static variable in class org.apache.spark.sql.api.java.DataType
-
Gets the FloatType object.
- FloatType - Class in org.apache.spark.sql.api.java
-
The data type representing float and Float values.
- floatWritableConverter() - Static method in class org.apache.spark.SparkContext
-
- floor(Duration) - Method in class org.apache.spark.streaming.Time
-
- FlumeUtils - Class in org.apache.spark.streaming.flume
-
- FlumeUtils() - Constructor for class org.apache.spark.streaming.flume.FlumeUtils
-
- flush() - Method in class org.apache.spark.serializer.SerializationStream
-
- fMeasure(double, double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns f-measure for a given label (category)
- fMeasure(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns f1-measure for a given label (category)
- fMeasure() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns the f-measure (equal to both precision and recall, because precision equals recall).
- fMeasureByThreshold(double) - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, F-Measure) curve.
- fMeasureByThreshold() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, F-Measure) curve with beta = 1.0.
- fold(T, Function2<T, T, T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative function and a neutral "zero value".
- fold(T, Function2<T, T, T>) - Method in class org.apache.spark.rdd.RDD
-
Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative function and a neutral "zero value".
- foldByKey(V, Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication).
- foldByKey(V, int, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication).
- foldByKey(V, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value"
which may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, int, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
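A small sketch of the Scala foldByKey form above; the zero value must be neutral for the merge function, since it may be applied any number of times (pairs is an assumed RDD[(String, Int)]):

    import org.apache.spark.SparkContext._  // brings in PairRDDFunctions implicits (pre-1.3)

    // pairs: RDD[(String, Int)] is assumed to exist
    val sums  = pairs.foldByKey(0)(_ + _)                  // per-key sums; 0 is neutral for +
    val maxes = pairs.foldByKey(Int.MinValue)(math.max)    // per-key maxima; MinValue is neutral for max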
- foreach(VoidFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Applies a function f to all elements of this RDD.
- foreach(Function1<T, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies a function f to all elements of this RDD.
- foreach(Function<R, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Deprecated.
As of release 0.9.0, replaced by foreachRDD
- foreach(Function2<R, Time, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Deprecated.
As of release 0.9.0, replaced by foreachRDD
- foreach(Function1<RDD<T>, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
- foreach(Function2<RDD<T>, Time, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
- foreachAsync(Function1<T, BoxedUnit>) - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Applies a function f to all elements of this RDD.
- foreachPartition(VoidFunction<Iterator<T>>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Applies a function f to each partition of this RDD.
- foreachPartition(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies a function f to each partition of this RDD.
- foreachPartitionAsync(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Applies a function f to each partition of this RDD.
- foreachRDD(Function<R, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Apply a function to each RDD in this DStream.
- foreachRDD(Function2<R, Time, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Apply a function to each RDD in this DStream.
- foreachRDD(Function1<RDD<T>, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
- foreachRDD(Function2<RDD<T>, Time, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
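As a hedged usage sketch, foreachRDD is the usual place for output actions on each batch (stream is an assumed DStream[String]):

    // stream: DStream[String] is assumed to exist
    stream.foreachRDD { (rdd, time) =>
      // This closure runs on the driver once per batch interval
      println(s"Batch at $time contained ${rdd.count()} records")
    }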
- foreachWith(Function1<Object, A>, Function2<T, A, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies f to each element of this RDD, where f takes an additional parameter of type A.
- formatExecutorId(String) - Method in class org.apache.spark.storage.StorageStatusListener
-
In local mode, there is a discrepancy between the executor ID according to the
task ("localhost") and that according to SparkEnv ("").
- fraction() - Method in class org.apache.spark.sql.execution.Sample
-
- fromAvroFlumeEvent(AvroFlumeEvent) - Static method in class org.apache.spark.streaming.flume.SparkFlumeEvent
-
- fromDStream(DStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaDStream
-
- fromInputDStream(InputDStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaInputDStream
-
- fromInputDStream(InputDStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairInputDStream
-
- fromJavaDStream(JavaDStream<Tuple2<K, V>>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- fromJavaRDD(JavaRDD<Tuple2<K, V>>) - Static method in class org.apache.spark.api.java.JavaPairRDD
-
Convert a JavaRDD of key-value pairs to JavaPairRDD.
- fromPairDStream(DStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- fromProductRdd(RDD<A>, TypeTags.TypeTag<A>) - Static method in class org.apache.spark.sql.execution.ExistingRdd
-
- fromRDD(RDD<Object>) - Static method in class org.apache.spark.api.java.JavaDoubleRDD
-
- fromRDD(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.api.java.JavaPairRDD
-
- fromRDD(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.api.java.JavaRDD
-
- fromRdd(RDD<?>) - Static method in class org.apache.spark.storage.RDDInfo
-
- fromReceiverInputDStream(ReceiverInputDStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream
-
- fromReceiverInputDStream(ReceiverInputDStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-
- fromSparkContext(SparkContext) - Static method in class org.apache.spark.api.java.JavaSparkContext
-
- fromStage(Stage, Option<Object>) - Static method in class org.apache.spark.scheduler.StageInfo
-
Construct a StageInfo from a Stage.
- fromString(String) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Return the StorageLevel object with the specified name.
- Function<T1,R> - Interface in org.apache.spark.api.java.function
-
Base interface for functions whose return types do not create special RDDs.
- Function2<T1,T2,R> - Interface in org.apache.spark.api.java.function
-
A two-argument function that takes arguments of type T1 and T2 and returns an R.
- Function3<T1,T2,T3,R> - Interface in org.apache.spark.api.java.function
-
A three-argument function that takes arguments of type T1, T2 and T3 and returns an R.
- FutureAction<T> - Interface in org.apache.spark
-
:: Experimental ::
A future for the result of an action to support cancellation.
- gain() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- GeneralizedLinearAlgorithm<M extends GeneralizedLinearModel> - Class in org.apache.spark.mllib.regression
-
:: DeveloperApi ::
GeneralizedLinearAlgorithm implements methods to train a Generalized Linear Model (GLM).
- GeneralizedLinearAlgorithm() - Constructor for class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
- GeneralizedLinearModel - Class in org.apache.spark.mllib.regression
-
:: DeveloperApi ::
GeneralizedLinearModel (GLM) represents a model trained using
GeneralizedLinearAlgorithm.
- GeneralizedLinearModel(Vector, double) - Constructor for class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
- Generate - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
Applies a Generator to a stream of input rows, combining the output of each into a new stream of rows.
- Generate(Generator, boolean, boolean, SparkPlan) - Constructor for class org.apache.spark.sql.execution.Generate
-
- generate(Generator, boolean, boolean, Option<String>) - Method in class org.apache.spark.sql.SchemaRDD
-
:: Experimental ::
Applies the given Generator, or table generating function, to this relation.
- GeneratedAggregate - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
Alternate version of aggregation that leverages projection and thus code generation.
- GeneratedAggregate(boolean, Seq<Expression>, Seq<NamedExpression>, SparkPlan) - Constructor for class org.apache.spark.sql.execution.GeneratedAggregate
-
- generatedRDDs() - Method in class org.apache.spark.streaming.dstream.DStream
-
- generateKMeansRDD(SparkContext, int, int, int, double, int) - Static method in class org.apache.spark.mllib.util.KMeansDataGenerator
-
Generate an RDD containing test data for KMeans.
- generateLinearInput(double, double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
- generateLinearInputAsList(double, double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
Return a Java List of synthetic data randomly generated according to a multicollinear model.
- generateLinearRDD(SparkContext, int, int, double, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
Generate an RDD containing sample data for Linear Regression models - including Ridge, Lasso, and unregularized variants.
- generateLogisticRDD(SparkContext, int, int, double, int, double) - Static method in class org.apache.spark.mllib.util.LogisticRegressionDataGenerator
-
Generate an RDD containing test data for LogisticRegression.
- generator() - Method in class org.apache.spark.sql.execution.Generate
-
- get() - Method in interface org.apache.spark.FutureAction
-
Blocks and returns the result of this job.
- get(String) - Method in class org.apache.spark.SparkConf
-
Get a parameter; throws a NoSuchElementException if it's not set
- get(String, String) - Method in class org.apache.spark.SparkConf
-
Get a parameter, falling back to a default if not set
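A short sketch of the SparkConf getters above (the configuration keys shown are standard Spark properties; the default values are illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf().setAppName("ConfExample")

    val master = conf.get("spark.master", "local[*]")   // falls back to the given default
    val memory =
      if (conf.contains("spark.executor.memory"))
        conf.get("spark.executor.memory")                // would throw NoSuchElementException if unset
      else
        "1g"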
- get() - Static method in class org.apache.spark.SparkEnv
-
Returns the ThreadLocal SparkEnv, if non-null.
- get(String) - Static method in class org.apache.spark.SparkFiles
-
Get the absolute path of a file added through SparkContext.addFile().
- get(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column `i`.
- getAkkaConf() - Method in class org.apache.spark.SparkConf
-
Get all akka conf variables set on this SparkConf
- getAll() - Method in class org.apache.spark.SparkConf
-
Get all parameters as a list of pairs
- getAllPools() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return pools for fair scheduler
- getBlock(BlockId) - Method in class org.apache.spark.storage.StorageStatus
-
Return the given block stored in this block manager in O(1) time.
- getBoolean(String, boolean) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a boolean, falling back to a default if not set
- getBoolean(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a boolean.
- getByte(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a byte.
- getCachedBlockManagerId(BlockManagerId) - Static method in class org.apache.spark.storage.BlockManagerId
-
- getCachedMetadata(String) - Static method in class org.apache.spark.rdd.HadoopRDD
-
The three methods below are helpers for accessing the local map, a property of the SparkEnv of
the local process.
- getCheckpointDir() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- getCheckpointDir() - Method in class org.apache.spark.SparkContext
-
- getCheckpointFile() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Gets the name of the file to which this RDD was checkpointed
- getCheckpointFile() - Method in class org.apache.spark.rdd.RDD
-
Gets the name of the file to which this RDD was checkpointed
- getConf() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Return a copy of this JavaSparkContext's configuration.
- getConf() - Method in class org.apache.spark.rdd.HadoopRDD
-
- getConf() - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getConf() - Method in class org.apache.spark.SparkContext
-
Return a copy of this SparkContext's configuration.
- getDataType() - Method in class org.apache.spark.sql.api.java.StructField
-
- getDependencies() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- getDependencies() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- getDependencies() - Method in class org.apache.spark.rdd.UnionRDD
-
- getDouble(String, double) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a double, falling back to a default if not set
- getDouble(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a double.
- getElementType() - Method in class org.apache.spark.sql.api.java.ArrayType
-
- getExecutorEnv() - Method in class org.apache.spark.SparkConf
-
Get all executor environment variables set on this SparkConf
- getExecutorMemoryStatus() - Method in class org.apache.spark.SparkContext
-
Return a map from the slave to the max memory available for caching and the remaining
memory available for caching.
- getExecutorStorageStatus() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return information about blocks stored in all of the slaves
- getFields() - Method in class org.apache.spark.sql.api.java.StructType
-
- getFinalValue() - Method in class org.apache.spark.partial.PartialResult
-
Blocking method to wait for and return the final value.
- getFloat(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a float.
- getHiveFile(String) - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
- getInt(String, int) - Method in class org.apache.spark.SparkConf
-
Get a parameter as an integer, falling back to a default if not set
- getInt(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as an int.
- getKeyType() - Method in class org.apache.spark.sql.api.java.MapType
-
- getLocalProperty(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get a local property set in this thread, or null if it is missing.
- getLocalProperty(String) - Method in class org.apache.spark.SparkContext
-
Get a local property set in this thread, or null if it is missing.
- getLong(String, long) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a long, falling back to a default if not set
- getLong(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a long.
- getName() - Method in class org.apache.spark.sql.api.java.StructField
-
- getObjectInspector() - Method in class org.apache.spark.sql.hive.parquet.FakeParquetSerDe
-
- getOption(String) - Method in class org.apache.spark.SparkConf
-
Get a parameter as an Option
- getOrCreate(String, JavaStreamingContextFactory) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Configuration, JavaStreamingContextFactory) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Configuration, JavaStreamingContextFactory, boolean) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Function0<StreamingContext>, Configuration, boolean) - Static method in class org.apache.spark.streaming.StreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
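A hedged sketch of checkpoint recovery with getOrCreate (checkpointDir is a placeholder path; the creating function must set up all DStreams before returning):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///tmp/streaming-checkpoint"  // placeholder path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("GetOrCreateExample")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)
      // define input DStreams and output operations here
      ssc
    }

    // Recreates the context from checkpoint data if present, otherwise calls createContext()
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()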
- getParents(int) - Method in class org.apache.spark.NarrowDependency
-
Get the parent partitions for a child partition.
- getParents(int) - Method in class org.apache.spark.OneToOneDependency
-
- getParents(int) - Method in class org.apache.spark.RangeDependency
-
- getPartition(Object) - Method in class org.apache.spark.HashPartitioner
-
- getPartition(Object) - Method in class org.apache.spark.Partitioner
-
- getPartition(Object) - Method in class org.apache.spark.RangePartitioner
-
- getPartitions() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.HadoopRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.JdbcRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.UnionRDD
-
- getPartitions() - Method in class org.apache.spark.sql.SchemaRDD
-
- getPersistentRDDs() - Method in class org.apache.spark.SparkContext
-
Returns an immutable map of RDDs that have marked themselves as persistent via a cache() call.
- getPoolForName(String) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return the pool associated with the given name, if one exists
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.HadoopRDD
-
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.UnionRDD
-
- getRDDStorageInfo() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return information about what RDDs are cached, if they are in mem or on disk, how much space
they take, etc.
- getReceiver() - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
Gets the receiver object that will be sent to the worker nodes
to receive data.
- getRootDirectory() - Static method in class org.apache.spark.SparkFiles
-
Get the root directory that contains files added through SparkContext.addFile().
- getSchedulingMode() - Method in class org.apache.spark.SparkContext
-
Return current scheduling mode
- getSerDeStats() - Method in class org.apache.spark.sql.hive.parquet.FakeParquetSerDe
-
- getSerializedClass() - Method in class org.apache.spark.sql.hive.parquet.FakeParquetSerDe
-
- getSerializer(Serializer) - Static method in class org.apache.spark.serializer.Serializer
-
- getSerializer(Option<Serializer>) - Static method in class org.apache.spark.serializer.Serializer
-
- getShort(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a short.
- getSparkHome() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get Spark's home location from either a value set through the constructor,
or the spark.home Java property, or the SPARK_HOME environment variable
(in that order of preference).
- getStorageLevel() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
- getStorageLevel() - Method in class org.apache.spark.rdd.RDD
-
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
- getString(int) - Method in class org.apache.spark.sql.api.java.Row
-
Returns the value of column i as a String.
- getThreadLocal() - Static method in class org.apache.spark.SparkEnv
-
Returns the ThreadLocal SparkEnv.
- gettingResult() - Method in class org.apache.spark.scheduler.TaskInfo
-
- gettingResultTime() - Method in class org.apache.spark.scheduler.TaskInfo
-
The time when the task started remotely getting the result.
- getValueType() - Method in class org.apache.spark.sql.api.java.MapType
-
- Gini - Class in org.apache.spark.mllib.tree.impurity
-
:: Experimental ::
Class for calculating the Gini impurity during binary classification.
- Gini() - Constructor for class org.apache.spark.mllib.tree.impurity.Gini
-
- global() - Method in class org.apache.spark.sql.execution.Sort
-
- glom() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by coalescing all elements within each partition into an array.
- glom() - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by coalescing all elements within each partition into an array.
- glom() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying glom() to each RDD of
this DStream.
- glom() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying glom() to each RDD of
this DStream.
- Gradient - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Class used to compute the gradient for a loss function, given a single data point.
- Gradient() - Constructor for class org.apache.spark.mllib.optimization.Gradient
-
- GradientDescent - Class in org.apache.spark.mllib.optimization
-
Class used to solve an optimization problem using Gradient Descent.
- graph() - Method in class org.apache.spark.streaming.dstream.DStream
-
- graph() - Method in class org.apache.spark.streaming.StreamingContext
-
- groupBy(Function<T, K>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD of grouped elements.
- groupBy(Function<T, K>, int) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD of grouped elements.
- groupBy(Function1<T, K>, ClassTag<K>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD of grouped items.
- groupBy(Function1<T, K>, int, ClassTag<K>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD of grouped elements.
- groupBy(Function1<T, K>, Partitioner, ClassTag<K>, Ordering<K>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD of grouped items.
- groupBy(Seq<Expression>, Seq<Expression>) - Method in class org.apache.spark.sql.SchemaRDD
-
Performs a grouping followed by an aggregation.
- groupByKey(Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Group the values for each key in the RDD into a single sequence.
- groupByKey(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Group the values for each key in the RDD into a single sequence.
- groupByKey() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Group the values for each key in the RDD into a single sequence.
- groupByKey(Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Group the values for each key in the RDD into a single sequence.
- groupByKey(int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Group the values for each key in the RDD into a single sequence.
- groupByKey() - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Group the values for each key in the RDD into a single sequence.
- groupByKey() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey to each RDD.
- groupByKey(int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey to each RDD.
- groupByKey(Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey on each RDD of this DStream.
- groupByKey() - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey to each RDD.
- groupByKey(int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey to each RDD.
- groupByKey(Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey on each RDD.
- groupByKeyAndWindow(Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey over a sliding window.
- groupByKeyAndWindow(Duration, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey over a sliding window.
- groupByKeyAndWindow(Duration, Duration, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey over a sliding window on this DStream.
- groupByKeyAndWindow(Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying groupByKey over a sliding window on this DStream.
- groupByKeyAndWindow(Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey over a sliding window.
- groupByKeyAndWindow(Duration, Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey over a sliding window.
- groupByKeyAndWindow(Duration, Duration, int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying groupByKey over a sliding window on this DStream.
- groupByKeyAndWindow(Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Create a new DStream by applying groupByKey over a sliding window on this DStream.
- groupingExpressions() - Method in class org.apache.spark.sql.execution.Aggregate
-
- groupingExpressions() - Method in class org.apache.spark.sql.execution.GeneratedAggregate
-
- groupWith(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Alias for cogroup.
- groupWith(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Alias for cogroup.
- groupWith(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Alias for cogroup.
- groupWith(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Alias for cogroup.
- groupWith(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Alias for cogroup.
- groupWith(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Alias for cogroup.
- L1Updater - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Updater for L1 regularized problems.
- L1Updater() - Constructor for class org.apache.spark.mllib.optimization.L1Updater
-
- label() - Method in class org.apache.spark.mllib.regression.LabeledPoint
-
- LabeledPoint - Class in org.apache.spark.mllib.regression
-
Class that represents the features and labels of a data point.
- LabeledPoint(double, Vector) - Constructor for class org.apache.spark.mllib.regression.LabeledPoint
-
- labels() - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- labels() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns the sequence of labels in ascending order
- LassoModel - Class in org.apache.spark.mllib.regression
-
Regression model trained using Lasso.
- LassoModel(Vector, double) - Constructor for class org.apache.spark.mllib.regression.LassoModel
-
- LassoWithSGD - Class in org.apache.spark.mllib.regression
-
Train a regression model with L1-regularization using Stochastic Gradient Descent.
- LassoWithSGD() - Constructor for class org.apache.spark.mllib.regression.LassoWithSGD
-
Construct a Lasso object with default parameters: {stepSize: 1.0, numIterations: 100,
regParam: 1.0, miniBatchFraction: 1.0}.
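A small Scala sketch of training a Lasso model with the SGD-based trainer; the two-point training set and the existing SparkContext sc are assumptions for illustration.
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.{LabeledPoint, LassoWithSGD}

    // assumes `sc` is an existing SparkContext; real data would come from e.g. MLUtils.loadLibSVMFile
    val training = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
      LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0))))

    // numIterations = 100, stepSize = 1.0, regParam = 0.01
    val model = LassoWithSGD.train(training, 100, 1.0, 0.01)
    val prediction = model.predict(Vectors.dense(1.0, 0.5, 0.3))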
- lastError() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- lastErrorMessage() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- lastValidTime() - Method in class org.apache.spark.streaming.dstream.InputDStream
-
- latestModel() - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
Return the latest model.
- launchTime() - Method in class org.apache.spark.scheduler.TaskInfo
-
- LBFGS - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Class used to solve an optimization problem using Limited-memory BFGS.
- LBFGS(Gradient, Updater) - Constructor for class org.apache.spark.mllib.optimization.LBFGS
-
- LeastSquaresGradient - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Compute gradient and loss for a Least-squared loss function, as used in linear regression.
- LeastSquaresGradient() - Constructor for class org.apache.spark.mllib.optimization.LeastSquaresGradient
-
- left() - Method in class org.apache.spark.sql.execution.BroadcastHashJoin
-
- left() - Method in class org.apache.spark.sql.execution.BroadcastNestedLoopJoin
-
- left() - Method in class org.apache.spark.sql.execution.CartesianProduct
-
- left() - Method in class org.apache.spark.sql.execution.Except
-
- left() - Method in interface org.apache.spark.sql.execution.HashJoin
-
- left() - Method in class org.apache.spark.sql.execution.HashOuterJoin
-
- left() - Method in class org.apache.spark.sql.execution.Intersect
-
- left() - Method in class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
The Streamed Relation
- left() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- left() - Method in class org.apache.spark.sql.execution.ShuffledHashJoin
-
- leftImpurity() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- leftKeys() - Method in class org.apache.spark.sql.execution.BroadcastHashJoin
-
- leftKeys() - Method in interface org.apache.spark.sql.execution.HashJoin
-
- leftKeys() - Method in class org.apache.spark.sql.execution.HashOuterJoin
-
- leftKeys() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- leftKeys() - Method in class org.apache.spark.sql.execution.ShuffledHashJoin
-
- leftNode() - Method in class org.apache.spark.mllib.tree.model.Node
-
- leftOuterJoin(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a left outer join of this and other.
- leftOuterJoin(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a left outer join of this and other.
- leftOuterJoin(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a left outer join of this and other.
- leftOuterJoin(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a left outer join of this and other.
- leftOuterJoin(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a left outer join of this and other.
- leftOuterJoin(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a left outer join of this and other.
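A brief sketch of leftOuterJoin on pair RDDs; the sample data and the existing SparkContext sc are assumed.
    import org.apache.spark.SparkContext._   // implicit conversion to PairRDDFunctions (Spark 1.x)

    val left  = sc.parallelize(Seq(("a", 1), ("b", 2)))
    val right = sc.parallelize(Seq(("a", "x")))
    // keeps every key from `left`; keys missing on the right show up as None
    val joined = left.leftOuterJoin(right)   // RDD[(String, (Int, Option[String]))]
    joined.collect().foreach(println)        // (a,(1,Some(x))), (b,(2,None))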
- leftOuterJoin(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
- leftOuterJoin(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
- leftOuterJoin(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
- leftOuterJoin(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
- leftOuterJoin(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
- leftOuterJoin(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
- LeftSemiJoinBNL - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
Using BroadcastNestedLoopJoin to calculate left semi join result when there's no join keys
for hash join.
- LeftSemiJoinBNL(SparkPlan, SparkPlan, Option<Expression>) - Constructor for class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
- LeftSemiJoinHash - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
Build the right table's join keys into a HashSet, and iteratively go through the left
table to find if the join keys are in the HashSet.
- LeftSemiJoinHash(Seq<Expression>, Seq<Expression>, SparkPlan, SparkPlan) - Constructor for class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- length() - Method in class org.apache.spark.scheduler.SplitInfo
-
- length() - Method in class org.apache.spark.sql.api.java.Row
-
Returns the number of columns present in this Row.
- length() - Method in class org.apache.spark.util.Vector
-
- Limit - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
Take the first limit elements.
- Limit(int, SparkPlan) - Constructor for class org.apache.spark.sql.execution.Limit
-
- limit() - Method in class org.apache.spark.sql.execution.Limit
-
- limit() - Method in class org.apache.spark.sql.execution.TakeOrdered
-
- limit(Expression) - Method in class org.apache.spark.sql.SchemaRDD
-
- limit(int) - Method in class org.apache.spark.sql.SchemaRDD
-
Limits the results by the given integer.
- LinearDataGenerator - Class in org.apache.spark.mllib.util
-
:: DeveloperApi ::
Generate sample data used for linear regression.
- LinearDataGenerator() - Constructor for class org.apache.spark.mllib.util.LinearDataGenerator
-
- LinearRegressionModel - Class in org.apache.spark.mllib.regression
-
Regression model trained using LinearRegression.
- LinearRegressionModel(Vector, double) - Constructor for class org.apache.spark.mllib.regression.LinearRegressionModel
-
- LinearRegressionWithSGD - Class in org.apache.spark.mllib.regression
-
Train a linear regression model with no regularization using Stochastic Gradient Descent.
- LinearRegressionWithSGD() - Constructor for class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
Construct a LinearRegression object with default parameters: {stepSize: 1.0,
numIterations: 100, miniBatchFraction: 1.0}.
- listenerBus() - Method in class org.apache.spark.SparkContext
-
- loadLabeledData(SparkContext, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
- loadLabeledPoints(SparkContext, String, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads labeled points saved using RDD[LabeledPoint].saveAsTextFile.
- loadLabeledPoints(SparkContext, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads labeled points saved using RDD[LabeledPoint].saveAsTextFile with the default number of partitions.
- loadLibSVMFile(SparkContext, String, int, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads labeled data in the LIBSVM format into an RDD[LabeledPoint].
- loadLibSVMFile(SparkContext, String, boolean, int, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
- loadLibSVMFile(SparkContext, String, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads labeled data in the LIBSVM format into an RDD[LabeledPoint], with the default number of
partitions.
- loadLibSVMFile(SparkContext, String, boolean, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
- loadLibSVMFile(SparkContext, String, boolean) - Static method in class org.apache.spark.mllib.util.MLUtils
-
- loadLibSVMFile(SparkContext, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads binary labeled data in the LIBSVM format into an RDD[LabeledPoint], with number of
features determined automatically and the default number of partitions.
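A short sketch of loading LIBSVM-formatted data; the path is a placeholder and sc is an assumed SparkContext.
    import org.apache.spark.mllib.util.MLUtils

    // each line of the file is "<label> <index1>:<value1> <index2>:<value2> ..."
    val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")
    println(s"loaded ${data.count()} labeled points, ${data.first().features.size} features each")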
- loadTestTable(String) - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
- loadVectors(SparkContext, String, int) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads vectors saved using RDD[Vector].saveAsTextFile.
- loadVectors(SparkContext, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Loads vectors saved using RDD[Vector].saveAsTextFile with the default number of partitions.
- LocalHiveContext - Class in org.apache.spark.sql.hive
-
DEPRECATED: Use HiveContext instead.
- LocalHiveContext(SparkContext) - Constructor for class org.apache.spark.sql.hive.LocalHiveContext
-
- localValue() - Method in class org.apache.spark.Accumulable
-
Get the current value of this accumulator from within a task.
- location() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- log() - Method in interface org.apache.spark.Logging
-
- log_() - Method in interface org.apache.spark.Logging
-
- logDebug(Function0<String>) - Method in interface org.apache.spark.Logging
-
- logDebug(Function0<String>, Throwable) - Method in interface org.apache.spark.Logging
-
- logDirName() - Method in class org.apache.spark.scheduler.JobLogger
-
- logError(Function0<String>) - Method in interface org.apache.spark.Logging
-
- logError(Function0<String>, Throwable) - Method in interface org.apache.spark.Logging
-
- Logging - Interface in org.apache.spark
-
:: DeveloperApi ::
Utility trait for classes that want to log data.
- logicalPlan() - Method in class org.apache.spark.sql.execution.ExplainCommand
-
- logicalPlanToSparkQuery(LogicalPlan) - Method in class org.apache.spark.sql.SQLContext
-
:: DeveloperApi ::
Allows catalyst LogicalPlans to be executed as a SchemaRDD.
- logInfo(Function0<String>) - Method in interface org.apache.spark.Logging
-
- logInfo(Function0<String>, Throwable) - Method in interface org.apache.spark.Logging
-
- LogisticGradient - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Compute gradient and loss for a logistic loss function, as used in binary classification.
- LogisticGradient() - Constructor for class org.apache.spark.mllib.optimization.LogisticGradient
-
- LogisticRegressionDataGenerator - Class in org.apache.spark.mllib.util
-
:: DeveloperApi ::
Generate test data for LogisticRegression.
- LogisticRegressionDataGenerator() - Constructor for class org.apache.spark.mllib.util.LogisticRegressionDataGenerator
-
- LogisticRegressionModel - Class in org.apache.spark.mllib.classification
-
Classification model trained using Logistic Regression.
- LogisticRegressionModel(Vector, double) - Constructor for class org.apache.spark.mllib.classification.LogisticRegressionModel
-
- LogisticRegressionWithLBFGS - Class in org.apache.spark.mllib.classification
-
Train a classification model for Logistic Regression using Limited-memory BFGS.
- LogisticRegressionWithLBFGS() - Constructor for class org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
-
- LogisticRegressionWithSGD - Class in org.apache.spark.mllib.classification
-
Train a classification model for Logistic Regression using Stochastic Gradient Descent.
- LogisticRegressionWithSGD() - Constructor for class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
Construct a LogisticRegression object with default parameters
- logName() - Method in interface org.apache.spark.Logging
-
- logTrace(Function0<String>) - Method in interface org.apache.spark.Logging
-
- logTrace(Function0<String>, Throwable) - Method in interface org.apache.spark.Logging
-
- logWarning(Function0<String>) - Method in interface org.apache.spark.Logging
-
- logWarning(Function0<String>, Throwable) - Method in interface org.apache.spark.Logging
-
- longToLongWritable(long) - Static method in class org.apache.spark.SparkContext
-
- LongType - Static variable in class org.apache.spark.sql.api.java.DataType
-
Gets the LongType object.
- LongType - Class in org.apache.spark.sql.api.java
-
The data type representing long and Long values.
- longWritableConverter() - Static method in class org.apache.spark.SparkContext
-
- lookup(K) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return the list of values in the RDD for key key.
- lookup(K) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return the list of values in the RDD for key key.
- low() - Method in class org.apache.spark.partial.BoundedDouble
-
- LZ4CompressionCodec - Class in org.apache.spark.io
-
- LZ4CompressionCodec(SparkConf) - Constructor for class org.apache.spark.io.LZ4CompressionCodec
-
- LZFCompressionCodec - Class in org.apache.spark.io
-
- LZFCompressionCodec(SparkConf) - Constructor for class org.apache.spark.io.LZFCompressionCodec
-
- main(String[]) - Static method in class org.apache.spark.examples.streaming.JavaKinesisWordCountASL
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.KMeansDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.LogisticRegressionDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.MFDataGenerator
-
- main(String[]) - Static method in class org.apache.spark.mllib.util.SVMDataGenerator
-
- makeCopy(Object[]) - Method in class org.apache.spark.sql.execution.SparkPlan
-
Overridden make copy also propagates sqlContext to the copied plan.
- makeRDD(Seq<T>, int, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Distribute a local Scala collection to form an RDD.
- makeRDD(Seq<Tuple2<T, Seq<String>>>, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Distribute a local Scala collection to form an RDD, with one or more
location preferences (hostnames of Spark nodes) for each object.
- map(Function<T, R>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to all elements of this RDD.
- map(Function1<R, T>) - Method in class org.apache.spark.partial.PartialResult
-
Transform this PartialResult into a PartialResult of type T.
- map(Function1<T, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to all elements of this RDD.
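As a quick illustration of map on an RDD; the data and the existing SparkContext sc are assumed.
    val words = sc.parallelize(Seq("spark", "streaming", "sql"))
    val lengths = words.map(_.length)        // RDD[Int]: 5, 9, 3
    lengths.collect().foreach(println)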
- map(Function<T, R>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream.
- map(Function1<T, U>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream by applying a function to all elements of this DStream.
- mapId() - Method in class org.apache.spark.FetchFailed
-
- mapId() - Method in class org.apache.spark.storage.ShuffleBlockId
-
- mapId() - Method in class org.apache.spark.storage.ShuffleIndexBlockId
-
- mapOutputTracker() - Method in class org.apache.spark.SparkEnv
-
- mapPartitions(FlatMapFunction<Iterator<T>, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitions(FlatMapFunction<Iterator<T>, U>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitions(Function1<Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitions(FlatMapFunction<Iterator<T>, U>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDD
of this DStream.
- mapPartitions(Function1<Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDD
of this DStream.
- mapPartitionsToDouble(DoubleFlatMapFunction<Iterator<T>>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToDouble(DoubleFlatMapFunction<Iterator<T>>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToPair(PairFlatMapFunction<Iterator<T>, K2, V2>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToPair(PairFlatMapFunction<Iterator<T>, K2, V2>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsToPair(PairFlatMapFunction<Iterator<T>, K2, V2>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDD
of this DStream.
- mapPartitionsWithContext(Function2<TaskContext, Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
:: DeveloperApi ::
Return a new RDD by applying a function to each partition of this RDD.
- mapPartitionsWithIndex(Function2<Integer, Iterator<T>, Iterator<R>>, boolean) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to each partition of this RDD, while tracking the index
of the original partition.
- mapPartitionsWithIndex(Function2<Object, Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to each partition of this RDD, while tracking the index
of the original partition.
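A minimal sketch of mapPartitionsWithIndex, tagging each element with the index of its partition; sc and the data are assumed.
    val rdd = sc.parallelize(1 to 6, 3)      // 3 partitions
    val tagged = rdd.mapPartitionsWithIndex { (idx, iter) =>
      iter.map(x => (idx, x))                // (partitionIndex, element)
    }
    tagged.collect().foreach(println)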
- mapPartitionsWithInputSplit(Function2<InputSplit, Iterator<Tuple2<K, V>>, Iterator<R>>, boolean) - Method in class org.apache.spark.api.java.JavaHadoopRDD
-
Maps over a partition, providing the InputSplit that was used as the base of the partition.
- mapPartitionsWithInputSplit(Function2<InputSplit, Iterator<Tuple2<K, V>>, Iterator<R>>, boolean) - Method in class org.apache.spark.api.java.JavaNewHadoopRDD
-
Maps over a partition, providing the InputSplit that was used as the base of the partition.
- mapPartitionsWithInputSplit(Function2<InputSplit, Iterator<Tuple2<K, V>>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.HadoopRDD
-
Maps over a partition, providing the InputSplit that was used as the base of the partition.
- mapPartitionsWithInputSplit(Function2<InputSplit, Iterator<Tuple2<K, V>>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.NewHadoopRDD
-
Maps over a partition, providing the InputSplit that was used as the base of the partition.
- mapPartitionsWithSplit(Function2<Object, Iterator<T>, Iterator<U>>, boolean, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by applying a function to each partition of this RDD, while tracking the index
of the original partition.
- mapredInputFormat() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- mapreduceInputFormat() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- mapSideCombine() - Method in class org.apache.spark.ShuffleDependency
-
- mapToDouble(DoubleFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to all elements of this RDD.
- mapToPair(PairFunction<T, K2, V2>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by applying a function to all elements of this RDD.
- mapToPair(PairFunction<T, K2, V2>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream.
- MapType - Class in org.apache.spark.sql.api.java
-
The data type representing Maps.
- mapValues(Function<V, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Pass each value in the key-value pair RDD through a map function without changing the keys;
this also retains the original RDD's partitioning.
- mapValues(Function1<V, U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Pass each value in the key-value pair RDD through a map function without changing the keys;
this also retains the original RDD's partitioning.
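A one-line sketch of mapValues on a pair RDD (sc and data assumed): it transforms only the values and keeps the existing partitioning.
    import org.apache.spark.SparkContext._   // implicit conversion to PairRDDFunctions (Spark 1.x)

    val counts  = sc.parallelize(Seq(("a", 1), ("b", 2)))
    val doubled = counts.mapValues(_ * 2)    // (a,2), (b,4); partitioning is preserved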
- mapValues(Function<V, U>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying a map function to the value of each key-value pairs in
'this' DStream without changing the key.
- mapValues(Function1<V, U>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying a map function to the value of each key-value pairs in
'this' DStream without changing the key.
- mapWith(Function1<Object, A>, boolean, Function2<T, A, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Maps f over this RDD, where f takes an additional parameter of type A.
- master() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- master() - Method in class org.apache.spark.SparkContext
-
- Matrices - Class in org.apache.spark.mllib.linalg
-
- Matrices() - Constructor for class org.apache.spark.mllib.linalg.Matrices
-
- Matrix - Interface in org.apache.spark.mllib.linalg
-
Trait for a local matrix.
- MatrixEntry - Class in org.apache.spark.mllib.linalg.distributed
-
:: Experimental ::
Represents an entry in a distributed matrix.
- MatrixEntry(long, long, double) - Constructor for class org.apache.spark.mllib.linalg.distributed.MatrixEntry
-
- MatrixFactorizationModel - Class in org.apache.spark.mllib.recommendation
-
Model representing the result of matrix factorization.
- max(Comparator<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the maximum element from this RDD as defined by the specified
Comparator[T].
- max() - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
- max() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Maximum value of each column.
- max(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Returns the max of this RDD as defined by the implicit Ordering[T].
- max(Duration) - Method in class org.apache.spark.streaming.Duration
-
- max(Time) - Method in class org.apache.spark.streaming.Time
-
- max() - Method in class org.apache.spark.util.StatCounter
-
- maxBins() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- maxDepth() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- maxMem() - Method in class org.apache.spark.scheduler.SparkListenerBlockManagerAdded
-
- maxMem() - Method in class org.apache.spark.storage.StorageStatus
-
- maxMemoryInMB() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- mean() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Compute the mean of this RDD's elements.
- mean() - Method in class org.apache.spark.mllib.feature.StandardScalerModel
-
- mean() - Method in class org.apache.spark.mllib.random.PoissonGenerator
-
- mean() - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
- mean() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Sample mean vector.
- mean() - Method in class org.apache.spark.partial.BoundedDouble
-
- mean() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Compute the mean of this RDD's elements.
- mean() - Method in class org.apache.spark.util.StatCounter
-
- meanApprox(long, Double) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return the approximate mean of the elements in this RDD.
- meanApprox(long) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
:: Experimental ::
Approximate operation to return the mean within a timeout.
- meanApprox(long, double) - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
:: Experimental ::
Approximate operation to return the mean within a timeout.
- MEMORY_AND_DISK - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_AND_DISK_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_AND_DISK_SER - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK_SER() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_AND_DISK_SER_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_AND_DISK_SER_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY_SER - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY_SER() - Static method in class org.apache.spark.storage.StorageLevel
-
- MEMORY_ONLY_SER_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- MEMORY_ONLY_SER_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- memRemaining() - Method in class org.apache.spark.storage.StorageStatus
-
Return the memory remaining in this block manager.
- memSize() - Method in class org.apache.spark.storage.BlockStatus
-
- memSize() - Method in class org.apache.spark.storage.RDDInfo
-
- memUsed() - Method in class org.apache.spark.storage.StorageStatus
-
Return the memory used by this block manager.
- memUsedByRdd(int) - Method in class org.apache.spark.storage.StorageStatus
-
Return the memory used by the given RDD in this block manager in O(1) time.
- merge(R) - Method in class org.apache.spark.Accumulable
-
Merge two accumulable objects together
- merge(IDF.DocumentFrequencyAggregator) - Method in class org.apache.spark.mllib.feature.IDF.DocumentFrequencyAggregator
-
Merges another DocumentFrequencyAggregator into this one.
- merge(MultivariateOnlineSummarizer) - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
Merge another MultivariateOnlineSummarizer, and update the statistical summary.
- merge(double) - Method in class org.apache.spark.util.StatCounter
-
Add a value into this StatCounter, updating the internal statistics.
- merge(TraversableOnce<Object>) - Method in class org.apache.spark.util.StatCounter
-
Add multiple values into this StatCounter, updating the internal statistics.
- merge(StatCounter) - Method in class org.apache.spark.util.StatCounter
-
Merge another StatCounter into this one, adding up the internal statistics.
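A small sketch of merging StatCounter instances; the values are made up for illustration.
    import org.apache.spark.util.StatCounter

    val s1 = new StatCounter(Seq(1.0, 2.0, 3.0))
    val s2 = new StatCounter(Seq(10.0, 20.0))
    val merged = s1.merge(s2)                // mutates s1 and returns it
    println(s"count=${merged.count} mean=${merged.mean} stdev=${merged.stdev}")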
- mergeCombiners() - Method in class org.apache.spark.Aggregator
-
- mergeValue() - Method in class org.apache.spark.Aggregator
-
- metadataCleaner() - Method in class org.apache.spark.SparkContext
-
- metastorePath() - Method in class org.apache.spark.sql.hive.LocalHiveContext
-
- metastorePath() - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
- method() - Method in class org.apache.spark.mllib.stat.test.ChiSqTestResult
-
- metrics() - Method in class org.apache.spark.ExceptionFailure
-
- metricsSystem() - Method in class org.apache.spark.SparkEnv
-
- MFDataGenerator - Class in org.apache.spark.mllib.util
-
:: DeveloperApi ::
Generate RDD(s) containing data for Matrix Factorization.
- MFDataGenerator() - Constructor for class org.apache.spark.mllib.util.MFDataGenerator
-
- milliseconds() - Method in class org.apache.spark.streaming.Duration
-
- Milliseconds - Class in org.apache.spark.streaming
-
Helper object that creates an instance of Duration representing a given number of milliseconds.
- Milliseconds() - Constructor for class org.apache.spark.streaming.Milliseconds
-
- milliseconds() - Method in class org.apache.spark.streaming.Time
-
- millisToString(long) - Static method in class org.apache.spark.scheduler.StatsReportListener
-
Reformat a time interval in milliseconds to a prettier format for output
- min(Comparator<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Returns the minimum element from this RDD as defined by the specified
Comparator[T].
- min() - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
- min() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Minimum value of each column.
- min(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Returns the min of this RDD as defined by the implicit Ordering[T].
- min(Duration) - Method in class org.apache.spark.streaming.Duration
-
- min(Time) - Method in class org.apache.spark.streaming.Time
-
- min() - Method in class org.apache.spark.util.StatCounter
-
- MinMax() - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
-
- minutes() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- Minutes - Class in org.apache.spark.streaming
-
Helper object that creates an instance of Duration representing a given number of minutes.
- Minutes() - Constructor for class org.apache.spark.streaming.Minutes
-
- MLUtils - Class in org.apache.spark.mllib.util
-
Helper methods to load, save and pre-process data used in MLlib.
- MLUtils() - Constructor for class org.apache.spark.mllib.util.MLUtils
-
- model() - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
- MQTTUtils - Class in org.apache.spark.streaming.mqtt
-
- MQTTUtils() - Constructor for class org.apache.spark.streaming.mqtt.MQTTUtils
-
- MulticlassMetrics - Class in org.apache.spark.mllib.evaluation
-
::Experimental::
Evaluator for multiclass classification.
- MulticlassMetrics(RDD<Tuple2<Object, Object>>) - Constructor for class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
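A short sketch of MulticlassMetrics over an RDD of (prediction, label) pairs; the pairs and the existing SparkContext sc are illustrative assumptions.
    import org.apache.spark.mllib.evaluation.MulticlassMetrics

    val predictionAndLabels = sc.parallelize(Seq((0.0, 0.0), (1.0, 1.0), (1.0, 0.0)))
    val metrics = new MulticlassMetrics(predictionAndLabels)
    println(metrics.confusionMatrix)
    println(s"precision = ${metrics.precision}")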
- multiply(Matrix) - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
Multiply this matrix by a local matrix on the right.
- multiply(Matrix) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Multiply this matrix by a local matrix on the right.
- multiply(double) - Method in class org.apache.spark.util.Vector
-
- MultivariateOnlineSummarizer - Class in org.apache.spark.mllib.stat
-
:: DeveloperApi ::
MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean,
variance, minimum, maximum, counts, and nonzero counts for samples in sparse or dense vector
format in an online fashion.
- MultivariateOnlineSummarizer() - Constructor for class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
- MultivariateStatisticalSummary - Interface in org.apache.spark.mllib.stat
-
Trait for multivariate statistical summary of a data matrix.
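A brief sketch of computing column statistics with MultivariateOnlineSummarizer via aggregate; the vectors and the existing SparkContext sc are assumed.
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer

    val vectors = sc.parallelize(Seq(Vectors.dense(1.0, 2.0), Vectors.dense(3.0, 4.0)))
    // add() and merge() update the summarizer in place and return it
    val summary = vectors.aggregate(new MultivariateOnlineSummarizer)(
      (s, v) => s.add(v),
      (a, b) => a.merge(b))
    println(summary.mean)                    // per-column mean; variance, min, max, count also available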
- mustCheckpoint() - Method in class org.apache.spark.streaming.dstream.DStream
-
- MutablePair<T1,T2> - Class in org.apache.spark.util
-
:: DeveloperApi ::
A tuple of 2 elements.
- MutablePair(T1, T2) - Constructor for class org.apache.spark.util.MutablePair
-
- MutablePair() - Constructor for class org.apache.spark.util.MutablePair
-
No-arg constructor for serialization
- objectFile(String, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Load an RDD saved as a SequenceFile containing serialized objects, with NullWritable keys and
BytesWritable values that contain a serialized partition.
- objectFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Load an RDD saved as a SequenceFile containing serialized objects, with NullWritable keys and
BytesWritable values that contain a serialized partition.
- objectFile(String, int, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Load an RDD saved as a SequenceFile containing serialized objects, with NullWritable keys and
BytesWritable values that contain a serialized partition.
- OFF_HEAP - Static variable in class org.apache.spark.api.java.StorageLevels
-
- OFF_HEAP() - Static method in class org.apache.spark.storage.StorageLevel
-
- offHeapUsed() - Method in class org.apache.spark.storage.StorageStatus
-
Return the off-heap space used by this block manager.
- offHeapUsedByRdd(int) - Method in class org.apache.spark.storage.StorageStatus
-
Return the off-heap space used by the given RDD in this block manager in O(1) time.
- onApplicationEnd(SparkListenerApplicationEnd) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when the application ends
- onApplicationStart(SparkListenerApplicationStart) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when the application starts
- onBatchCompleted(StreamingListenerBatchCompleted) - Method in class org.apache.spark.streaming.scheduler.StatsReportListener
-
- onBatchCompleted(StreamingListenerBatchCompleted) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when processing of a batch of jobs has completed.
- onBatchStarted(StreamingListenerBatchStarted) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when processing of a batch of jobs has started.
- onBatchSubmitted(StreamingListenerBatchSubmitted) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when a batch of jobs has been submitted for processing.
- onBlockManagerAdded(SparkListenerBlockManagerAdded) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a new block manager has joined
- onBlockManagerAdded(SparkListenerBlockManagerAdded) - Method in class org.apache.spark.storage.StorageStatusListener
-
- onBlockManagerAdded(SparkListenerBlockManagerAdded) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onBlockManagerRemoved(SparkListenerBlockManagerRemoved) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when an existing block manager has been removed
- onBlockManagerRemoved(SparkListenerBlockManagerRemoved) - Method in class org.apache.spark.storage.StorageStatusListener
-
- onBlockManagerRemoved(SparkListenerBlockManagerRemoved) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onComplete(Function1<Try<T>, U>, ExecutionContext) - Method in class org.apache.spark.ComplexFutureAction
-
- onComplete(Function1<Try<T>, U>, ExecutionContext) - Method in interface org.apache.spark.FutureAction
-
When this action is completed, either through an exception, or a value, applies the provided
function.
- onComplete(Function1<R, BoxedUnit>) - Method in class org.apache.spark.partial.PartialResult
-
Set a handler to be called when this PartialResult completes.
- onComplete(Function1<Try<T>, U>, ExecutionContext) - Method in class org.apache.spark.SimpleFutureAction
-
- onEnvironmentUpdate(SparkListenerEnvironmentUpdate) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when environment properties have been updated
- onEnvironmentUpdate(SparkListenerEnvironmentUpdate) - Method in class org.apache.spark.ui.env.EnvironmentListener
-
- onEnvironmentUpdate(SparkListenerEnvironmentUpdate) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- ones(int) - Static method in class org.apache.spark.util.Vector
-
- OneToOneDependency<T> - Class in org.apache.spark
-
:: DeveloperApi ::
Represents a one-to-one dependency between partitions of the parent and child RDDs.
- OneToOneDependency(RDD<T>) - Constructor for class org.apache.spark.OneToOneDependency
-
- onExecutorMetricsUpdate(SparkListenerExecutorMetricsUpdate) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when the driver receives task metrics from an executor in a heartbeat.
- onExecutorMetricsUpdate(SparkListenerExecutorMetricsUpdate) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onFail(Function1<Exception, BoxedUnit>) - Method in class org.apache.spark.partial.PartialResult
-
Set a handler to be called if this PartialResult's job fails.
- onJobEnd(SparkListenerJobEnd) - Method in class org.apache.spark.scheduler.JobLogger
-
When a job ends, record the job completion status and close the log file
- onJobEnd(SparkListenerJobEnd) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a job ends
- onJobStart(SparkListenerJobStart) - Method in class org.apache.spark.scheduler.JobLogger
-
When a job starts, record the job properties and stage graph
- onJobStart(SparkListenerJobStart) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a job starts
- onReceiverError(StreamingListenerReceiverError) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when a receiver has reported an error
- onReceiverStarted(StreamingListenerReceiverStarted) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when a receiver has been started
- onReceiverStopped(StreamingListenerReceiverStopped) - Method in interface org.apache.spark.streaming.scheduler.StreamingListener
-
Called when a receiver has been stopped
- onStageCompleted(SparkListenerStageCompleted) - Method in class org.apache.spark.scheduler.JobLogger
-
When stage is completed, record stage completion status
- onStageCompleted(SparkListenerStageCompleted) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a stage completes successfully or fails, with information on the completed stage.
- onStageCompleted(SparkListenerStageCompleted) - Method in class org.apache.spark.scheduler.StatsReportListener
-
- onStageCompleted(SparkListenerStageCompleted) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onStageCompleted(SparkListenerStageCompleted) - Method in class org.apache.spark.ui.storage.StorageListener
-
- onStageSubmitted(SparkListenerStageSubmitted) - Method in class org.apache.spark.scheduler.JobLogger
-
When stage is submitted, record stage submit info
- onStageSubmitted(SparkListenerStageSubmitted) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a stage is submitted
- onStageSubmitted(SparkListenerStageSubmitted) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
For FIFO scheduling, all stages are contained in the "default" pool, but the "default" pool here is meaningless
- onStageSubmitted(SparkListenerStageSubmitted) - Method in class org.apache.spark.ui.storage.StorageListener
-
- onStart() - Method in class org.apache.spark.streaming.receiver.Receiver
-
This method is called by the system when the receiver is started.
- onStop() - Method in class org.apache.spark.streaming.receiver.Receiver
-
This method is called by the system when the receiver is stopped.
- onTaskCompletion(TaskContext) - Method in interface org.apache.spark.util.TaskCompletionListener
-
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.scheduler.JobLogger
-
When task ends, record task completion status and metrics
- onTaskEnd(SparkListenerTaskEnd) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a task ends
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.scheduler.StatsReportListener
-
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.storage.StorageStatusListener
-
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.ui.exec.ExecutorsListener
-
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onTaskEnd(SparkListenerTaskEnd) - Method in class org.apache.spark.ui.storage.StorageListener
-
Assumes the storage status list is fully up-to-date.
- onTaskGettingResult(SparkListenerTaskGettingResult) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a task begins remotely fetching its result (will not be called for tasks that do
not need to fetch the result remotely).
- onTaskGettingResult(SparkListenerTaskGettingResult) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onTaskStart(SparkListenerTaskStart) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when a task starts
- onTaskStart(SparkListenerTaskStart) - Method in class org.apache.spark.ui.exec.ExecutorsListener
-
- onTaskStart(SparkListenerTaskStart) - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- onUnpersistRDD(SparkListenerUnpersistRDD) - Method in interface org.apache.spark.scheduler.SparkListener
-
Called when an RDD is manually unpersisted by the application
- onUnpersistRDD(SparkListenerUnpersistRDD) - Method in class org.apache.spark.storage.StorageStatusListener
-
- onUnpersistRDD(SparkListenerUnpersistRDD) - Method in class org.apache.spark.ui.storage.StorageListener
-
- optimize(RDD<Tuple2<Object, Vector>>, Vector) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
:: DeveloperApi ::
Runs gradient descent on the given training data.
- optimize(RDD<Tuple2<Object, Vector>>, Vector) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
- optimize(RDD<Tuple2<Object, Vector>>, Vector) - Method in interface org.apache.spark.mllib.optimization.Optimizer
-
Solve the provided convex optimization problem.
- optimizer() - Method in class org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
-
- optimizer() - Method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
- optimizer() - Method in class org.apache.spark.mllib.classification.SVMWithSGD
-
- Optimizer - Interface in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Trait for optimization problem solvers.
- optimizer() - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
The optimizer to solve the problem.
- optimizer() - Method in class org.apache.spark.mllib.regression.LassoWithSGD
-
- optimizer() - Method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
- optimizer() - Method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
- orderBy(Seq<SortOrder>) - Method in class org.apache.spark.sql.SchemaRDD
-
Sorts the results by the given expressions.
- OrderedRDDFunctions<K,V,P extends scala.Product2<K,V>> - Class in org.apache.spark.rdd
-
Extra functions available on RDDs of (key, value) pairs where the key is sortable through
an implicit conversion.
- OrderedRDDFunctions(RDD<P>, Ordering<K>, ClassTag<K>, ClassTag<V>, ClassTag<P>) - Constructor for class org.apache.spark.rdd.OrderedRDDFunctions
-
- ordering() - Method in class org.apache.spark.sql.execution.TakeOrdered
-
- ordering() - Static method in class org.apache.spark.streaming.Time
-
- org.apache.spark - package org.apache.spark
-
Core Spark classes in Scala.
- org.apache.spark.annotation - package org.apache.spark.annotation
-
Spark annotations to mark an API experimental or intended only for advanced usages by developers.
- org.apache.spark.api.java - package org.apache.spark.api.java
-
Spark Java programming APIs.
- org.apache.spark.api.java.function - package org.apache.spark.api.java.function
-
Set of interfaces to represent functions in Spark's Java API.
- org.apache.spark.broadcast - package org.apache.spark.broadcast
-
Spark's broadcast variables, used to broadcast immutable datasets to all nodes.
- org.apache.spark.examples.streaming - package org.apache.spark.examples.streaming
-
- org.apache.spark.io - package org.apache.spark.io
-
IO codecs used for compression.
- org.apache.spark.mllib.classification - package org.apache.spark.mllib.classification
-
- org.apache.spark.mllib.clustering - package org.apache.spark.mllib.clustering
-
- org.apache.spark.mllib.evaluation - package org.apache.spark.mllib.evaluation
-
- org.apache.spark.mllib.feature - package org.apache.spark.mllib.feature
-
- org.apache.spark.mllib.linalg - package org.apache.spark.mllib.linalg
-
- org.apache.spark.mllib.linalg.distributed - package org.apache.spark.mllib.linalg.distributed
-
- org.apache.spark.mllib.optimization - package org.apache.spark.mllib.optimization
-
- org.apache.spark.mllib.random - package org.apache.spark.mllib.random
-
- org.apache.spark.mllib.recommendation - package org.apache.spark.mllib.recommendation
-
- org.apache.spark.mllib.regression - package org.apache.spark.mllib.regression
-
- org.apache.spark.mllib.stat - package org.apache.spark.mllib.stat
-
- org.apache.spark.mllib.stat.test - package org.apache.spark.mllib.stat.test
-
- org.apache.spark.mllib.tree - package org.apache.spark.mllib.tree
-
- org.apache.spark.mllib.tree.configuration - package org.apache.spark.mllib.tree.configuration
-
- org.apache.spark.mllib.tree.impurity - package org.apache.spark.mllib.tree.impurity
-
- org.apache.spark.mllib.tree.model - package org.apache.spark.mllib.tree.model
-
- org.apache.spark.mllib.util - package org.apache.spark.mllib.util
-
- org.apache.spark.partial - package org.apache.spark.partial
-
- org.apache.spark.rdd - package org.apache.spark.rdd
-
Provides implementations of various RDDs.
- org.apache.spark.scheduler - package org.apache.spark.scheduler
-
Spark's DAG scheduler.
- org.apache.spark.serializer - package org.apache.spark.serializer
-
Pluggable serializers for RDD and shuffle data.
- org.apache.spark.sql - package org.apache.spark.sql
-
- org.apache.spark.sql.api.java - package org.apache.spark.sql.api.java
-
Allows the execution of relational queries, including those expressed in SQL using Spark.
- org.apache.spark.sql.execution - package org.apache.spark.sql.execution
-
- org.apache.spark.sql.hive - package org.apache.spark.sql.hive
-
- org.apache.spark.sql.hive.api.java - package org.apache.spark.sql.hive.api.java
-
- org.apache.spark.sql.hive.execution - package org.apache.spark.sql.hive.execution
-
- org.apache.spark.sql.hive.parquet - package org.apache.spark.sql.hive.parquet
-
- org.apache.spark.sql.hive.test - package org.apache.spark.sql.hive.test
-
- org.apache.spark.sql.parquet - package org.apache.spark.sql.parquet
-
- org.apache.spark.sql.test - package org.apache.spark.sql.test
-
- org.apache.spark.storage - package org.apache.spark.storage
-
- org.apache.spark.streaming - package org.apache.spark.streaming
-
- org.apache.spark.streaming.api.java - package org.apache.spark.streaming.api.java
-
Java APIs for spark streaming.
- org.apache.spark.streaming.dstream - package org.apache.spark.streaming.dstream
-
Various implementations of DStreams.
- org.apache.spark.streaming.flume - package org.apache.spark.streaming.flume
-
Spark streaming receiver for Flume.
- org.apache.spark.streaming.kafka - package org.apache.spark.streaming.kafka
-
Kafka receiver for spark streaming.
- org.apache.spark.streaming.kinesis - package org.apache.spark.streaming.kinesis
-
- org.apache.spark.streaming.mqtt - package org.apache.spark.streaming.mqtt
-
MQTT receiver for Spark Streaming.
- org.apache.spark.streaming.receiver - package org.apache.spark.streaming.receiver
-
- org.apache.spark.streaming.scheduler - package org.apache.spark.streaming.scheduler
-
- org.apache.spark.streaming.twitter - package org.apache.spark.streaming.twitter
-
Twitter feed receiver for spark streaming.
- org.apache.spark.streaming.zeromq - package org.apache.spark.streaming.zeromq
-
Zeromq receiver for spark streaming.
- org.apache.spark.ui.env - package org.apache.spark.ui.env
-
- org.apache.spark.ui.exec - package org.apache.spark.ui.exec
-
- org.apache.spark.ui.jobs - package org.apache.spark.ui.jobs
-
- org.apache.spark.ui.storage - package org.apache.spark.ui.storage
-
- org.apache.spark.util - package org.apache.spark.util
-
Spark utilities.
- org.apache.spark.util.random - package org.apache.spark.util.random
-
Utilities for random number generation.
- otherCopyArgs() - Method in class org.apache.spark.sql.execution.ExplainCommand
-
- otherCopyArgs() - Method in class org.apache.spark.sql.execution.SetCommand
-
- otherCopyArgs() - Method in class org.apache.spark.sql.hive.execution.DescribeHiveTableCommand
-
- otherCopyArgs() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- otherCopyArgs() - Method in class org.apache.spark.sql.hive.execution.NativeCommand
-
- otherCopyArgs() - Method in class org.apache.spark.sql.hive.execution.ScriptTransformation
-
- otherInfo() - Method in class org.apache.spark.streaming.receiver.Statistics
-
- outer() - Method in class org.apache.spark.sql.execution.Generate
-
- output() - Method in class org.apache.spark.sql.execution.Aggregate
-
- output() - Method in class org.apache.spark.sql.execution.BatchPythonEvaluation
-
- output() - Method in class org.apache.spark.sql.execution.BroadcastNestedLoopJoin
-
- output() - Method in class org.apache.spark.sql.execution.CacheCommand
-
- output() - Method in class org.apache.spark.sql.execution.CartesianProduct
-
- output() - Method in class org.apache.spark.sql.execution.DescribeCommand
-
- output() - Method in class org.apache.spark.sql.execution.Distinct
-
- output() - Method in class org.apache.spark.sql.execution.EvaluatePython
-
- output() - Method in class org.apache.spark.sql.execution.Except
-
- output() - Method in class org.apache.spark.sql.execution.Exchange
-
- output() - Method in class org.apache.spark.sql.execution.ExistingRdd
-
- output() - Method in class org.apache.spark.sql.execution.ExplainCommand
-
- output() - Method in class org.apache.spark.sql.execution.Filter
-
- output() - Method in class org.apache.spark.sql.execution.Generate
-
- output() - Method in class org.apache.spark.sql.execution.GeneratedAggregate
-
- output() - Method in interface org.apache.spark.sql.execution.HashJoin
-
- output() - Method in class org.apache.spark.sql.execution.HashOuterJoin
-
- output() - Method in class org.apache.spark.sql.execution.Intersect
-
- output() - Method in class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
- output() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- output() - Method in class org.apache.spark.sql.execution.Limit
-
- output() - Method in class org.apache.spark.sql.execution.OutputFaker
-
- output() - Method in class org.apache.spark.sql.execution.Project
-
- output() - Method in class org.apache.spark.sql.execution.Sample
-
- output() - Method in class org.apache.spark.sql.execution.SetCommand
-
- output() - Method in class org.apache.spark.sql.execution.Sort
-
- output() - Method in class org.apache.spark.sql.execution.SparkLogicalPlan
-
- output() - Method in class org.apache.spark.sql.execution.TakeOrdered
-
- output() - Method in class org.apache.spark.sql.execution.Union
-
- output() - Method in class org.apache.spark.sql.hive.execution.AnalyzeTable
-
- output() - Method in class org.apache.spark.sql.hive.execution.DescribeHiveTableCommand
-
- output() - Method in class org.apache.spark.sql.hive.execution.DropTable
-
- output() - Method in class org.apache.spark.sql.hive.execution.HiveTableScan
-
- output() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- output() - Method in class org.apache.spark.sql.hive.execution.NativeCommand
-
- output() - Method in class org.apache.spark.sql.hive.execution.ScriptTransformation
-
- output() - Method in class org.apache.spark.sql.parquet.InsertIntoParquetTable
-
- output() - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
- outputClass() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- OutputFaker - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
A plan node that does nothing but lie about the output of its child.
- OutputFaker(Seq<Attribute>, SparkPlan) - Constructor for class org.apache.spark.sql.execution.OutputFaker
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.BroadcastHashJoin
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.BroadcastNestedLoopJoin
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.Exchange
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.HashOuterJoin
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.ShuffledHashJoin
-
- outputPartitioning() - Method in class org.apache.spark.sql.execution.SparkPlan
-
Specifies how data is partitioned across different nodes in the cluster.
- overwrite() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- overwrite() - Method in class org.apache.spark.sql.parquet.InsertIntoParquetTable
-
- PairDStreamFunctions<K,V> - Class in org.apache.spark.streaming.dstream
-
Extra functions available on DStream of (key, value) pairs through an implicit conversion.
- PairDStreamFunctions(DStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Constructor for class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
- PairFlatMapFunction<T,K,V> - Interface in org.apache.spark.api.java.function
-
A function that returns zero or more key-value pair records from each input record.
- PairFunction<T,K,V> - Interface in org.apache.spark.api.java.function
-
A function that returns key-value pairs (Tuple2), and can be used to construct PairRDDs.
- PairRDDFunctions<K,V> - Class in org.apache.spark.rdd
-
Extra functions available on RDDs of (key, value) pairs through an implicit conversion.
- PairRDDFunctions(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Constructor for class org.apache.spark.rdd.PairRDDFunctions
-
- parallelize(List<T>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelize(List<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelize(Seq<T>, int, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelizeDoubles(List<Double>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelizeDoubles(List<Double>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelizePairs(List<Tuple2<K, V>>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
- parallelizePairs(List<Tuple2<K, V>>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Distribute a local Scala collection to form an RDD.
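A compact sketch of creating a SparkContext and parallelizing a local collection; the app name and master URL are illustrative.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("ParallelizeSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 100, 4)    // distribute a local range across 4 partitions
    println(rdd.reduce(_ + _))               // 5050
    sc.stop()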
- parquetFile(String) - Method in class org.apache.spark.sql.api.java.JavaSQLContext
-
- parquetFile(String) - Method in class org.apache.spark.sql.SQLContext
-
Loads a Parquet file, returning the result as a SchemaRDD.
- ParquetTableScan - Class in org.apache.spark.sql.parquet
-
Parquet table scan operator.
- ParquetTableScan(Seq<Attribute>, ParquetRelation, Seq<Expression>) - Constructor for class org.apache.spark.sql.parquet.ParquetTableScan
-
- parse(String) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Parses a string produced by Vector#toString into a Vector.
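A small sketch of the round trip through toString and parse (the exact string form is whatever Vector#toString produces):

    import org.apache.spark.mllib.linalg.Vectors

    val v = Vectors.parse(Vectors.dense(1.0, 2.0, 3.0).toString)   // e.g. parses "[1.0,2.0,3.0]" back into a Vector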
- parse(String) - Static method in class org.apache.spark.mllib.regression.LabeledPoint
-
Parses a string produced by LabeledPoint#toString into a LabeledPoint.
- partial() - Method in class org.apache.spark.sql.execution.Aggregate
-
- partial() - Method in class org.apache.spark.sql.execution.Distinct
-
- partial() - Method in class org.apache.spark.sql.execution.GeneratedAggregate
-
- PartialResult<R> - Class in org.apache.spark.partial
-
- PartialResult(R, boolean) - Constructor for class org.apache.spark.partial.PartialResult
-
- Partition - Interface in org.apache.spark
-
A partition of an RDD.
- partition() - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- partitionBy(Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a copy of the RDD partitioned using the specified partitioner.
- partitionBy(Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return a copy of the RDD partitioned using the specified partitioner.
- Partitioner - Class in org.apache.spark
-
An object that defines how the elements in a key-value pair RDD are partitioned by key.
- Partitioner() - Constructor for class org.apache.spark.Partitioner
-
- partitioner() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- partitioner() - Method in class org.apache.spark.rdd.RDD
-
Optionally overridden by subclasses to specify how they are partitioned.
- partitioner() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- partitioner() - Method in class org.apache.spark.ShuffleDependency
-
- partitionId() - Method in class org.apache.spark.TaskContext
-
- partitionPruningPred() - Method in class org.apache.spark.sql.hive.execution.HiveTableScan
-
- PartitionPruningRDD<T> - Class in org.apache.spark.rdd
-
:: DeveloperApi ::
An RDD used to prune RDD partitions so we can avoid launching tasks on all partitions.
- PartitionPruningRDD(RDD<T>, Function1<Object, Object>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.PartitionPruningRDD
-
- partitions() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Set of partitions in this RDD.
- partitions() - Method in class org.apache.spark.rdd.RDD
-
Get the array of partitions of this RDD, taking into account whether the
RDD is checkpointed or not.
- partOutput() - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
- path() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- path() - Method in class org.apache.spark.scheduler.SplitInfo
-
- percentiles() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- percentilesHeader() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- persist(StorageLevel) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist(StorageLevel) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist(StorageLevel) - Method in class org.apache.spark.api.java.JavaRDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist(StorageLevel) - Method in class org.apache.spark.rdd.RDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist() - Method in class org.apache.spark.rdd.RDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
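For example, a sketch of caching an RDD that is reused by two actions (assumes an existing SparkContext sc; the input path is hypothetical):

    import org.apache.spark.storage.StorageLevel

    val lines = sc.textFile("hdfs:///logs/app.log")
    lines.persist(StorageLevel.MEMORY_AND_DISK)      // keep the data around for both actions below
    val errors   = lines.filter(_.contains("ERROR")).count()
    val warnings = lines.filter(_.contains("WARN")).count()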
- persist() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- persist(StorageLevel) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Set this RDD's storage level to persist its values across operations after the first time
it is computed.
- persist() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- persist(StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Persist the RDDs of this DStream with the given storage level
- persist() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- persist(StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Persist the RDDs of this DStream with the given storage level
- persist(StorageLevel) - Method in class org.apache.spark.streaming.dstream.DStream
-
Persist the RDDs of this DStream with the given storage level
- persist() - Method in class org.apache.spark.streaming.dstream.DStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER)
- persistentRdds() - Method in class org.apache.spark.SparkContext
-
- pi() - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- pipe(String) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by piping elements to a forked external process.
- pipe(List<String>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by piping elements to a forked external process.
- pipe(List<String>, Map<String, String>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by piping elements to a forked external process.
- pipe(String) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by piping elements to a forked external process.
- pipe(String, Map<String, String>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by piping elements to a forked external process.
- pipe(Seq<String>, Map<String, String>, Function1<Function1<String, BoxedUnit>, BoxedUnit>, Function2<T, Function1<String, BoxedUnit>, BoxedUnit>, boolean) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by piping elements to a forked external process.
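A minimal sketch of pipe, assuming an existing SparkContext sc and that the external command (here tr) is available on every worker:

    val upper = sc.parallelize(Seq("spark", "rdd", "pipe"))
      .pipe("tr '[:lower:]' '[:upper:]'")     // each element is written to the process's stdin, one per line
    println(upper.collect().mkString(", "))   // SPARK, RDD, PIPE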
- plusDot(Vector, Vector) - Method in class org.apache.spark.util.Vector
-
Return (this + plus) dot other, without creating any intermediate storage.
- PoissonGenerator - Class in org.apache.spark.mllib.random
-
:: DeveloperApi ::
Generates i.i.d.
- PoissonGenerator(double) - Constructor for class org.apache.spark.mllib.random.PoissonGenerator
-
- poissonJavaRDD(JavaSparkContext, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonJavaRDD(JavaSparkContext, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonJavaRDD(JavaSparkContext, double, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonJavaVectorRDD(JavaSparkContext, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonJavaVectorRDD(JavaSparkContext, double, long, int, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonJavaVectorRDD(JavaSparkContext, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- poissonRDD(SparkContext, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD comprised of i.i.d.
- PoissonSampler<T> - Class in org.apache.spark.util.random
-
:: DeveloperApi ::
A sampler based on values drawn from Poisson distribution.
- PoissonSampler(double) - Constructor for class org.apache.spark.util.random.PoissonSampler
-
- poissonVectorRDD(SparkContext, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD[Vector] with vectors containing i.i.d.
- poolToActiveStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- port() - Method in class org.apache.spark.storage.BlockManagerId
-
- pr() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the precision-recall curve, which is an RDD of (recall, precision),
NOT (precision, recall), with (0.0, 1.0) prepended to it.
- precision(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns precision for a given label (category)
- precision() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns precision
- precisionByThreshold() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, precision) curve.
- predict(RDD<Vector>) - Method in interface org.apache.spark.mllib.classification.ClassificationModel
-
Predict values for the given data set using the model trained.
- predict(Vector) - Method in interface org.apache.spark.mllib.classification.ClassificationModel
-
Predict values for a single data point using the model trained.
- predict(JavaRDD<Vector>) - Method in interface org.apache.spark.mllib.classification.ClassificationModel
-
Predict values for examples stored in a JavaRDD.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- predict(Vector) - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- predict(Vector) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Returns the cluster index that a given point belongs to.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Maps given points to their cluster indices.
- predict(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Maps given points to their cluster indices.
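A hedged sketch of using a trained model for prediction; model (a KMeansModel) and points (an RDD[Vector] with the training dimensionality) are assumed to already exist:

    import org.apache.spark.mllib.linalg.Vectors

    val singleCluster = model.predict(Vectors.dense(0.1, 0.2))   // index of the closest cluster center
    val allClusters   = model.predict(points)                    // RDD of cluster indices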
- predict(int, int) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Predict the rating of one user for one product.
- predict(RDD<Tuple2<Object, Object>>) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Predict the rating of many users for many products.
- predict(JavaRDD<byte[]>) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
:: DeveloperApi ::
Predict the rating of many users for many products.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
Predict values for the given data set using the model trained.
- predict(Vector) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
Predict values for a single data point using the model trained.
- predict(RDD<Vector>) - Method in interface org.apache.spark.mllib.regression.RegressionModel
-
Predict values for the given data set using the model trained.
- predict(Vector) - Method in interface org.apache.spark.mllib.regression.RegressionModel
-
Predict values for a single data point using the model trained.
- predict(JavaRDD<Vector>) - Method in interface org.apache.spark.mllib.regression.RegressionModel
-
Predict values for examples stored in a JavaRDD.
- predict(Vector) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
Predict values for a single data point using the model trained.
- predict(RDD<Vector>) - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
Predict values for the given data set using the model trained.
- predict() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- predict() - Method in class org.apache.spark.mllib.tree.model.Node
-
- predict(Vector) - Method in class org.apache.spark.mllib.tree.model.Node
-
Predict the value if the node is not a leaf.
- predictOn(DStream<Vector>) - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
Use the model to make predictions on batches of data from a DStream
- predictOnValues(DStream<Tuple2<K, Vector>>, ClassTag<K>) - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
Use the model to make predictions on the values of a DStream and carry over its keys.
- preferredLocation() - Method in class org.apache.spark.streaming.receiver.Receiver
-
Override this to specify a preferred location (hostname).
- preferredLocations(Partition) - Method in class org.apache.spark.rdd.RDD
-
Get the preferred locations of a partition (as hostnames), taking into account whether the
RDD is checkpointed.
- preferredNodeLocationData() - Method in class org.apache.spark.SparkContext
-
- prettyPrint() - Method in class org.apache.spark.streaming.Duration
-
- prev() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- print() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Print the first ten elements of each RDD generated in this DStream.
- print() - Method in class org.apache.spark.streaming.dstream.DStream
-
Print the first ten elements of each RDD generated in this DStream.
- printStats() - Method in class org.apache.spark.streaming.scheduler.StatsReportListener
-
- prob() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- probabilities() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- PROCESS_LOCAL() - Static method in class org.apache.spark.scheduler.TaskLocality
-
- processingDelay() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
Time taken for all jobs of this batch to finish processing from the time they started processing.
- processingEndTime() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- processingStartTime() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- product() - Method in class org.apache.spark.mllib.recommendation.Rating
-
- productFeatures() - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
- productToRowRdd(RDD<A>) - Static method in class org.apache.spark.sql.execution.ExistingRdd
-
- Project - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- Project(Seq<NamedExpression>, SparkPlan) - Constructor for class org.apache.spark.sql.execution.Project
-
- projectList() - Method in class org.apache.spark.sql.execution.Project
-
- properties() - Method in class org.apache.spark.scheduler.SparkListenerJobStart
-
- properties() - Method in class org.apache.spark.scheduler.SparkListenerStageSubmitted
-
- pruneColumns(Seq<Attribute>) - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
- Pseudorandom - Interface in org.apache.spark.util.random
-
:: DeveloperApi ::
A class with pseudorandom behavior.
- putCachedMetadata(String, Object) - Static method in class org.apache.spark.rdd.HadoopRDD
-
- pValue() - Method in class org.apache.spark.mllib.stat.test.ChiSqTestResult
-
- pValue() - Method in interface org.apache.spark.mllib.stat.test.TestResult
-
The probability of obtaining a test statistic result at least as extreme as the one that was
actually observed, assuming that the null hypothesis is true.
- RACK_LOCAL() - Static method in class org.apache.spark.scheduler.TaskLocality
-
- RANDOM() - Static method in class org.apache.spark.mllib.clustering.KMeans
-
- random(int, Random) - Static method in class org.apache.spark.util.Vector
-
Creates a Vector of the given length containing random numbers between 0.0 and 1.0.
- RandomDataGenerator<T> - Interface in org.apache.spark.mllib.random
-
:: DeveloperApi ::
Trait for random data generators that generate i.i.d.
- randomRDD(SparkContext, RandomDataGenerator<T>, long, int, long, ClassTag<T>) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
:: DeveloperApi ::
Generates an RDD comprised of i.i.d.
- RandomRDDs - Class in org.apache.spark.mllib.random
-
:: Experimental ::
Generator methods for creating RDDs comprised of i.i.d.
- RandomRDDs() - Constructor for class org.apache.spark.mllib.random.RandomRDDs
-
- RandomSampler<T,U> - Interface in org.apache.spark.util.random
-
:: DeveloperApi ::
A pseudorandom sampler.
- randomSplit(double[]) - Method in class org.apache.spark.api.java.JavaRDD
-
Randomly splits this RDD with the provided weights.
- randomSplit(double[], long) - Method in class org.apache.spark.api.java.JavaRDD
-
Randomly splits this RDD with the provided weights.
- randomSplit(double[], long) - Method in class org.apache.spark.rdd.RDD
-
Randomly splits this RDD with the provided weights.
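For example, a common train/test split; the weights are normalized if they do not sum to 1, and sc is an existing SparkContext:

    val data = sc.parallelize(1 to 1000)
    val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 11L)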
- randomVectorRDD(SparkContext, RandomDataGenerator<Object>, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
:: DeveloperApi ::
Generates an RDD[Vector] with vectors containing i.i.d.
- RangeDependency<T> - Class in org.apache.spark
-
:: DeveloperApi ::
Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.
- RangeDependency(RDD<T>, int, int, int) - Constructor for class org.apache.spark.RangeDependency
-
- RangePartitioner<K,V> - Class in org.apache.spark
-
A Partitioner that partitions sortable records by range into roughly equal ranges.
- RangePartitioner(int, RDD<? extends Product2<K, V>>, boolean, Ordering<K>, ClassTag<K>) - Constructor for class org.apache.spark.RangePartitioner
-
- rank() - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
- Rating - Class in org.apache.spark.mllib.recommendation
-
:: Experimental ::
A more compact class to represent a rating than Tuple3[Int, Int, Double].
- Rating(int, int, double) - Constructor for class org.apache.spark.mllib.recommendation.Rating
-
- rating() - Method in class org.apache.spark.mllib.recommendation.Rating
-
- rawSocketStream(String, int, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them.
- rawSocketStream(String, int) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them.
- rawSocketStream(String, int, StorageLevel, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream from a network source hostname:port, where data is received as serialized blocks (serialized using Spark's serializer) that can be directly pushed into the block manager without deserializing them.
- rdd() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- rdd() - Method in class org.apache.spark.api.java.JavaPairRDD
-
- rdd() - Method in class org.apache.spark.api.java.JavaRDD
-
- rdd() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- rdd() - Method in class org.apache.spark.Dependency
-
- rdd() - Method in class org.apache.spark.NarrowDependency
-
- RDD<T> - Class in org.apache.spark.rdd
-
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
- RDD(SparkContext, Seq<Dependency<?>>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.RDD
-
- RDD(RDD<?>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.RDD
-
Construct an RDD with just a one-to-one dependency on one parent
- rdd() - Method in class org.apache.spark.ShuffleDependency
-
- rdd() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
- rdd() - Method in class org.apache.spark.sql.execution.ExistingRdd
-
- RDD() - Static method in class org.apache.spark.storage.BlockId
-
- RDDBlockId - Class in org.apache.spark.storage
-
- RDDBlockId(int, int) - Constructor for class org.apache.spark.storage.RDDBlockId
-
- rddBlocks() - Method in class org.apache.spark.storage.StorageStatus
-
Return the RDD blocks stored in this block manager.
- rddBlocksById(int) - Method in class org.apache.spark.storage.StorageStatus
-
Return the blocks that belong to the given RDD stored in this block manager.
- rddId() - Method in class org.apache.spark.scheduler.SparkListenerUnpersistRDD
-
- rddId() - Method in class org.apache.spark.storage.RDDBlockId
-
- RDDInfo - Class in org.apache.spark.storage
-
- RDDInfo(int, String, int, StorageLevel) - Constructor for class org.apache.spark.storage.RDDInfo
-
- rddInfoList() - Method in class org.apache.spark.ui.storage.StorageListener
-
Filter RDD info to include only those with cached partitions
- rddInfos() - Method in class org.apache.spark.scheduler.StageInfo
-
- rdds() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- rdds() - Method in class org.apache.spark.rdd.UnionRDD
-
- rddStorageLevel(int) - Method in class org.apache.spark.storage.StorageStatus
-
Return the storage level, if any, used by the given RDD in this block manager.
- rddToAsyncRDDActions(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.SparkContext
-
- rddToOrderedRDDFunctions(RDD<Tuple2<K, V>>, Ordering<K>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.SparkContext
-
- rddToPairRDDFunctions(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>, Ordering<K>) - Static method in class org.apache.spark.SparkContext
-
- rddToSequenceFileRDDFunctions(RDD<Tuple2<K, V>>, Function1<K, Writable>, ClassTag<K>, Function1<V, Writable>, ClassTag<V>) - Static method in class org.apache.spark.SparkContext
-
- readExternal(ObjectInput) - Method in class org.apache.spark.serializer.JavaSerializer
-
- readExternal(ObjectInput) - Method in class org.apache.spark.storage.BlockManagerId
-
- readExternal(ObjectInput) - Method in class org.apache.spark.storage.StorageLevel
-
- readExternal(ObjectInput) - Method in class org.apache.spark.streaming.flume.SparkFlumeEvent
-
- readObject(ClassTag<T>) - Method in class org.apache.spark.serializer.DeserializationStream
-
- ready(Duration, CanAwait) - Method in class org.apache.spark.ComplexFutureAction
-
- ready(Duration, CanAwait) - Method in interface org.apache.spark.FutureAction
-
Blocks until this action completes.
- ready(Duration, CanAwait) - Method in class org.apache.spark.SimpleFutureAction
-
- reason() - Method in class org.apache.spark.scheduler.SparkListenerTaskEnd
-
- recall(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns recall for a given label (category)
- recall() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns recall
(equal to precision for a multiclass classifier, because the sum of all false positives equals the sum of all false negatives)
- recallByThreshold() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, recall) curve.
- receivedBlockInfo() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
- Receiver<T> - Class in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
Abstract class of a receiver that can be run on worker nodes to receive external data.
- Receiver(StorageLevel) - Constructor for class org.apache.spark.streaming.receiver.Receiver
-
- ReceiverInfo - Class in org.apache.spark.streaming.scheduler
-
:: DeveloperApi ::
Class having information about a receiver
- ReceiverInfo(int, String, ActorRef, boolean, String, String, String) - Constructor for class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- receiverInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverError
-
- receiverInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStarted
-
- receiverInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerReceiverStopped
-
- receiverInputDStream() - Method in class org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream
-
- receiverInputDStream() - Method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-
- ReceiverInputDStream<T> - Class in org.apache.spark.streaming.dstream
-
Abstract class for defining any InputDStream that has to start a receiver on worker nodes to receive external data.
- ReceiverInputDStream(StreamingContext, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
- receiverStream(Receiver<T>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented receiver.
- receiverStream(Receiver<T>, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream with any arbitrary user implemented receiver.
- recommendProducts(int, int) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Recommends products to a user.
- recommendUsers(int, int) - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
Recommends users to a product.
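A hedged sketch, assuming model is a trained MatrixFactorizationModel (for example from ALS) and user 42 exists in the training data:

    import org.apache.spark.mllib.recommendation.Rating

    val top5: Array[Rating] = model.recommendProducts(42, 5)   // 5 highest-scored products for user 42
    top5.foreach(r => println(s"product=${r.product} score=${r.rating}"))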
- reduce(Function2<T, T, T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Reduces the elements of this RDD using the specified commutative and associative binary
operator.
- reduce(Function2<T, T, T>) - Method in class org.apache.spark.rdd.RDD
-
Reduces the elements of this RDD using the specified commutative and
associative binary operator.
- reduce(Function2<T, T, T>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by reducing each RDD
of this DStream.
- reduce(Function2<T, T, T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by reducing each RDD
of this DStream.
- reduceByKey(Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function.
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function.
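As an illustration, the classic word count built on reduceByKey; it assumes an existing SparkContext sc and a hypothetical input path, and the SparkContext._ import brings the pair-RDD functions into scope:

    import org.apache.spark.SparkContext._

    val counts = sc.textFile("hdfs:///data/input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)     // merge counts per key, combining map-side before the shuffle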
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKey(Function2<V, V, V>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKey(Function2<V, V, V>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKey(Function2<V, V, V>, int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKey(Function2<V, V, V>, Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
to each RDD.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Create a new DStream by applying reduceByKey
over a sliding window on this
DStream.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by reducing over a sliding window using incremental computation.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, int, Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying incremental reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, Partitioner, Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying incremental reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
over a sliding window on this
DStream.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, int) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Duration, Duration, Partitioner) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, int, Function1<Tuple2<K, V>, Object>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying incremental reduceByKey
over a sliding window.
- reduceByKeyAndWindow(Function2<V, V, V>, Function2<V, V, V>, Duration, Duration, Partitioner, Function1<Tuple2<K, V>, Object>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying incremental reduceByKey
over a sliding window.
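A sketch of windowed counts over a DStream, assuming an existing StreamingContext ssc whose batch interval evenly divides the window and slide durations, and a hypothetical socket source:

    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.StreamingContext._

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
    val windowedCounts = words.map(w => (w, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(20))   // 60s window, sliding every 20s
    windowedCounts.print()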
- reduceByKeyLocally(Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative reduce function, but return the results
immediately to the master as a Map.
- reduceByKeyLocally(Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative reduce function, but return the results
immediately to the master as a Map.
- reduceByKeyToDriver(Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Alias for reduceByKeyLocally
- reduceByWindow(Function2<T, T, T>, Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceByWindow(Function2<T, T, T>, Function2<T, T, T>, Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceByWindow(Function2<T, T, T>, Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceByWindow(Function2<T, T, T>, Function2<T, T, T>, Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by reducing all
elements in a sliding window over this DStream.
- reduceId() - Method in class org.apache.spark.FetchFailed
-
- reduceId() - Method in class org.apache.spark.storage.ShuffleBlockId
-
- reduceId() - Method in class org.apache.spark.storage.ShuffleIndexBlockId
-
- registerClasses(Kryo) - Method in interface org.apache.spark.serializer.KryoRegistrator
-
- registerRDDAsTable(JavaSchemaRDD, String) - Method in class org.apache.spark.sql.api.java.JavaSQLContext
-
Registers the given RDD as a temporary table in the catalog.
- registerRDDAsTable(SchemaRDD, String) - Method in class org.apache.spark.sql.SQLContext
-
Registers the given RDD as a temporary table in the catalog.
- registerTestTable(TestHiveContext.TestTable) - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
- Regression() - Static method in class org.apache.spark.mllib.tree.configuration.Algo
-
- RegressionModel - Interface in org.apache.spark.mllib.regression
-
- relation() - Method in class org.apache.spark.sql.hive.execution.HiveTableScan
-
- relation() - Method in class org.apache.spark.sql.parquet.InsertIntoParquetTable
-
- relation() - Method in class org.apache.spark.sql.parquet.ParquetTableScan
-
- remember(Duration) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Sets each DStream in this context to remember the RDDs it generated in the last given duration.
- remember(Duration) - Method in class org.apache.spark.streaming.StreamingContext
-
Set each DStream in this context to remember the RDDs it generated in the last given duration.
- rememberDuration() - Method in class org.apache.spark.streaming.dstream.DStream
-
- remove(String) - Method in class org.apache.spark.SparkConf
-
Remove a parameter from the configuration
- repartition(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Return a new RDD that has exactly numPartitions partitions.
- repartition(int, Ordering<Row>) - Method in class org.apache.spark.sql.SchemaRDD
-
- repartition(int) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Return a new DStream with an increased or decreased level of parallelism.
- repartition(int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream with an increased or decreased level of parallelism.
- repartition(int) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream with an increased or decreased level of parallelism.
- replication() - Method in class org.apache.spark.storage.StorageLevel
-
- reportError(String, Throwable) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Report exceptions in receiving data.
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.Aggregate
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.BroadcastHashJoin
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.Distinct
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.GeneratedAggregate
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.HashOuterJoin
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.ShuffledHashJoin
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.Sort
-
- requiredChildDistribution() - Method in class org.apache.spark.sql.execution.SparkPlan
-
Specifies any partition requirements on the input data for this operator.
- reset() - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
Resets the test instance by deleting any tables that have been created.
- restart(String) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Restart the receiver.
- restart(String, Throwable) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Restart the receiver.
- restart(String, Throwable, int) - Method in class org.apache.spark.streaming.receiver.Receiver
-
Restart the receiver.
- Resubmitted - Class in org.apache.spark
-
:: DeveloperApi ::
A ShuffleMapTask that completed successfully earlier, but we lost the executor before the stage completed.
- Resubmitted() - Constructor for class org.apache.spark.Resubmitted
-
- result(Duration, CanAwait) - Method in class org.apache.spark.ComplexFutureAction
-
- result(Duration, CanAwait) - Method in interface org.apache.spark.FutureAction
-
Awaits and returns the result (of type T) of this action.
- result(Duration, CanAwait) - Method in class org.apache.spark.SimpleFutureAction
-
- result() - Method in class org.apache.spark.sql.execution.AggregateEvaluation
-
- resultAttribute() - Method in class org.apache.spark.sql.execution.Aggregate.ComputedAggregate
-
- resultAttribute() - Method in class org.apache.spark.sql.execution.EvaluatePython
-
- resultSetToObjectArray(ResultSet) - Static method in class org.apache.spark.rdd.JdbcRDD
-
- retainedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- RidgeRegressionModel - Class in org.apache.spark.mllib.regression
-
Regression model trained using RidgeRegression.
- RidgeRegressionModel(Vector, double) - Constructor for class org.apache.spark.mllib.regression.RidgeRegressionModel
-
- RidgeRegressionWithSGD - Class in org.apache.spark.mllib.regression
-
Train a regression model with L2-regularization using Stochastic Gradient Descent.
- RidgeRegressionWithSGD() - Constructor for class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
Construct a RidgeRegression object with default parameters: {stepSize: 1.0, numIterations: 100,
regParam: 1.0, miniBatchFraction: 1.0}.
- right() - Method in class org.apache.spark.sql.execution.BroadcastHashJoin
-
- right() - Method in class org.apache.spark.sql.execution.BroadcastNestedLoopJoin
-
- right() - Method in class org.apache.spark.sql.execution.CartesianProduct
-
- right() - Method in class org.apache.spark.sql.execution.Except
-
- right() - Method in interface org.apache.spark.sql.execution.HashJoin
-
- right() - Method in class org.apache.spark.sql.execution.HashOuterJoin
-
- right() - Method in class org.apache.spark.sql.execution.Intersect
-
- right() - Method in class org.apache.spark.sql.execution.LeftSemiJoinBNL
-
The Broadcast relation
- right() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- right() - Method in class org.apache.spark.sql.execution.ShuffledHashJoin
-
- rightImpurity() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- rightKeys() - Method in class org.apache.spark.sql.execution.BroadcastHashJoin
-
- rightKeys() - Method in interface org.apache.spark.sql.execution.HashJoin
-
- rightKeys() - Method in class org.apache.spark.sql.execution.HashOuterJoin
-
- rightKeys() - Method in class org.apache.spark.sql.execution.LeftSemiJoinHash
-
- rightKeys() - Method in class org.apache.spark.sql.execution.ShuffledHashJoin
-
- rightNode() - Method in class org.apache.spark.mllib.tree.model.Node
-
- rightOuterJoin(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a right outer join of this and other.
- rightOuterJoin(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a right outer join of this and other.
- rightOuterJoin(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a right outer join of this and other.
- rightOuterJoin(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a right outer join of this and other.
- rightOuterJoin(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a right outer join of this and other.
- rightOuterJoin(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a right outer join of this and other.
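For example, a small sketch of rightOuterJoin on pair RDDs (sc is an existing SparkContext; the data is made up):

    import org.apache.spark.SparkContext._

    val orders = sc.parallelize(Seq(("alice", 3), ("bob", 1)))
    val users  = sc.parallelize(Seq(("alice", "US"), ("carol", "DE")))
    // Every key from `users` is kept; missing order counts appear as None.
    val joined = orders.rightOuterJoin(users)   // RDD[(String, (Option[Int], String))]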
- rightOuterJoin(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- rightOuterJoin(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- rightOuterJoin(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- rightOuterJoin(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- rightOuterJoin(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- rightOuterJoin(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
- rng() - Method in class org.apache.spark.util.random.BernoulliSampler
-
- rng() - Method in class org.apache.spark.util.random.PoissonSampler
-
- roc() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the receiver operating characteristic (ROC) curve,
which is an RDD of (false positive rate, true positive rate)
with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
- Row - Class in org.apache.spark.sql.api.java
-
A result row from a SparkSQL query.
- Row(Row) - Constructor for class org.apache.spark.sql.api.java.Row
-
- row() - Method in class org.apache.spark.sql.api.java.Row
-
- RowMatrix - Class in org.apache.spark.mllib.linalg.distributed
-
:: Experimental ::
Represents a row-oriented distributed Matrix with no meaningful row indices.
- RowMatrix(RDD<Vector>, long, int) - Constructor for class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
- RowMatrix(RDD<Vector>) - Constructor for class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Alternative constructor leaving matrix dimensions to be determined automatically.
- rows() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
- rows() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
- run(Function0<T>, ExecutionContext) - Method in class org.apache.spark.ComplexFutureAction
-
Executes some action enclosed in the closure.
- run(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.classification.NaiveBayes
-
Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries.
- run(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Train a K-means model on the given set of points; data
should be cached for high
performance, because this is an iterative algorithm.
- run(RDD<Rating>) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Run ALS with the configured parameters on an input RDD of (user, product, rating) triples.
- run(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Run the algorithm with the configured parameters on an input
RDD of LabeledPoint entries.
- run(RDD<LabeledPoint>, Vector) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Run the algorithm with the configured parameters on an input RDD
of LabeledPoint entries starting from the initial weights provided.
- runApproximateJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, <any>, long) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Run a job that can return approximate results.
- runJob(RDD<T>, Function1<Iterator<T>, U>, Seq<Object>, Function2<Object, U, BoxedUnit>, Function0<R>) - Method in class org.apache.spark.ComplexFutureAction
-
Runs a Spark job.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, Seq<Object>, boolean, Function2<Object, U, BoxedUnit>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a function on a given set of partitions in an RDD and pass the results to the given
handler function.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, Seq<Object>, boolean, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a function on a given set of partitions in an RDD and return the results as an array.
- runJob(RDD<T>, Function1<Iterator<T>, U>, Seq<Object>, boolean, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on a given set of partitions of an RDD, but take a function of type
Iterator[T] => U instead of (TaskContext, Iterator[T]) => U.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and return the results in an array.
- runJob(RDD<T>, Function1<Iterator<T>, U>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and return the results in an array.
- runJob(RDD<T>, Function2<TaskContext, Iterator<T>, U>, Function2<Object, U, BoxedUnit>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and pass the results to a handler function.
- runJob(RDD<T>, Function1<Iterator<T>, U>, Function2<Object, U, BoxedUnit>, ClassTag<U>) - Method in class org.apache.spark.SparkContext
-
Run a job on all partitions in an RDD and pass the results to a handler function.
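A hedged sketch of the simplest runJob overload; most applications use higher-level actions such as count or collect, but runJob can return one result per partition (sc is an existing SparkContext):

    val rdd = sc.parallelize(1 to 100, 4)
    val partitionSums: Array[Int] = sc.runJob(rdd, (iter: Iterator[Int]) => iter.sum)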
- runLBFGS(RDD<Tuple2<Object, Vector>>, Gradient, Updater, int, double, int, double, Vector) - Static method in class org.apache.spark.mllib.optimization.LBFGS
-
Run Limited-memory BFGS (L-BFGS) in parallel.
- runMiniBatchSGD(RDD<Tuple2<Object, Vector>>, Gradient, Updater, double, int, double, double, Vector) - Static method in class org.apache.spark.mllib.optimization.GradientDescent
-
Run stochastic gradient descent (SGD) in parallel using mini batches.
- running() - Method in class org.apache.spark.scheduler.TaskInfo
-
- runningLocally() - Method in class org.apache.spark.TaskContext
-
- runSqlHive(String) - Method in class org.apache.spark.sql.hive.test.TestHiveContext
-
- s() - Method in class org.apache.spark.mllib.linalg.SingularValueDecomposition
-
- sample(boolean, Double) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a sampled subset of this RDD.
- sample(boolean, Double, long) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double, long) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double, long) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a sampled subset of this RDD.
- sample(boolean, double, long) - Method in class org.apache.spark.rdd.RDD
-
Return a sampled subset of this RDD.
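For example, an approximate 10% sample without replacement, with a fixed seed for reproducibility (sc is an existing SparkContext):

    val events  = sc.parallelize(1 to 10000)
    val sampled = events.sample(withReplacement = false, fraction = 0.1, seed = 7L)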
- Sample - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- Sample(double, boolean, long, SparkPlan) - Constructor for class org.apache.spark.sql.execution.Sample
-
- sample(boolean, double, long) - Method in class org.apache.spark.sql.SchemaRDD
-
:: Experimental ::
Returns a sampled version of the underlying dataset.
- sample(Iterator<T>) - Method in class org.apache.spark.util.random.BernoulliSampler
-
- sample(Iterator<T>) - Method in class org.apache.spark.util.random.PoissonSampler
-
- sample(Iterator<T>) - Method in interface org.apache.spark.util.random.RandomSampler
-
take a random sample
- sampleByKey(boolean, Map<K, Object>, long) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a subset of this RDD sampled by key (via stratified sampling).
- sampleByKey(boolean, Map<K, Object>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a subset of this RDD sampled by key (via stratified sampling).
- sampleByKey(boolean, Map<K, Object>, long) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return a subset of this RDD sampled by key (via stratified sampling).
- sampleByKeyExact(boolean, Map<K, Object>, long) - Method in class org.apache.spark.api.java.JavaPairRDD
-
::Experimental::
Return a subset of this RDD sampled by key (via stratified sampling) containing exactly
math.ceil(numItems * samplingRate) for each stratum (group of pairs with the same key).
- sampleByKeyExact(boolean, Map<K, Object>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
::Experimental::
Return a subset of this RDD sampled by key (via stratified sampling) containing exactly
math.ceil(numItems * samplingRate) for each stratum (group of pairs with the same key).
- sampleByKeyExact(boolean, Map<K, Object>, long) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
::Experimental::
Return a subset of this RDD sampled by key (via stratified sampling) containing exactly
math.ceil(numItems * samplingRate) for each stratum (group of pairs with the same key).
- sampleStdev() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Compute the sample standard deviation of this RDD's elements (which corrects for bias in
estimating the standard deviation by dividing by N-1 instead of N).
- sampleStdev() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Compute the sample standard deviation of this RDD's elements (which corrects for bias in
estimating the standard deviation by dividing by N-1 instead of N).
- sampleStdev() - Method in class org.apache.spark.util.StatCounter
-
Return the sample standard deviation of the values, which corrects for bias in estimating the
variance by dividing by N-1 instead of N.
- sampleVariance() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Compute the sample variance of this RDD's elements (which corrects for bias in
estimating the variance by dividing by N-1 instead of N).
- sampleVariance() - Method in class org.apache.spark.rdd.DoubleRDDFunctions
-
Compute the sample variance of this RDD's elements (which corrects for bias in
estimating the variance by dividing by N-1 instead of N).
- sampleVariance() - Method in class org.apache.spark.util.StatCounter
-
Return the sample variance, which corrects for bias in estimating the variance by dividing
by N-1 instead of N.
- saveAsHadoopDataset(JobConf) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported storage system, using a Hadoop JobConf object for
that storage system.
- saveAsHadoopDataset(JobConf) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported storage system, using a Hadoop JobConf object for
that storage system.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<F>, JobConf) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<F>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<F>, Class<? extends CompressionCodec>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system, compressing with the supplied codec.
- saveAsHadoopFile(String, ClassTag<F>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class supporting the key and value types K and V in this RDD.
- saveAsHadoopFile(String, Class<? extends CompressionCodec>, ClassTag<F>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class supporting the key and value types K and V in this RDD.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, Class<? extends CompressionCodec>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class supporting the key and value types K and V in this RDD.
- saveAsHadoopFile(String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, JobConf, Option<Class<? extends CompressionCodec>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class supporting the key and value types K and V in this RDD.
- saveAsHadoopFiles(String, String) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this
DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this
DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, JobConf) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this
DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, ClassTag<F>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this
DStream as a Hadoop file.
- saveAsHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, JobConf) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this
DStream as a Hadoop file.
- saveAsHiveFile(RDD<Writable>, Class<?>, FileSinkDesc, JobConf, boolean) - Method in class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
-
- saveAsLibSVMFile(RDD<LabeledPoint>, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Save labeled data in LIBSVM format.
- saveAsNewAPIHadoopDataset(Configuration) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported storage system, using
a Configuration object for that storage system.
- saveAsNewAPIHadoopDataset(Configuration) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported storage system with new Hadoop API, using a Hadoop
Configuration object for that storage system.
- saveAsNewAPIHadoopFile(String, Class<?>, Class<?>, Class<F>, Configuration) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsNewAPIHadoopFile(String, Class<?>, Class<?>, Class<F>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Output the RDD to any Hadoop-supported file system.
- saveAsNewAPIHadoopFile(String, ClassTag<F>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a new Hadoop API OutputFormat
(mapreduce.OutputFormat) object supporting the key and value types K and V in this RDD.
- saveAsNewAPIHadoopFile(String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, Configuration) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Output the RDD to any Hadoop-supported file system, using a new Hadoop API OutputFormat
(mapreduce.OutputFormat) object supporting the key and value types K and V in this RDD.
- saveAsNewAPIHadoopFiles(String, String) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this
DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this
DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, Configuration) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Save each RDD in this
DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, ClassTag<F>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this
DStream as a Hadoop file.
- saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?, ?>>, Configuration) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Save each RDD in this
DStream as a Hadoop file.
- saveAsObjectFile(String) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Save this RDD as a SequenceFile of serialized objects.
- saveAsObjectFile(String) - Method in class org.apache.spark.rdd.RDD
-
Save this RDD as a SequenceFile of serialized objects.
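A short illustration, assuming an existing SparkContext named sc; the path is an assumption:

```scala
// Write the RDD as a SequenceFile of serialized objects.
val nums = sc.parallelize(1 to 1000)
nums.saveAsObjectFile("hdfs:///tmp/nums-obj")

// The saved objects can be read back with objectFile.
val restored = sc.objectFile[Int]("hdfs:///tmp/nums-obj")
```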
- saveAsObjectFiles(String, String) - Method in class org.apache.spark.streaming.dstream.DStream
-
Save each RDD in this DStream as a SequenceFile of serialized objects.
- saveAsSequenceFile(String, Option<Class<? extends CompressionCodec>>) - Method in class org.apache.spark.rdd.SequenceFileRDDFunctions
-
Output the RDD as a Hadoop SequenceFile using the Writable types we infer from the RDD's key
and value types.
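A hedged sketch of saveAsSequenceFile via the implicit conversion to SequenceFileRDDFunctions, assuming an existing SparkContext named sc; the paths and codec choice are illustrative:

```scala
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.SparkContext._   // implicit conversion to SequenceFileRDDFunctions

// Illustrative (String, Int) pairs; Writable types are inferred from the key and value types.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2)))
pairs.saveAsSequenceFile("hdfs:///tmp/pairs-seq")                              // uncompressed
pairs.saveAsSequenceFile("hdfs:///tmp/pairs-seq-gz", Some(classOf[GzipCodec])) // compressed
```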
- saveAsTextFile(String) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Save this RDD as a text file, using string representations of elements.
- saveAsTextFile(String, Class<? extends CompressionCodec>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Save this RDD as a compressed text file, using string representations of elements.
- saveAsTextFile(String) - Method in class org.apache.spark.rdd.RDD
-
Save this RDD as a text file, using string representations of elements.
- saveAsTextFile(String, Class<? extends CompressionCodec>) - Method in class org.apache.spark.rdd.RDD
-
Save this RDD as a compressed text file, using string representations of elements.
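A short illustration of both text-file variants, assuming an existing SparkContext named sc; the paths are assumptions:

```scala
import org.apache.hadoop.io.compress.GzipCodec

// One string per element; one part file per partition is written under each path.
val lines = sc.parallelize(1 to 100).map(i => "line " + i)
lines.saveAsTextFile("hdfs:///tmp/lines-plain")                  // uncompressed
lines.saveAsTextFile("hdfs:///tmp/lines-gz", classOf[GzipCodec]) // gzip-compressed part files
```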
- saveAsTextFiles(String, String) - Method in class org.apache.spark.streaming.dstream.DStream
-
Save each RDD in this DStream as a text file, using string representations
of elements.
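A streaming sketch of saveAsTextFiles, assuming an existing SparkContext named sc; the host, port, batch interval, and output prefix are illustrative:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))
val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

// Each batch is written to a directory named "<prefix>-<batch time in ms>.<suffix>".
words.saveAsTextFiles("hdfs:///tmp/words", "txt")

ssc.start()
ssc.awaitTermination()
```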
- saveLabeledData(RDD<LabeledPoint>, String) - Static method in class org.apache.spark.mllib.util.MLUtils
-
- sc() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- sc() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
- sc() - Method in class org.apache.spark.streaming.StreamingContext
-
- scalaIntToJavaLong(DStream<Object>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- scalaToJavaLong(JavaPairDStream<K, Object>, ClassTag<K>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- scheduler() - Method in class org.apache.spark.streaming.StreamingContext
-
- schedulingDelay() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
-
Time taken for the first job of this batch to start processing from the time this batch
was submitted to the streaming scheduler.
- SchedulingMode - Class in org.apache.spark.scheduler
-
"FAIR" and "FIFO" determines which policy is used
to order tasks amongst a Schedulable's sub-queues
"NONE" is used when the a Schedulable has no sub-queues.
- SchedulingMode() - Constructor for class org.apache.spark.scheduler.SchedulingMode
-
- schedulingMode() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- schema() - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Returns the schema of this JavaSchemaRDD (represented by a StructType).
- schema() - Method in class org.apache.spark.sql.execution.AggregateEvaluation
-
- schema() - Method in class org.apache.spark.sql.SchemaRDD
-
Returns the schema of this SchemaRDD (represented by a StructType).
- SchemaRDD - Class in org.apache.spark.sql
-
:: AlphaComponent ::
An RDD of Row objects that has an associated schema.
- SchemaRDD(SQLContext, LogicalPlan) - Constructor for class org.apache.spark.sql.SchemaRDD
-
- script() - Method in class org.apache.spark.sql.hive.execution.ScriptTransformation
-
- ScriptTransformation - Class in org.apache.spark.sql.hive.execution
-
:: DeveloperApi ::
Transforms the input by forking and running the specified script.
- ScriptTransformation(Seq<Expression>, String, Seq<Attribute>, SparkPlan, HiveContext) - Constructor for class org.apache.spark.sql.hive.execution.ScriptTransformation
-
- seconds() - Static method in class org.apache.spark.scheduler.StatsReportListener
-
- Seconds - Class in org.apache.spark.streaming
-
Helper object that creates instances of Duration representing
a given number of seconds.
- Seconds() - Constructor for class org.apache.spark.streaming.Seconds
-
- securityManager() - Method in class org.apache.spark.SparkEnv
-
- seed() - Method in class org.apache.spark.sql.execution.Sample
-
- select(Seq<Expression>) - Method in class org.apache.spark.sql.SchemaRDD
-
Changes the output of this relation to the given expressions, similar to the SELECT clause
in SQL.
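A sketch of the experimental language-integrated DSL around schema() and select(), assuming an existing SparkContext named sc and the implicit conversions that `import sqlContext._` brings in; the case class, data, and column name are illustrative:

```scala
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

val sqlContext = new SQLContext(sc)
import sqlContext._   // createSchemaRDD and the Symbol-to-Expression conversions

val people = createSchemaRDD(sc.parallelize(Seq(Person("alice", 30), Person("bob", 25))))
people.printSchema()              // prints the StructType returned by schema()
val names = people.select('name)  // SELECT name, expressed with the Scala DSL
```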
- sequenceFile(String, Class<K>, Class<V>, int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get an RDD for a Hadoop SequenceFile with given key and value types.
- sequenceFile(String, Class<K>, Class<V>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get an RDD for a Hadoop SequenceFile.
- sequenceFile(String, Class<K>, Class<V>, int) - Method in class org.apache.spark.SparkContext
-
Get an RDD for a Hadoop SequenceFile with given key and value types.
- sequenceFile(String, Class<K>, Class<V>) - Method in class org.apache.spark.SparkContext
-
Get an RDD for a Hadoop SequenceFile with given key and value types.
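A sketch of reading a SequenceFile back with explicit key and value classes, assuming an existing SparkContext named sc; the path and types are illustrative (for example, a file like the one written in the sketch further above):

```scala
import org.apache.hadoop.io.{IntWritable, Text}

// Copy values out of the Writable instances, which Hadoop record readers reuse.
val read = sc.sequenceFile("hdfs:///tmp/counts-seq", classOf[Text], classOf[IntWritable])
  .map { case (k, v) => (k.toString, v.get) }
```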
- sequenceFile(String, int, ClassTag<K>, ClassTag<V>, Function0<WritableConverter<K>>, Function0<WritableConverter<V>>) - Method in class org.apache.spark.SparkContext
-
Version of sequenceFile() for types implicitly convertible to Writables through a
WritableConverter.
- SequenceFileRDDFunctions<K,V> - Class in org.apache.spark.rdd
-
Extra functions available on RDDs of (key, value) pairs to create a Hadoop SequenceFile,
through an implicit conversion.
- SequenceFileRDDFunctions(RDD<Tuple2<K, V>>, Function1<K, Writable>, ClassTag<K>, Function1<V, Writable>, ClassTag<V>) - Constructor for class org.apache.spark.rdd.SequenceFileRDDFunctions
-
- SerializableWritable<T extends org.apache.hadoop.io.Writable> - Class in org.apache.spark
-
- SerializableWritable(T) - Constructor for class org.apache.spark.SerializableWritable
-
- SerializationStream - Class in org.apache.spark.serializer
-
:: DeveloperApi ::
A stream for writing serialized objects.
- SerializationStream() - Constructor for class org.apache.spark.serializer.SerializationStream
-
- serialize(T, ClassTag<T>) - Method in class org.apache.spark.serializer.SerializerInstance
-
- serialize(Object, ObjectInspector) - Method in class org.apache.spark.sql.hive.parquet.FakeParquetSerDe
-
- Serializer - Class in org.apache.spark.serializer
-
:: DeveloperApi ::
A serializer.
- Serializer() - Constructor for class org.apache.spark.serializer.Serializer
-
- serializer() - Method in class org.apache.spark.ShuffleDependency
-
- serializer() - Method in class org.apache.spark.SparkEnv
-
- SerializerInstance - Class in org.apache.spark.serializer
-
:: DeveloperApi ::
An instance of a serializer, for use by one thread at a time.
- SerializerInstance() - Constructor for class org.apache.spark.serializer.SerializerInstance
-
- serializeStream(OutputStream) - Method in class org.apache.spark.serializer.SerializerInstance
-
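These serializer classes are DeveloperApi; a minimal round-trip sketch through the serializer configured for the running application, assuming it executes inside a Spark application where SparkEnv.get is available:

```scala
import java.nio.ByteBuffer
import org.apache.spark.SparkEnv

// Obtain a per-thread SerializerInstance and round-trip a value through it.
val instance = SparkEnv.get.serializer.newInstance()
val bytes: ByteBuffer = instance.serialize(Seq(1, 2, 3))
val back: Seq[Int] = instance.deserialize[Seq[Int]](bytes)
```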
- set(String, String) - Method in class org.apache.spark.SparkConf
-
Set a configuration variable.
- set(SparkEnv) - Static method in class org.apache.spark.SparkEnv
-
- setAggregator(Aggregator<K, V, C>) - Method in class org.apache.spark.rdd.ShuffledRDD
-
Set the aggregator for this RDD's shuffle.
- setAll(Traversable<Tuple2<String, String>>) - Method in class org.apache.spark.SparkConf
-
Set multiple parameters together
- setAlpha(double) - Method in class org.apache.spark.mllib.recommendation.ALS
-
:: Experimental ::
Sets the constant used in computing confidence in implicit ALS.
- setAppName(String) - Method in class org.apache.spark.SparkConf
-
Set a name for your application.
- setBlocks(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the number of blocks for both user blocks and product blocks, used to parallelize the
computation; pass -1 for an auto-configured number of blocks.
- setCallSite(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Pass-through to SparkContext.setCallSite.
- setCallSite(String) - Method in class org.apache.spark.SparkContext
-
Support function for API backtraces.
- setCheckpointDir(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Set the directory under which RDDs are going to be checkpointed.
- setCheckpointDir(String) - Method in class org.apache.spark.SparkContext
-
Set the directory under which RDDs are going to be checkpointed.
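A short illustration, assuming an existing SparkContext named sc; the checkpoint directory is an assumption:

```scala
sc.setCheckpointDir("hdfs:///tmp/checkpoints")

val data = sc.parallelize(1 to 10).map(_ * 2)
data.checkpoint()   // marks the RDD for checkpointing
data.count()        // the checkpoint is materialized by the first action
```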
- SetCommand - Class in org.apache.spark.sql.execution
-
:: DeveloperApi ::
- SetCommand(Option<String>, Option<String>, Seq<Attribute>, SQLContext) - Constructor for class org.apache.spark.sql.execution.SetCommand
-
- setConf(String, String) - Method in class org.apache.spark.sql.hive.HiveContext
-
- setConvergenceTol(double) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the convergence tolerance of iterations for L-BFGS.
- setDefaultClassLoader(ClassLoader) - Method in class org.apache.spark.serializer.Serializer
-
Sets a class loader for the serializer to use in deserialization.
- setEpsilon(double) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the distance threshold within which we consider centers to have converged.
- setExecutorEnv(String, String) - Method in class org.apache.spark.SparkConf
-
Set an environment variable to be used when launching executors for this application.
- setExecutorEnv(Seq<Tuple2<String, String>>) - Method in class org.apache.spark.SparkConf
-
Set multiple environment variables to be used when launching executors.
- setExecutorEnv(Tuple2<String, String>[]) - Method in class org.apache.spark.SparkConf
-
Set multiple environment variables to be used when launching executors.
- setGradient(Gradient) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the gradient function (of the loss function of one single data example)
to be used for SGD.
- setGradient(Gradient) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the gradient function (of the loss function of one single data example)
to be used for L-BFGS.
- setIfMissing(String, String) - Method in class org.apache.spark.SparkConf
-
Set a parameter if it isn't already configured
- setImplicitPrefs(boolean) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Sets whether to use implicit preference.
- setInitializationMode(String) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the initialization algorithm.
- setInitializationSteps(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the number of steps for the k-means|| initialization mode.
- setInitialWeights(Vector) - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
Set the initial weights.
- setIntercept(boolean) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Set if the algorithm should add an intercept.
- setIntermediateRDDStorageLevel(StorageLevel) - Method in class org.apache.spark.mllib.recommendation.ALS
-
:: DeveloperApi ::
Sets storage level for intermediate RDDs (user/product in/out links).
- setIterations(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the number of iterations to run.
- setJars(Seq<String>) - Method in class org.apache.spark.SparkConf
-
Set JAR files to distribute to the cluster.
- setJars(String[]) - Method in class org.apache.spark.SparkConf
-
Set JAR files to distribute to the cluster.
- setJobDescription(String) - Method in class org.apache.spark.SparkContext
-
Set a human readable description of the current job.
- setJobGroup(String, String, boolean) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
- setJobGroup(String, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
- setJobGroup(String, String, boolean) - Method in class org.apache.spark.SparkContext
-
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
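A hedged sketch of grouping jobs, assuming an existing SparkContext named sc; the group ID and description are illustrative:

```scala
sc.setJobGroup("nightly-etl", "Aggregate daily logs", interruptOnCancel = true)
sc.parallelize(1 to 1000000).count()   // runs under the "nightly-etl" group

// From another thread, every job in the group could later be cancelled:
// sc.cancelJobGroup("nightly-etl")
```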
- setK(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set the number of clusters to create (k).
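The KMeans setters listed in this index chain, since each returns the instance; a sketch assuming an existing SparkContext named sc, with illustrative data and parameter values:

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Two obvious clusters around (0,0) and (9,9).
val vectors = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))

val model = new KMeans()
  .setK(2)
  .setMaxIterations(20)
  .setInitializationMode(KMeans.K_MEANS_PARALLEL)
  .run(vectors)
```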
- setKeyOrdering(Ordering<K>) - Method in class org.apache.spark.rdd.ShuffledRDD
-
Set the key ordering for this RDD's shuffle.
- setLambda(double) - Method in class org.apache.spark.mllib.classification.NaiveBayes
-
Set the smoothing parameter.
- setLambda(double) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the regularization parameter, lambda.
- setLearningRate(double) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Sets the initial learning rate (default: 0.025).
- setLocalProperty(String, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Set a local property that affects jobs submitted from this thread, such as the
Spark fair scheduler pool.
- setLocalProperty(String, String) - Method in class org.apache.spark.SparkContext
-
Set a local property that affects jobs submitted from this thread, such as the
Spark fair scheduler pool.
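A short illustration, assuming an existing SparkContext named sc with the fair scheduler enabled; the pool name is an assumption:

```scala
sc.setLocalProperty("spark.scheduler.pool", "reporting")
sc.parallelize(1 to 1000).count()                  // scheduled in the "reporting" pool
sc.setLocalProperty("spark.scheduler.pool", null)  // clear the property for this thread
```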
- setMapSideCombine(boolean) - Method in class org.apache.spark.rdd.ShuffledRDD
-
Set the mapSideCombine flag for this RDD's shuffle.
- setMaster(String) - Method in class org.apache.spark.SparkConf
-
The master URL to connect to, such as "local" to run locally with one thread, "local[4]" to
run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
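The SparkConf setters in this index (set, setAll, setAppName, setIfMissing, setExecutorEnv, setJars, setMaster, setSparkHome) return the conf itself and therefore chain; a minimal sketch in which every value is an illustrative assumption:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("index-example")
  .setMaster("local[4]")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .setIfMissing("spark.ui.port", "4040")

val sc = new SparkContext(conf)
```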
- setMaxIterations(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
Set maximum number of iterations to run.
- setMaxNumIterations(int) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
- setMiniBatchFraction(double) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
:: Experimental ::
Set fraction of data to be used for each SGD iteration.
- setMiniBatchFraction(double) - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
Set the fraction of each batch to use for updates.
- setName(String) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Assign a name to this RDD
- setName(String) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Assign a name to this RDD
- setName(String) - Method in class org.apache.spark.api.java.JavaRDD
-
Assign a name to this RDD
- setName(String) - Method in class org.apache.spark.rdd.RDD
-
Assign a name to this RDD
- setName(String) - Method in class org.apache.spark.sql.api.java.JavaSchemaRDD
-
Assign a name to this RDD
- setNonnegative(boolean) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set whether the least-squares problems solved at each iteration should have
nonnegativity constraints.
- setNumCorrections(int) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the number of corrections used in the LBFGS update.
- setNumIterations(int) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Sets the number of iterations (default: 1), which should be smaller than or equal to the
number of partitions.
- setNumIterations(int) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the number of iterations for SGD.
- setNumIterations(int) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the maximal number of iterations for L-BFGS.
- setNumIterations(int) - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
Set the number of iterations of gradient descent to run per update.
- setNumPartitions(int) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Sets the number of partitions (default: 1).
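A sketch pulling the Word2Vec setters together, assuming an existing SparkContext named sc; the corpus path, parameter values, and query word are illustrative:

```scala
import org.apache.spark.mllib.feature.Word2Vec

// Each element of the training set is a sequence of tokens.
val corpus = sc.textFile("hdfs:///tmp/corpus.txt").map(_.split(" ").toSeq)

val model = new Word2Vec()
  .setLearningRate(0.025)
  .setNumIterations(1)
  .setNumPartitions(4)
  .setSeed(42L)
  .fit(corpus)

val synonyms = model.findSynonyms("spark", 5)
```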
- setProductBlocks(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the number of product blocks to parallelize the computation.
- setRank(int) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Set the rank of the feature matrices computed (number of features).
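A sketch combining the ALS setters listed in this index, assuming an existing SparkContext named sc; the ratings and parameter values are illustrative:

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

val ratings = sc.parallelize(Seq(
  Rating(1, 10, 4.0), Rating(1, 20, 1.0), Rating(2, 10, 5.0)))

val model = new ALS()
  .setRank(8)
  .setIterations(10)
  .setLambda(0.01)
  .run(ratings)

val score = model.predict(2, 20)   // predicted rating for user 2, product 20
```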
- setRegParam(double) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the regularization parameter.
- setRegParam(double) - Method in class org.apache.spark.mllib.optimization.LBFGS
-
Set the regularization parameter.
- setRuns(int) - Method in class org.apache.spark.mllib.clustering.KMeans
-
:: Experimental ::
Set the number of runs of the algorithm to execute in parallel.
- setSeed(long) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
Sets the random seed (default: a random long integer).
- setSeed(long) - Method in class org.apache.spark.mllib.random.PoissonGenerator
-
- setSeed(long) - Method in class org.apache.spark.mllib.random.StandardNormalGenerator
-
- setSeed(long) - Method in class org.apache.spark.mllib.random.UniformGenerator
-
- setSeed(long) - Method in class org.apache.spark.mllib.recommendation.ALS
-
Sets a random seed to have deterministic results.
- setSeed(long) - Method in class org.apache.spark.util.random.BernoulliSampler
-
- setSeed(long) - Method in class org.apache.spark.util.random.PoissonSampler
-
- setSeed(long) - Method in interface org.apache.spark.util.random.Pseudorandom
-
Set random seed.
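A short illustration of Pseudorandom.setSeed on one of the MLlib generators; the mean and seed values are assumptions, and fixing the seed makes the draws reproducible:

```scala
import org.apache.spark.mllib.random.PoissonGenerator

val gen = new PoissonGenerator(4.0)   // mean of the Poisson distribution
gen.setSeed(42L)
val draws = Array.fill(5)(gen.nextValue())
```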
- setSerializer(Serializer) - Method in class org.apache.spark.rdd.CoGroupedRDD
-
Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer)
- setSerializer(Serializer) - Method in class org.apache.spark.rdd.ShuffledRDD
-
Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer)
- setSparkHome(String) - Method in class org.apache.spark.SparkConf
-
Set the location where Spark is installed on worker nodes.
- setStepSize(double) - Method in class org.apache.spark.mllib.optimization.GradientDescent
-
Set the initial step size of SGD for the first step.
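GradientDescent is normally tuned through the optimizer field that the *WithSGD algorithms expose rather than constructed directly; a sketch assuming an existing SparkContext named sc, with illustrative data and hyper-parameter values:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

val training = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(1.0, 0.5)),
  LabeledPoint(2.0, Vectors.dense(2.0, 1.0))))

val lr = new LinearRegressionWithSGD()
lr.optimizer
  .setStepSize(0.01)
  .setNumIterations(200)
  .setMiniBatchFraction(1.0)
  .setRegParam(0.0)

val model = lr.run(training)
```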
- setStepSize(double) - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
Set the step size for gradient descent.
- setThreshold(double) - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
:: Experimental ::
Sets the threshold that separates positive predictions from negative predictions.
- setThreshold(double) - Method in class org.apache.spark.mllib.classification.SVMModel
-
:: Experimental ::
Sets the threshold that separates positive predictions from negative predictions.
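A hedged sketch of setThreshold on a trained classification model, assuming an existing SparkContext named sc; the data and threshold value are illustrative:

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val training = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(2.0)),
  LabeledPoint(0.0, Vectors.dense(-2.0))))

val model = SVMWithSGD.train(training, 100)   // 100 iterations
model.setThreshold(0.5)                       // scores above 0.5 map to class 1.0
// model.clearThreshold()                     // would make predict() return raw scores
```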
- settings() - Method in class org.apache.spark.SparkConf
-