- abs(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the absolute value.
- abs() - Method in class org.apache.spark.sql.types.Decimal
-
- AbsoluteError - Class in org.apache.spark.mllib.tree.loss
-
:: DeveloperApi ::
Class for absolute error loss calculation (for regression).
- AbsoluteError() - Constructor for class org.apache.spark.mllib.tree.loss.AbsoluteError
-
- accessTime() - Method in class org.apache.spark.sql.sources.HadoopFsRelation.FakeFileStatus
-
- accId() - Method in class org.apache.spark.CleanAccum
-
- Accumulable<R,T> - Class in org.apache.spark
-
A data type that can be accumulated, i.e. has a commutative and associative "add" operation,
but where the result type, R, may be different from the element type being added, T.
- Accumulable(R, AccumulableParam<R, T>, Option<String>) - Constructor for class org.apache.spark.Accumulable
-
- Accumulable(R, AccumulableParam<R, T>) - Constructor for class org.apache.spark.Accumulable
-
- accumulable(T, AccumulableParam<T, R>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulable shared variable of the given type, to which tasks can "add" values with add.
- accumulable(T, String, AccumulableParam<T, R>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulable shared variable of the given type, to which tasks can "add" values with add.
- accumulable(R, AccumulableParam<R, T>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulable shared variable, to which tasks can add values with +=.
- accumulable(R, String, AccumulableParam<R, T>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulable shared variable, with a name for display in the Spark UI.
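To make the R-versus-T distinction concrete, here is a minimal sketch, assuming an existing SparkContext named sc, of a custom AccumulableParam that accumulates String elements (T) into a Set[String] (R):

    import org.apache.spark.{Accumulable, AccumulableParam}

    // Hypothetical param: elements are Strings, the accumulated result is a Set[String].
    object StringSetParam extends AccumulableParam[Set[String], String] {
      def addAccumulator(acc: Set[String], elem: String): Set[String] = acc + elem // add one element
      def addInPlace(r1: Set[String], r2: Set[String]): Set[String] = r1 ++ r2     // merge partial results
      def zero(initialValue: Set[String]): Set[String] = Set.empty[String]         // neutral value
    }

    val badRecords: Accumulable[Set[String], String] =
      sc.accumulable(Set.empty[String], "badRecords")(StringSetParam)
    sc.parallelize(Seq("ok", "bad:1", "bad:2")).foreach { rec =>
      if (rec.startsWith("bad")) badRecords += rec // tasks "add" values with +=
    }
    println(badRecords.value) // Set(bad:1, bad:2); only the driver may read the value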
- accumulableCollection(R, Function1<R, Growable<T>>, ClassTag<R>) - Method in class org.apache.spark.SparkContext
-
Create an accumulator from a "mutable collection" type.
- AccumulableInfo - Class in org.apache.spark.scheduler
-
:: DeveloperApi ::
Information about an Accumulable modified during a task or stage.
- AccumulableInfo - Class in org.apache.spark.status.api.v1
-
- AccumulableParam<R,T> - Interface in org.apache.spark
-
Helper object defining how to accumulate values of a particular type.
- accumulables() - Method in class org.apache.spark.scheduler.StageInfo
-
Terminal values of accumulables updated during this stage.
- accumulables() - Method in class org.apache.spark.scheduler.TaskInfo
-
Intermediate updates to accumulables during this task.
- Accumulator<T> - Class in org.apache.spark
-
A simpler value of Accumulable where the result type being accumulated is the same as the type of the elements being merged.
- Accumulator(T, AccumulatorParam<T>, Option<String>) - Constructor for class org.apache.spark.Accumulator
-
- Accumulator(T, AccumulatorParam<T>) - Constructor for class org.apache.spark.Accumulator
-
- accumulator(int) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
- accumulator(int, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
- accumulator(double) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator double variable, which tasks can "add" values to using the add method.
- accumulator(double, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator double variable, which tasks can "add" values to using the add method.
- accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator variable of a given type, which tasks can "add" values to using the add method.
- accumulator(T, String, AccumulatorParam<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator variable of a given type, which tasks can "add" values to using the add method.
- accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulator variable of a given type, which tasks can "add" values to using the += method.
- accumulator(T, String, AccumulatorParam<T>) - Method in class org.apache.spark.SparkContext
-
Create an Accumulator variable of a given type, with a name for display in the Spark UI.
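A minimal sketch of the simplest variant, assuming an existing SparkContext named sc; named accumulators show up in the Spark UI:

    // IntAccumulatorParam is supplied implicitly for the Int initial value.
    val errorCount = sc.accumulator(0, "errorCount")
    sc.parallelize(1 to 1000).foreach { i =>
      if (i % 100 == 0) errorCount += 1 // tasks may only add
    }
    println(errorCount.value) // 10; only the driver may read the value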
- AccumulatorParam<T> - Interface in org.apache.spark
-
A simpler version of AccumulableParam where the only data type you can add in is the same type as the accumulated value.
- AccumulatorParam.DoubleAccumulatorParam$ - Class in org.apache.spark
-
- AccumulatorParam.DoubleAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.DoubleAccumulatorParam$
-
- AccumulatorParam.FloatAccumulatorParam$ - Class in org.apache.spark
-
- AccumulatorParam.FloatAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.FloatAccumulatorParam$
-
- AccumulatorParam.IntAccumulatorParam$ - Class in org.apache.spark
-
- AccumulatorParam.IntAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.IntAccumulatorParam$
-
- AccumulatorParam.LongAccumulatorParam$ - Class in org.apache.spark
-
- AccumulatorParam.LongAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.LongAccumulatorParam$
-
- accumulatorUpdates() - Method in class org.apache.spark.status.api.v1.StageData
-
- accumulatorUpdates() - Method in class org.apache.spark.status.api.v1.TaskData
-
- accuracy() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns accuracy.
- acos(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the cosine inverse of the given value; the returned angle is in the range
0.0 through pi.
- acos(String) - Static method in class org.apache.spark.sql.functions
-
Computes the cosine inverse of the given column; the returned angle is in the range
0.0 through pi.
- active() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
-
- activeJobs() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- activeStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- activeTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- ActorHelper - Interface in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
A receiver trait to be mixed in with your Actor to gain access to
the API for pushing received data into Spark Streaming for being processed.
- actorStream(Props, String, StorageLevel, SupervisorStrategy) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- actorStream(Props, String, StorageLevel, SupervisorStrategy, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream with any arbitrary user implemented actor receiver.
- ActorSupervisorStrategy - Class in org.apache.spark.streaming.receiver
-
:: DeveloperApi ::
A helper with set of defaults for supervisor strategy
- ActorSupervisorStrategy() - Constructor for class org.apache.spark.streaming.receiver.ActorSupervisorStrategy
-
- actorSystem() - Method in class org.apache.spark.SparkEnv
-
- add(T) - Method in class org.apache.spark.Accumulable
-
Add more data to this accumulator / accumulable
- add(org.apache.spark.ml.feature.Instance) - Method in class org.apache.spark.ml.classification.LogisticAggregator
-
Add a new training instance to this LogisticAggregator, and update the loss and gradient
of the objective function.
- add(AFTPoint) - Method in class org.apache.spark.ml.regression.AFTAggregator
-
- add(org.apache.spark.ml.feature.Instance) - Method in class org.apache.spark.ml.regression.LeastSquaresAggregator
-
Add a new training instance to this LeastSquaresAggregator, and update the loss and gradient
of the objective function.
- add(double[], MultivariateGaussian[], ExpectationSum, Vector<Object>) - Static method in class org.apache.spark.mllib.clustering.ExpectationSum
-
- add(Vector) - Method in class org.apache.spark.mllib.feature.IDF.DocumentFrequencyAggregator
-
Adds a new document.
- add(BlockMatrix) - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
Adds two block matrices together.
- add(Vector) - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
Add a new sample to this summarizer, and update the statistical summary.
- add(StructField) - Method in class org.apache.spark.sql.types.StructType
-
- add(String, DataType) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new nullable field with no metadata.
- add(String, DataType, boolean) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new field with no metadata.
- add(String, DataType, boolean, Metadata) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new field and specifying metadata.
- add(String, String) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new nullable field with no metadata where the dataType is specified as a String.
- add(String, String, boolean) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new field with no metadata where the dataType is specified as a String.
- add(String, String, boolean, Metadata) - Method in class org.apache.spark.sql.types.StructType
-
Creates a new StructType by adding a new field and specifying metadata where the dataType is specified as a String.
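Taken together, these overloads support incremental schema construction; a small sketch:

    import org.apache.spark.sql.types._

    // Each add returns a new StructType, so the builder style below is side-effect free.
    val schema = new StructType()
      .add("id", LongType)                       // nullable by default, no metadata
      .add("name", StringType, nullable = false) // explicit nullability
      .add("score", "double")                    // dataType given as a String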
- add(Vector) - Method in class org.apache.spark.util.Vector
-
- add_months(Column, int) - Static method in class org.apache.spark.sql.functions
-
Returns the date that is numMonths after startDate.
- addAccumulator(R, T) - Method in interface org.apache.spark.AccumulableParam
-
Add additional data to the accumulator value.
- addAccumulator(T, T) - Method in interface org.apache.spark.AccumulatorParam
-
- addAppArgs(String...) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds command line arguments for the application.
- addedFiles() - Method in class org.apache.spark.SparkContext
-
- addedJars() - Method in class org.apache.spark.SparkContext
-
- addFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Add a file to be downloaded with this Spark job on every node.
- addFile(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds a file to be submitted with the application.
- addFile(String) - Method in class org.apache.spark.SparkContext
-
Add a file to be downloaded with this Spark job on every node.
- addFile(String, boolean) - Method in class org.apache.spark.SparkContext
-
Add a file to be downloaded with this Spark job on every node.
- addGrid(Param<T>, Iterable<T>) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds a param with multiple values (overwrites if the input param exists).
- addGrid(DoubleParam, double[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds a double param with multiple values.
- addGrid(IntParam, int[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds an int param with multiple values.
- addGrid(FloatParam, float[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds a float param with multiple values.
- addGrid(LongParam, long[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds a long param with multiple values.
- addGrid(BooleanParam) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
-
Adds a boolean param with true and false.
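A sketch of how the addGrid overloads combine; lr is a hypothetical LogisticRegression estimator:

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.tuning.ParamGridBuilder

    val lr = new LogisticRegression()
    // Cartesian product: 2 regParam values x 2 fitIntercept values = 4 ParamMaps.
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .addGrid(lr.fitIntercept) // boolean param expands to {true, false}
      .build()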
- addInPlace(R, R) - Method in interface org.apache.spark.AccumulableParam
-
Merge two accumulated values together.
- addInPlace(double, double) - Method in class org.apache.spark.AccumulatorParam.DoubleAccumulatorParam$
-
- addInPlace(float, float) - Method in class org.apache.spark.AccumulatorParam.FloatAccumulatorParam$
-
- addInPlace(int, int) - Method in class org.apache.spark.AccumulatorParam.IntAccumulatorParam$
-
- addInPlace(long, long) - Method in class org.apache.spark.AccumulatorParam.LongAccumulatorParam$
-
- addInPlace(double, double) - Method in class org.apache.spark.SparkContext.DoubleAccumulatorParam$
-
- addInPlace(float, float) - Method in class org.apache.spark.SparkContext.FloatAccumulatorParam$
-
- addInPlace(int, int) - Method in class org.apache.spark.SparkContext.IntAccumulatorParam$
-
- addInPlace(long, long) - Method in class org.apache.spark.SparkContext.LongAccumulatorParam$
-
- addInPlace(Vector) - Method in class org.apache.spark.util.Vector
-
- addInPlace(Vector, Vector) - Method in class org.apache.spark.util.Vector.VectorAccumParam$
-
- addIntercept() - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Whether to add intercept (default: false).
- addJar(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
- addJar(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds a jar file to be submitted with the application.
- addJar(String) - Method in class org.apache.spark.SparkContext
-
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
- addJar(String) - Method in class org.apache.spark.sql.hive.HiveContext
-
- addJar(String) - Method in class org.apache.spark.sql.SQLContext
-
Add a jar to SQLContext
- addListener(SparkAppHandle.Listener) - Method in interface org.apache.spark.launcher.SparkAppHandle
-
Adds a listener to be notified of changes to the handle's information.
- addLocalConfiguration(String, int, int, int, JobConf) - Static method in class org.apache.spark.rdd.HadoopRDD
-
Add Hadoop configuration specific to a single partition and attempt.
- addOnCompleteCallback(Function0<BoxedUnit>) - Method in class org.apache.spark.TaskContext
-
Adds a callback function to be executed on task completion.
- addPartToPGroup(Partition, PartitionGroup) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- addPyFile(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds a python file / zip / egg to be submitted with the application.
- address() - Method in class org.apache.spark.status.api.v1.RDDDataDistribution
-
- addSparkArg(String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds a no-value argument to the Spark invocation.
- addSparkArg(String, String) - Method in class org.apache.spark.launcher.SparkLauncher
-
Adds an argument with a value to the Spark invocation.
- addSparkListener(SparkListener) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Register a listener to receive up-calls from events that happen during execution.
- addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
- addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.StreamingContext
-
- addTaskCompletionListener(TaskCompletionListener) - Method in class org.apache.spark.TaskContext
-
Adds a (Java friendly) listener to be executed on task completion.
- addTaskCompletionListener(Function1<TaskContext, BoxedUnit>) - Method in class org.apache.spark.TaskContext
-
Adds a listener in the form of a Scala closure to be executed on task completion.
- AFTAggregator - Class in org.apache.spark.ml.regression
-
- AFTAggregator(DenseVector<Object>, boolean) - Constructor for class org.apache.spark.ml.regression.AFTAggregator
-
- AFTCostFun - Class in org.apache.spark.ml.regression
-
- AFTCostFun(RDD<AFTPoint>, boolean) - Constructor for class org.apache.spark.ml.regression.AFTCostFun
-
- AFTSurvivalRegression - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Fit a parametric survival regression model named accelerated failure time (AFT) model
(https://en.wikipedia.org/wiki/Accelerated_failure_time_model
)
based on the Weibull distribution of the survival time.
- AFTSurvivalRegression(String) - Constructor for class org.apache.spark.ml.regression.AFTSurvivalRegression
-
- AFTSurvivalRegression() - Constructor for class org.apache.spark.ml.regression.AFTSurvivalRegression
-
- AFTSurvivalRegressionModel - Class in org.apache.spark.ml.regression
-
- agg(Column, Column...) - Method in class org.apache.spark.sql.DataFrame
-
Aggregates on the entire DataFrame without groups.
- agg(Tuple2<String, String>, Seq<Tuple2<String, String>>) - Method in class org.apache.spark.sql.DataFrame
-
(Scala-specific) Aggregates on the entire DataFrame without groups.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.DataFrame
-
(Scala-specific) Aggregates on the entire DataFrame without groups.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.DataFrame
-
(Java-specific) Aggregates on the entire DataFrame without groups.
- agg(Column, Seq<Column>) - Method in class org.apache.spark.sql.DataFrame
-
Aggregates on the entire DataFrame without groups.
- agg(Column, Column...) - Method in class org.apache.spark.sql.GroupedData
-
Compute aggregates by specifying a series of aggregate columns.
- agg(Tuple2<String, String>, Seq<Tuple2<String, String>>) - Method in class org.apache.spark.sql.GroupedData
-
(Scala-specific) Compute aggregates by specifying a map from column name to
aggregate methods.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.GroupedData
-
(Scala-specific) Compute aggregates by specifying a map from column name to
aggregate methods.
- agg(Map<String, String>) - Method in class org.apache.spark.sql.GroupedData
-
(Java-specific) Compute aggregates by specifying a map from column name to
aggregate methods.
- agg(Column, Seq<Column>) - Method in class org.apache.spark.sql.GroupedData
-
Compute aggregates by specifying a series of aggregate columns.
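A sketch contrasting the Column-based and map-based forms, where df is a hypothetical DataFrame with dept, salary, and age columns:

    import org.apache.spark.sql.functions._

    // Column-based: arbitrary aggregate expressions.
    df.groupBy("dept").agg(avg("salary"), max("age"))

    // Map-based: column name -> aggregate method name.
    df.groupBy("dept").agg(Map("salary" -> "avg", "age" -> "max"))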
- agg(TypedColumn<V, U1>) - Method in class org.apache.spark.sql.GroupedDataset
-
Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.
- agg(TypedColumn<V, U1>, TypedColumn<V, U2>) - Method in class org.apache.spark.sql.GroupedDataset
-
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1>, TypedColumn<V, U2>, TypedColumn<V, U3>) - Method in class org.apache.spark.sql.GroupedDataset
-
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1>, TypedColumn<V, U2>, TypedColumn<V, U3>, TypedColumn<V, U4>) - Method in class org.apache.spark.sql.GroupedDataset
-
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- aggregate(U, Function2<U, T, U>, Function2<U, U, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
- aggregate(U, Function2<U, T, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".
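For instance, computing (sum, count) in a single pass; a minimal sketch assuming an existing SparkContext named sc:

    // The zero value (0, 0) must be neutral for both the seqOp and the combOp.
    val (sum, count) = sc.parallelize(1 to 100).aggregate((0, 0))(
      (acc, x) => (acc._1 + x, acc._2 + 1),  // fold elements within a partition
      (a, b) => (a._1 + b._1, a._2 + b._2))  // merge per-partition results
    println(sum.toDouble / count)            // 50.5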
- aggregateByKey(U, Partitioner, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, int, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Partitioner, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, int, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
- aggregateByKey(U, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Aggregate the values of each key, using given combine functions and a neutral "zero value".
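The per-key analogue of the sketch above, here computing a per-key maximum:

    val pairs = sc.parallelize(Seq(("a", 1), ("a", 5), ("b", 2)))
    // Int.MinValue is the neutral zero value for math.max.
    val maxPerKey = pairs.aggregateByKey(Int.MinValue)(math.max, math.max)
    maxPerKey.collect() // Array((a,5), (b,2))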
- AggregatedDialect - Class in org.apache.spark.sql.jdbc
-
AggregatedDialect can unify multiple dialects into one virtual Dialect.
- AggregatedDialect(List<JdbcDialect>) - Constructor for class org.apache.spark.sql.jdbc.AggregatedDialect
-
- aggregateMessages(Function1<EdgeContext<VD, ED, A>, BoxedUnit>, Function2<A, A, A>, TripletFields, ClassTag<A>) - Method in class org.apache.spark.graphx.Graph
-
Aggregates values from the neighboring edges and vertices of each vertex.
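A sketch computing in-degrees, where graph is a hypothetical Graph[VD, ED]:

    import org.apache.spark.graphx._

    // Send 1 along each edge to its destination, then sum the messages per vertex.
    val inDegrees: VertexRDD[Int] = graph.aggregateMessages[Int](
      ctx => ctx.sendToDst(1), // sendMsg, invoked per EdgeContext
      _ + _,                   // mergeMsg
      TripletFields.None)      // no vertex attributes are needed, which saves shipping them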
- aggregateMessagesWithActiveSet(Function1<EdgeContext<VD, ED, A>, BoxedUnit>, Function2<A, A, A>, TripletFields, Option<Tuple2<VertexRDD<?>, EdgeDirection>>, ClassTag<A>) - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- aggregateUsingIndex(RDD<Tuple2<Object, VD2>>, Function2<VD2, VD2, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- aggregateUsingIndex(RDD<Tuple2<Object, VD2>>, Function2<VD2, VD2, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.VertexRDD
-
Aggregates vertices in messages that have the same ids using reduceFunc, returning a VertexRDD co-indexed with this.
- AggregatingEdgeContext<VD,ED,A> - Class in org.apache.spark.graphx.impl
-
- AggregatingEdgeContext(Function2<A, A, A>, Object, BitSet) - Constructor for class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- Aggregator<K,V,C> - Class in org.apache.spark
-
:: DeveloperApi ::
A set of functions used to aggregate data.
- Aggregator(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Constructor for class org.apache.spark.Aggregator
-
- aggregator() - Method in class org.apache.spark.ShuffleDependency
-
- Aggregator<I,B,O> - Class in org.apache.spark.sql.expressions
-
A base class for user-defined aggregations, which can be used in DataFrame and Dataset operations to take all of the elements of a group and reduce them to a single value.
- Aggregator() - Constructor for class org.apache.spark.sql.expressions.Aggregator
-
- aggUntyped(Seq<TypedColumn<?, ?>>) - Method in class org.apache.spark.sql.GroupedDataset
-
Internal helper function for building typed aggregations that return tuples.
- Algo - Class in org.apache.spark.mllib.tree.configuration
-
:: Experimental ::
Enum to select the algorithm for the decision tree.
- Algo() - Constructor for class org.apache.spark.mllib.tree.configuration.Algo
-
- algo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- algo() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- algo() - Method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
- algo() - Method in class org.apache.spark.mllib.tree.model.RandomForestModel
-
- algorithm() - Method in class org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
-
- algorithm() - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
-
The algorithm to use for updating.
- algorithm() - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
-
- alias(String) - Method in class org.apache.spark.sql.Column
-
Gives the column an alias.
- alias(String) - Method in class org.apache.spark.sql.DataFrame
-
- alias(Symbol) - Method in class org.apache.spark.sql.DataFrame
-
(Scala-specific) Returns a new DataFrame with an alias set.
- All - Static variable in class org.apache.spark.graphx.TripletFields
-
Expose all the fields (source, edge, and destination).
- alpha() - Method in class org.apache.spark.mllib.random.WeibullGenerator
-
- AlphaComponent - Annotation Type in org.apache.spark.annotation
-
A new component of Spark which may have unstable APIs.
- ALS - Class in org.apache.spark.ml.recommendation
-
:: Experimental ::
Alternating Least Squares (ALS) matrix factorization.
- ALS(String) - Constructor for class org.apache.spark.ml.recommendation.ALS
-
- ALS() - Constructor for class org.apache.spark.ml.recommendation.ALS
-
- ALS - Class in org.apache.spark.mllib.recommendation
-
- ALS() - Constructor for class org.apache.spark.mllib.recommendation.ALS
-
- ALS.Rating<ID> - Class in org.apache.spark.ml.recommendation
-
:: DeveloperApi ::
Rating class for better code readability.
- ALS.Rating(ID, ID, float) - Constructor for class org.apache.spark.ml.recommendation.ALS.Rating
-
- ALS.Rating$ - Class in org.apache.spark.ml.recommendation
-
- ALS.Rating$() - Constructor for class org.apache.spark.ml.recommendation.ALS.Rating$
-
- ALSModel - Class in org.apache.spark.ml.recommendation
-
:: Experimental ::
Model fitted by ALS.
- AnalysisException - Exception in org.apache.spark.sql
-
:: DeveloperApi ::
Thrown when a query fails to analyze, usually because the query itself is invalid.
- AnalysisException(String, Option<Object>, Option<Object>) - Constructor for exception org.apache.spark.sql.AnalysisException
-
- analyze(String) - Method in class org.apache.spark.sql.hive.HiveContext
-
Analyzes the given table in the current database to generate statistics, which will be
used in query optimizations.
- analyzer() - Method in class org.apache.spark.sql.hive.HiveContext
-
- analyzer() - Method in class org.apache.spark.sql.SQLContext
-
- and(Column) - Method in class org.apache.spark.sql.Column
-
Boolean AND.
- And - Class in org.apache.spark.sql.sources
-
A filter that evaluates to true iff both left and right evaluate to true.
- And(Filter, Filter) - Constructor for class org.apache.spark.sql.sources.And
-
- antecedent() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
-
- ANY() - Static method in class org.apache.spark.scheduler.TaskLocality
-
- anyNull() - Method in interface org.apache.spark.sql.Row
-
Returns true if there are any NULL values in this row.
- appAttemptId() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- appendBias(Vector) - Static method in class org.apache.spark.mllib.util.MLUtils
-
Returns a new vector with 1.0 (bias) appended to the input vector.
- appId() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- applicationAttemptId() - Method in class org.apache.spark.SparkContext
-
- ApplicationAttemptInfo - Class in org.apache.spark.status.api.v1
-
- applicationId() - Method in class org.apache.spark.SparkContext
-
A unique identifier for the Spark application.
- ApplicationInfo - Class in org.apache.spark.status.api.v1
-
- ApplicationStatus - Enum in org.apache.spark.status.api.v1
-
- apply(RDD<Tuple2<Object, VD>>, RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.Graph
-
Construct a graph from a collection of vertices and
edges with attributes.
- apply(RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
-
Create a graph from edges, setting referenced vertices to `defaultVertexAttr`.
- apply(RDD<Tuple2<Object, VD>>, RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
-
Create a graph from vertices and edges, setting missing vertices to `defaultVertexAttr`.
- apply(VertexRDD<VD>, EdgeRDD<ED>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
-
Create a graph from a VertexRDD and an EdgeRDD with arbitrary replicated vertices.
- apply(Graph<VD, ED>, A, int, EdgeDirection, Function3<Object, VD, A, VD>, Function1<EdgeTriplet<VD, ED>, Iterator<Tuple2<Object, A>>>, Function2<A, A, A>, ClassTag<VD>, ClassTag<ED>, ClassTag<A>) - Static method in class org.apache.spark.graphx.Pregel
-
Execute a Pregel-like iterative vertex-parallel abstraction.
- apply(RDD<Tuple2<Object, VD>>, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
-
Constructs a standalone VertexRDD (one that is not set up for efficient joins with an EdgeRDD) from an RDD of vertex-attribute pairs.
- apply(RDD<Tuple2<Object, VD>>, EdgeRDD<?>, VD, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
-
Constructs a VertexRDD from an RDD of vertex-attribute pairs.
- apply(RDD<Tuple2<Object, VD>>, EdgeRDD<?>, VD, Function2<VD, VD, VD>, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
-
Constructs a VertexRDD from an RDD of vertex-attribute pairs.
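A sketch of the simplest constructor, assuming an existing SparkContext named sc:

    import org.apache.spark.graphx.VertexRDD

    // Vertex IDs are Longs; duplicate IDs are deduplicated arbitrarily by this constructor.
    val verts = VertexRDD(sc.parallelize(Seq((1L, "alice"), (2L, "bob"))))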
- apply(String) - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Gets an attribute by its name.
- apply(int) - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Gets an attribute by its index.
- apply(Param<T>) - Method in class org.apache.spark.ml.param.ParamMap
-
Gets the value of the input param or its default value if it does not exist.
- apply(int, int) - Method in class org.apache.spark.mllib.linalg.DenseMatrix
-
- apply(int) - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- apply(int, int) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Gets the (i, j)-th element.
- apply(int, int) - Method in class org.apache.spark.mllib.linalg.SparseMatrix
-
- apply(int) - Method in interface org.apache.spark.mllib.linalg.Vector
-
Gets the value of the ith element.
- apply(int, Predict, double, boolean) - Static method in class org.apache.spark.mllib.tree.model.Node
-
Construct a node with nodeIndex, predict, impurity and isLeaf parameters.
- apply(String) - Static method in class org.apache.spark.rdd.PartitionGroup
-
- apply(long, String, Option<String>, String, boolean) - Static method in class org.apache.spark.scheduler.AccumulableInfo
-
- apply(long, String, Option<String>, String) - Static method in class org.apache.spark.scheduler.AccumulableInfo
-
- apply(long, String, String) - Static method in class org.apache.spark.scheduler.AccumulableInfo
-
- apply(long, TaskMetrics) - Static method in class org.apache.spark.scheduler.RuntimePercentage
-
- apply(Object) - Method in class org.apache.spark.sql.Column
-
Extracts a value or values from a complex type.
- apply(String) - Method in class org.apache.spark.sql.DataFrame
-
Selects a column based on the column name and returns it as a Column.
- apply(Column...) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Creates a Column for this UDAF using the given Columns as input arguments.
- apply(Seq<Column>) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Creates a Column for this UDAF using the given Columns as input arguments.
- apply(DataFrame, Seq<Expression>, GroupedData.GroupType) - Static method in class org.apache.spark.sql.GroupedData
-
- apply(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i.
- apply(DataType) - Static method in class org.apache.spark.sql.types.ArrayType
-
Construct an ArrayType object with the given element type.
- apply(double) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(long) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(int) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(BigDecimal) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(BigDecimal) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(BigDecimal, int, int) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(BigDecimal, int, int) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(long, int, int) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply(String) - Static method in class org.apache.spark.sql.types.Decimal
-
- apply() - Static method in class org.apache.spark.sql.types.DecimalType
-
- apply(Option<PrecisionInfo>) - Static method in class org.apache.spark.sql.types.DecimalType
-
- apply(DataType, DataType) - Static method in class org.apache.spark.sql.types.MapType
-
Construct a MapType object with the given key type and value type.
- apply(String) - Method in class org.apache.spark.sql.types.StructType
-
- apply(Set<String>) - Method in class org.apache.spark.sql.types.StructType
-
Returns a StructType containing StructFields of the given names, preserving the original order of fields.
- apply(int) - Method in class org.apache.spark.sql.types.StructType
-
- apply(Seq<Column>) - Method in class org.apache.spark.sql.UserDefinedFunction
-
- apply(String) - Static method in class org.apache.spark.storage.BlockId
-
Converts a BlockId "name" String back into a BlockId.
- apply(String, String, int) - Static method in class org.apache.spark.storage.BlockManagerId
-
- apply(ObjectInput) - Static method in class org.apache.spark.storage.BlockManagerId
-
- apply(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object without setting useOffHeap.
- apply(boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object.
- apply(int, int) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Create a new StorageLevel object from its integer representation.
- apply(ObjectInput) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Read StorageLevel object from ObjectInput stream.
- apply(String, int) - Static method in class org.apache.spark.streaming.kafka.Broker
-
- apply(String, int, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
-
- apply(TopicAndPartition, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
-
- apply(long) - Static method in class org.apache.spark.streaming.Milliseconds
-
- apply(long) - Static method in class org.apache.spark.streaming.Minutes
-
- apply(long) - Static method in class org.apache.spark.streaming.Seconds
-
- apply(TraversableOnce<Object>) - Static method in class org.apache.spark.util.StatCounter
-
Build a StatCounter from a list of values.
- apply(Seq<Object>) - Static method in class org.apache.spark.util.StatCounter
-
Build a StatCounter from a list of values passed as variable-length arguments.
- apply(int) - Method in class org.apache.spark.util.Vector
-
- applySchema(RDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
- applySchema(JavaRDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
- applySchema(RDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
-
- applySchema(JavaRDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
-
- applySchemaToPythonRDD(RDD<Object[]>, String) - Method in class org.apache.spark.sql.SQLContext
-
- applySchemaToPythonRDD(RDD<Object[]>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
- appName() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- appName() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- appName() - Method in class org.apache.spark.SparkContext
-
- approxCountDistinct(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the approximate number of distinct items in a group.
- approxCountDistinct(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the approximate number of distinct items in a group.
- approxCountDistinct(Column, double) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the approximate number of distinct items in a group.
- approxCountDistinct(String, double) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the approximate number of distinct items in a group.
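A sketch of the two-argument form, where the double is the maximum estimation error allowed and df and its userId column are hypothetical:

    import org.apache.spark.sql.functions._

    // Trades accuracy for speed: here at most ~5% relative error.
    df.agg(approxCountDistinct(col("userId"), 0.05))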
- ApproxHist() - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
-
- areaUnderPR() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Computes the area under the precision-recall curve.
- areaUnderROC() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
Computes the area under the receiver operating characteristic (ROC) curve.
- areaUnderROC() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Computes the area under the receiver operating characteristic (ROC) curve.
- argmax() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- argmax() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- argmax() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Find the index of a maximal element.
- arr() - Method in class org.apache.spark.rdd.PartitionGroup
-
- array(DataType) - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField of type array.
- array(Column...) - Static method in class org.apache.spark.sql.functions
-
Creates a new array column.
- array(String, String...) - Static method in class org.apache.spark.sql.functions
-
Creates a new array column.
- array(Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Creates a new array column.
- array(String, Seq<String>) - Static method in class org.apache.spark.sql.functions
-
Creates a new array column.
- array_contains(Column, Object) - Static method in class org.apache.spark.sql.functions
-
Returns true if the array contains the value.
- arrayLengthGt(double) - Static method in class org.apache.spark.ml.param.ParamValidators
-
Check that the array length is greater than lowerBound.
- ArrayType - Class in org.apache.spark.sql.types
-
- ArrayType(DataType, boolean) - Constructor for class org.apache.spark.sql.types.ArrayType
-
- ArrayType() - Constructor for class org.apache.spark.sql.types.ArrayType
-
No-arg constructor for Kryo.
- as(Encoder<U>) - Method in class org.apache.spark.sql.Column
-
Provides a type hint about the expected return value of this column.
- as(String) - Method in class org.apache.spark.sql.Column
-
Gives the column an alias.
- as(Seq<String>) - Method in class org.apache.spark.sql.Column
-
(Scala-specific) Assigns the given aliases to the results of a table generating function.
- as(String[]) - Method in class org.apache.spark.sql.Column
-
Assigns the given aliases to the results of a table generating function.
- as(Symbol) - Method in class org.apache.spark.sql.Column
-
Gives the column an alias.
- as(String, Metadata) - Method in class org.apache.spark.sql.Column
-
Gives the column an alias with metadata.
- as(Encoder<U>) - Method in class org.apache.spark.sql.DataFrame
-
:: Experimental ::
Converts this DataFrame to a strongly-typed Dataset containing objects of the specified type, U.
- as(String) - Method in class org.apache.spark.sql.DataFrame
-
- as(Symbol) - Method in class org.apache.spark.sql.DataFrame
-
(Scala-specific) Returns a new DataFrame with an alias set.
- as(Encoder<U>) - Method in class org.apache.spark.sql.Dataset
-
Returns a new Dataset where each record has been mapped onto the specified type.
- as(String) - Method in class org.apache.spark.sql.Dataset
-
Applies a logical alias to this Dataset that can be used to disambiguate columns that have the same name after two Datasets have been joined.
- asc() - Method in class org.apache.spark.sql.Column
-
Returns an ordering used in sorting.
- asc(String) - Static method in class org.apache.spark.sql.functions
-
Returns a sort expression based on ascending order of the column.
- ascii(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the numeric value of the first character of the string column, and returns the result as an int column.
- asin(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the sine inverse of the given value; the returned angle is in the range
-pi/2 through pi/2.
- asin(String) - Static method in class org.apache.spark.sql.functions
-
Computes the sine inverse of the given column; the returned angle is in the range
-pi/2 through pi/2.
- asIntegral() - Method in class org.apache.spark.sql.types.DecimalType
-
- asIntegral() - Method in class org.apache.spark.sql.types.DoubleType
-
- asIntegral() - Method in class org.apache.spark.sql.types.FloatType
-
- asIterator() - Method in class org.apache.spark.serializer.DeserializationStream
-
Read the elements of this stream through an iterator.
- asJavaPairRDD() - Method in class org.apache.spark.api.r.PairwiseRRDD
-
- asJavaRDD() - Method in class org.apache.spark.api.r.RRDD
-
- asJavaRDD() - Method in class org.apache.spark.api.r.StringRRDD
-
- asKeyValueIterator() - Method in class org.apache.spark.serializer.DeserializationStream
-
Read the elements of this stream through an iterator over key-value pairs.
- AskPermissionToCommitOutput - Class in org.apache.spark.scheduler
-
- AskPermissionToCommitOutput(int, int, int) - Constructor for class org.apache.spark.scheduler.AskPermissionToCommitOutput
-
- askTimeout(SparkConf) - Static method in class org.apache.spark.util.RpcUtils
-
- asRDDId() - Method in class org.apache.spark.storage.BlockId
-
- assertValid() - Method in class org.apache.spark.broadcast.Broadcast
-
Check if this broadcast is valid.
- assignments() - Method in class org.apache.spark.mllib.clustering.PowerIterationClusteringModel
-
- AssociationRules - Class in org.apache.spark.mllib.fpm
-
:: Experimental ::
- AssociationRules() - Constructor for class org.apache.spark.mllib.fpm.AssociationRules
-
Constructs a default instance with default parameters {minConfidence = 0.8}.
- AssociationRules.Rule<Item> - Class in org.apache.spark.mllib.fpm
-
:: Experimental ::
- AsyncRDDActions<T> - Class in org.apache.spark.rdd
-
A set of asynchronous RDD actions available through an implicit conversion.
- AsyncRDDActions(RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.AsyncRDDActions
-
- atan(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the tangent inverse of the given value.
- atan(String) - Static method in class org.apache.spark.sql.functions
-
Computes the tangent inverse of the given column.
- atan2(Column, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(Column, String) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(String, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(String, String) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(Column, double) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(String, double) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(double, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- atan2(double, String) - Static method in class org.apache.spark.sql.functions
-
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
- attempt() - Method in class org.apache.spark.scheduler.TaskInfo
-
- attempt() - Method in class org.apache.spark.status.api.v1.TaskData
-
- attemptId() - Method in class org.apache.spark.scheduler.StageInfo
-
- attemptId() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
-
- attemptId() - Method in class org.apache.spark.status.api.v1.StageData
-
- attemptId() - Method in class org.apache.spark.TaskContext
-
- attemptNumber() - Method in class org.apache.spark.scheduler.AskPermissionToCommitOutput
-
- attemptNumber() - Method in class org.apache.spark.scheduler.TaskInfo
-
- attemptNumber() - Method in class org.apache.spark.TaskCommitDenied
-
- attemptNumber() - Method in class org.apache.spark.TaskContext
-
How many times this task has been attempted.
- attempts() - Method in class org.apache.spark.status.api.v1.ApplicationInfo
-
- attr() - Method in class org.apache.spark.graphx.Edge
-
- attr() - Method in class org.apache.spark.graphx.EdgeContext
-
The attribute associated with the edge.
- attr() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- Attribute - Class in org.apache.spark.ml.attribute
-
:: DeveloperApi ::
Abstract class for ML attributes.
- Attribute() - Constructor for class org.apache.spark.ml.attribute.Attribute
-
- attribute() - Method in class org.apache.spark.sql.sources.EqualNullSafe
-
- attribute() - Method in class org.apache.spark.sql.sources.EqualTo
-
- attribute() - Method in class org.apache.spark.sql.sources.GreaterThan
-
- attribute() - Method in class org.apache.spark.sql.sources.GreaterThanOrEqual
-
- attribute() - Method in class org.apache.spark.sql.sources.In
-
- attribute() - Method in class org.apache.spark.sql.sources.IsNotNull
-
- attribute() - Method in class org.apache.spark.sql.sources.IsNull
-
- attribute() - Method in class org.apache.spark.sql.sources.LessThan
-
- attribute() - Method in class org.apache.spark.sql.sources.LessThanOrEqual
-
- attribute() - Method in class org.apache.spark.sql.sources.StringContains
-
- attribute() - Method in class org.apache.spark.sql.sources.StringEndsWith
-
- attribute() - Method in class org.apache.spark.sql.sources.StringStartsWith
-
- AttributeGroup - Class in org.apache.spark.ml.attribute
-
:: DeveloperApi ::
Attributes that describe a vector ML column.
- AttributeGroup(String) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
-
Creates an attribute group without attribute info.
- AttributeGroup(String, int) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
-
Creates an attribute group knowing only the number of attributes.
- AttributeGroup(String, Attribute[]) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
-
Creates an attribute group with attributes.
- attributes() - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Optional array of attributes.
- AttributeType - Class in org.apache.spark.ml.attribute
-
:: DeveloperApi ::
An enum-like type for attribute types: AttributeType$.Numeric, AttributeType$.Nominal, and AttributeType$.Binary.
- AttributeType(String) - Constructor for class org.apache.spark.ml.attribute.AttributeType
-
- attrType() - Method in class org.apache.spark.ml.attribute.Attribute
-
Attribute type.
- attrType() - Method in class org.apache.spark.ml.attribute.BinaryAttribute
-
- attrType() - Method in class org.apache.spark.ml.attribute.NominalAttribute
-
- attrType() - Method in class org.apache.spark.ml.attribute.NumericAttribute
-
- attrType() - Static method in class org.apache.spark.ml.attribute.UnresolvedAttribute
-
- available() - Method in class org.apache.spark.storage.BufferReleasingInputStream
-
- avg(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the average of the values in a group.
- avg(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the average of the values in a group.
- avg(String...) - Method in class org.apache.spark.sql.GroupedData
-
Compute the mean value for each numeric column for each group.
- avg(Seq<String>) - Method in class org.apache.spark.sql.GroupedData
-
Compute the mean value for each numeric column for each group.
- avgMetrics() - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
-
- awaitTermination() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Wait for the execution to stop.
- awaitTermination(long) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Deprecated.
As of 1.3.0, replaced by awaitTerminationOrTimeout(Long).
- awaitTermination() - Method in class org.apache.spark.streaming.StreamingContext
-
Wait for the execution to stop.
- awaitTermination(long) - Method in class org.apache.spark.streaming.StreamingContext
-
Deprecated.
As of 1.3.0, replaced by awaitTerminationOrTimeout(Long).
- awaitTerminationOrTimeout(long) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Wait for the execution to stop.
- awaitTerminationOrTimeout(long) - Method in class org.apache.spark.streaming.StreamingContext
-
Wait for the execution to stop.
- cache() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.api.java.JavaRDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
- cache() - Method in class org.apache.spark.graphx.Graph
-
Caches the vertices and edges associated with this graph at the previously-specified target storage levels, which default to MEMORY_ONLY.
- cache() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
Persists the edge partitions using `targetStorageLevel`, which defaults to MEMORY_ONLY.
- cache() - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- cache() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
Persists the vertex partitions at `targetStorageLevel`, which defaults to MEMORY_ONLY.
- cache() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
Caches the underlying RDD.
- cache() - Method in class org.apache.spark.rdd.RDD
-
Persist this RDD with the default storage level (`MEMORY_ONLY`).
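A minimal sketch of why caching matters: without it, the second action below would recompute the whole lineage (the path is a placeholder):

    val words = sc.textFile("/tmp/input.txt").flatMap(_.split(" ")).cache()
    val total = words.count()               // first action materializes and caches the RDD
    val distinct = words.distinct().count() // second action reuses the cached partitions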
- cache() - Method in class org.apache.spark.sql.DataFrame
-
Persist this DataFrame with the default storage level (MEMORY_AND_DISK).
- cache() - Method in class org.apache.spark.sql.Dataset
-
Persist this Dataset with the default storage level (MEMORY_AND_DISK).
- cache() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER).
- cache() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER).
- cache() - Method in class org.apache.spark.streaming.dstream.DStream
-
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER).
- cachedLeafStatuses() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
-
- cacheManager() - Method in class org.apache.spark.SparkEnv
-
- cacheManager() - Method in class org.apache.spark.sql.SQLContext
-
- cacheTable(String) - Method in class org.apache.spark.sql.SQLContext
-
Caches the specified table in-memory.
- calculate(DenseVector<Object>) - Method in class org.apache.spark.ml.classification.LogisticCostFun
-
- calculate(DenseVector<Object>) - Method in class org.apache.spark.ml.regression.AFTCostFun
-
- calculate(DenseVector<Object>) - Method in class org.apache.spark.ml.regression.LeastSquaresCostFun
-
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
-
:: DeveloperApi ::
Information calculation for multiclass classification.
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
-
:: DeveloperApi ::
Variance calculation.
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
-
:: DeveloperApi ::
Information calculation for multiclass classification.
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
-
:: DeveloperApi ::
Variance calculation.
- calculate(double[], double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
-
:: DeveloperApi ::
Information calculation for multiclass classification.
- calculate(double, double, double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
-
:: DeveloperApi ::
Information calculation for regression.
- calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
-
:: DeveloperApi ::
Information calculation for multiclass classification.
- calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
-
:: DeveloperApi ::
Variance calculation.
- CalendarIntervalType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type representing calendar time intervals.
- CalendarIntervalType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the CalendarIntervalType object.
- call(K, Iterator<V1>, Iterator<V2>) - Method in interface org.apache.spark.api.java.function.CoGroupFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.DoubleFlatMapFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.DoubleFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.FilterFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.FlatMapFunction
-
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.FlatMapFunction2
-
- call(K, Iterator<V>) - Method in interface org.apache.spark.api.java.function.FlatMapGroupsFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.ForeachFunction
-
- call(Iterator<T>) - Method in interface org.apache.spark.api.java.function.ForeachPartitionFunction
-
- call(T1) - Method in interface org.apache.spark.api.java.function.Function
-
- call() - Method in interface org.apache.spark.api.java.function.Function0
-
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.Function2
-
- call(T1, T2, T3) - Method in interface org.apache.spark.api.java.function.Function3
-
- call(T1, T2, T3, T4) - Method in interface org.apache.spark.api.java.function.Function4
-
- call(T) - Method in interface org.apache.spark.api.java.function.MapFunction
-
- call(K, Iterator<V>) - Method in interface org.apache.spark.api.java.function.MapGroupsFunction
-
- call(Iterator<T>) - Method in interface org.apache.spark.api.java.function.MapPartitionsFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.PairFlatMapFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.PairFunction
-
- call(T, T) - Method in interface org.apache.spark.api.java.function.ReduceFunction
-
- call(T) - Method in interface org.apache.spark.api.java.function.VoidFunction
-
- call(T1, T2) - Method in interface org.apache.spark.api.java.function.VoidFunction2
-
- call(T1) - Method in interface org.apache.spark.sql.api.java.UDF1
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10) - Method in interface org.apache.spark.sql.api.java.UDF10
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11) - Method in interface org.apache.spark.sql.api.java.UDF11
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12) - Method in interface org.apache.spark.sql.api.java.UDF12
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13) - Method in interface org.apache.spark.sql.api.java.UDF13
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14) - Method in interface org.apache.spark.sql.api.java.UDF14
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15) - Method in interface org.apache.spark.sql.api.java.UDF15
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16) - Method in interface org.apache.spark.sql.api.java.UDF16
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17) - Method in interface org.apache.spark.sql.api.java.UDF17
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18) - Method in interface org.apache.spark.sql.api.java.UDF18
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19) - Method in interface org.apache.spark.sql.api.java.UDF19
-
- call(T1, T2) - Method in interface org.apache.spark.sql.api.java.UDF2
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20) - Method in interface org.apache.spark.sql.api.java.UDF20
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21) - Method in interface org.apache.spark.sql.api.java.UDF21
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22) - Method in interface org.apache.spark.sql.api.java.UDF22
-
- call(T1, T2, T3) - Method in interface org.apache.spark.sql.api.java.UDF3
-
- call(T1, T2, T3, T4) - Method in interface org.apache.spark.sql.api.java.UDF4
-
- call(T1, T2, T3, T4, T5) - Method in interface org.apache.spark.sql.api.java.UDF5
-
- call(T1, T2, T3, T4, T5, T6) - Method in interface org.apache.spark.sql.api.java.UDF6
-
- call(T1, T2, T3, T4, T5, T6, T7) - Method in interface org.apache.spark.sql.api.java.UDF7
-
- call(T1, T2, T3, T4, T5, T6, T7, T8) - Method in interface org.apache.spark.sql.api.java.UDF8
-
- call(T1, T2, T3, T4, T5, T6, T7, T8, T9) - Method in interface org.apache.spark.sql.api.java.UDF9
-
- callSite() - Method in class org.apache.spark.storage.RDDInfo
-
- callUDF(String, Column...) - Static method in class org.apache.spark.sql.functions
-
Call a user-defined function.
- callUDF(Function0<?>, DataType) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf().
This will be removed in Spark 2.0.
- callUDF(Function1<?, ?>, DataType, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf().
This will be removed in Spark 2.0.
- callUDF(Function2<?, ?, ?>, DataType, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf().
This will be removed in Spark 2.0.
- callUDF(Function3<?, ?, ?, ?>, DataType, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf().
This will be removed in Spark 2.0.
- callUDF(Function4<?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf().
This will be removed in Spark 2.0.
- callUDF(Function5<?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf().
This will be removed in Spark 2.0.
- callUDF(Function6<?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf().
This will be removed in Spark 2.0.
- callUDF(Function7<?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf().
This will be removed in Spark 2.0.
- callUDF(Function8<?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf().
This will be removed in Spark 2.0.
- callUDF(Function9<?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf().
This will be removed in Spark 2.0.
- callUDF(Function10<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it's redundant with udf().
This will be removed in Spark 2.0.
- callUDF(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Call a user-defined function.
- callUdf(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.5.0, since it was not coherent to have two functions callUdf and callUDF.
This will be removed in Spark 2.0.
- cancel() - Method in class org.apache.spark.ComplexFutureAction
-
- cancel() - Method in interface org.apache.spark.FutureAction
-
Cancels the execution of this action.
- cancel() - Method in class org.apache.spark.SimpleFutureAction
-
- cancelAllJobs() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Cancel all jobs that have been scheduled or are running.
- cancelAllJobs() - Method in class org.apache.spark.SparkContext
-
Cancel all jobs that have been scheduled or are running.
- cancelJobGroup(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Cancel active jobs for the specified group.
- cancelJobGroup(String) - Method in class org.apache.spark.SparkContext
-
Cancel active jobs for the specified group.
- canEqual(Object) - Method in class org.apache.spark.scheduler.cluster.ExecutorInfo
-
- canEqual(Object) - Method in class org.apache.spark.util.MutablePair
-
- canHandle(String) - Method in class org.apache.spark.sql.jdbc.AggregatedDialect
-
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.DB2Dialect
-
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.DerbyDialect
-
- canHandle(String) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
-
Check if this dialect instance can handle a certain JDBC url.
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.MsSqlServerDialect
-
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.MySQLDialect
-
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.NoopDialect
-
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.OracleDialect
-
- canHandle(String) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
-
- cartesian(JavaRDDLike<U, ?>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of
elements (a, b) where a is in this and b is in other.
- cartesian(RDD<U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of
elements (a, b) where a is in this and b is in other.
- caseSensitive() - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
Whether to do a case-sensitive comparison over the stop words.
Default: false
- cast(DataType) - Method in class org.apache.spark.sql.Column
-
Casts the column to a different data type.
- cast(String) - Method in class org.apache.spark.sql.Column
-
Casts the column to a different data type, using the canonical string representation
of the type.
- catalog() - Method in class org.apache.spark.sql.hive.HiveContext
-
- catalog() - Method in class org.apache.spark.sql.SQLContext
-
- CatalystScan - Interface in org.apache.spark.sql.sources
-
:: Experimental ::
An interface for experimenting with a more direct connection to the query planner.
- Categorical() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
-
- categoricalFeaturesInfo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- CategoricalSplit - Class in org.apache.spark.ml.tree
-
:: DeveloperApi ::
Split which tests a categorical feature.
- categories() - Method in class org.apache.spark.mllib.tree.model.Split
-
- categoryMaps() - Method in class org.apache.spark.ml.feature.VectorIndexerModel
-
- cbrt(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the cube-root of the given value.
- cbrt(String) - Static method in class org.apache.spark.sql.functions
-
Computes the cube-root of the given column.
- ceil(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the ceiling of the given value.
- ceil(String) - Static method in class org.apache.spark.sql.functions
-
Computes the ceiling of the given column.
- ceil() - Method in class org.apache.spark.sql.types.Decimal
-
- changePrecision(int, int) - Method in class org.apache.spark.sql.types.Decimal
-
Update precision and scale while keeping our value the same, and return true if successful.
- checkpoint() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Mark this RDD for checkpointing.
- checkpoint() - Method in class org.apache.spark.graphx.Graph
-
Mark this Graph for checkpointing.
- checkpoint() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- checkpoint() - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- checkpoint() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- checkpoint() - Method in class org.apache.spark.rdd.HadoopRDD
-
- checkpoint() - Method in class org.apache.spark.rdd.RDD
-
Mark this RDD for checkpointing.
- checkpoint(Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Enable periodic checkpointing of RDDs of this DStream.
- checkpoint(String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Sets the context to periodically checkpoint the DStream operations for driver
fault-tolerance.
- checkpoint(Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Enable periodic checkpointing of RDDs of this DStream.
- checkpoint(String) - Method in class org.apache.spark.streaming.StreamingContext
-
Set the context to periodically checkpoint the DStream operations for driver
fault-tolerance.
- checkpointData() - Method in class org.apache.spark.rdd.RDD
-
- checkpointData() - Method in class org.apache.spark.streaming.dstream.DStream
-
- checkpointDir() - Method in class org.apache.spark.SparkContext
-
- checkpointDir() - Method in class org.apache.spark.streaming.StreamingContext
-
- checkpointDuration() - Method in class org.apache.spark.streaming.dstream.DStream
-
- checkpointDuration() - Method in class org.apache.spark.streaming.StreamingContext
-
- checkpointFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
- checkpointFile(String, ClassTag<T>) - Method in class org.apache.spark.SparkContext
-
- checkpointInterval() - Method in class org.apache.spark.mllib.clustering.EMLDAOptimizer
-
- checkpointInterval() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- child() - Method in class org.apache.spark.sql.sources.Not
-
- CHILD_CONNECTION_TIMEOUT - Static variable in class org.apache.spark.launcher.SparkLauncher
-
Maximum time (in ms) to wait for a child process to connect back to the launcher server
when using start().
- CHILD_PROCESS_LOGGER_NAME - Static variable in class org.apache.spark.launcher.SparkLauncher
-
Logger name to use when launching a child process.
- ChiSqSelector - Class in org.apache.spark.ml.feature
-
:: Experimental ::
Chi-Squared feature selection, which selects categorical features to use for predicting a
categorical label.
- ChiSqSelector(String) - Constructor for class org.apache.spark.ml.feature.ChiSqSelector
-
- ChiSqSelector() - Constructor for class org.apache.spark.ml.feature.ChiSqSelector
-
- ChiSqSelector - Class in org.apache.spark.mllib.feature
-
- ChiSqSelector(int) - Constructor for class org.apache.spark.mllib.feature.ChiSqSelector
-
- ChiSqSelectorModel - Class in org.apache.spark.ml.feature
-
- ChiSqSelectorModel - Class in org.apache.spark.mllib.feature
-
Chi-Squared selector model.
- ChiSqSelectorModel(int[]) - Constructor for class org.apache.spark.mllib.feature.ChiSqSelectorModel
-
- chiSqTest(Vector, Vector) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Conduct Pearson's chi-squared goodness of fit test of the observed data against the
expected distribution.
- chiSqTest(Vector) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform
distribution, with each category having an expected frequency of 1 / observed.size.
- chiSqTest(Matrix) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Conduct Pearson's independence test on the input contingency matrix, which cannot contain
negative entries or columns or rows that sum up to 0.
- chiSqTest(RDD<LabeledPoint>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Conduct Pearson's independence test for every feature against the label across the input RDD.
- chiSqTest(JavaRDD<LabeledPoint>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Java-friendly version of chiSqTest().
- ChiSqTestResult - Class in org.apache.spark.mllib.stat.test
-
Object containing the test results for the chi-squared hypothesis test.
- Classification() - Static method in class org.apache.spark.mllib.tree.configuration.Algo
-
- ClassificationModel<FeaturesType,M extends ClassificationModel<FeaturesType,M>> - Class in org.apache.spark.ml.classification
-
:: DeveloperApi ::
- ClassificationModel() - Constructor for class org.apache.spark.ml.classification.ClassificationModel
-
- ClassificationModel - Interface in org.apache.spark.mllib.classification
-
Represents a classification model that predicts to which of a set of categories an example
belongs.
- Classifier<FeaturesType,E extends Classifier<FeaturesType,E,M>,M extends ClassificationModel<FeaturesType,M>> - Class in org.apache.spark.ml.classification
-
:: DeveloperApi ::
- Classifier() - Constructor for class org.apache.spark.ml.classification.Classifier
-
- className() - Method in class org.apache.spark.ExceptionFailure
-
- classpathEntries() - Method in class org.apache.spark.ui.env.EnvironmentListener
-
- classTag() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- classTag() - Method in class org.apache.spark.api.java.JavaPairRDD
-
- classTag() - Method in class org.apache.spark.api.java.JavaRDD
-
- classTag() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
- classTag() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaInputDStream
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- classTag() - Method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-
- clean(long, boolean) - Method in class org.apache.spark.streaming.util.WriteAheadLog
-
Clean all the records that are older than the threshold time.
- CleanAccum - Class in org.apache.spark
-
- CleanAccum(long) - Constructor for class org.apache.spark.CleanAccum
-
- CleanBroadcast - Class in org.apache.spark
-
- CleanBroadcast(long) - Constructor for class org.apache.spark.CleanBroadcast
-
- CleanCheckpoint - Class in org.apache.spark
-
- CleanCheckpoint(int) - Constructor for class org.apache.spark.CleanCheckpoint
-
- CleanRDD - Class in org.apache.spark
-
- CleanRDD(int) - Constructor for class org.apache.spark.CleanRDD
-
- CleanShuffle - Class in org.apache.spark
-
- CleanShuffle(int) - Constructor for class org.apache.spark.CleanShuffle
-
- CleanupTask - Interface in org.apache.spark
-
Classes that represent cleaning tasks.
- CleanupTaskWeakReference - Class in org.apache.spark
-
A WeakReference associated with a CleanupTask.
- CleanupTaskWeakReference(CleanupTask, Object, ReferenceQueue<Object>) - Constructor for class org.apache.spark.CleanupTaskWeakReference
-
- clear(Param<?>) - Method in interface org.apache.spark.ml.param.Params
-
- clear() - Method in class org.apache.spark.sql.util.ExecutionListenerManager
-
- clearActive() - Static method in class org.apache.spark.sql.SQLContext
-
Clears the active SQLContext for current thread.
- clearCache() - Method in class org.apache.spark.sql.SQLContext
-
Removes all cached tables from the in-memory cache.
- clearCallSite() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Pass-through to SparkContext.clearCallSite.
- clearCallSite() - Method in class org.apache.spark.SparkContext
-
Clear the thread-local property for overriding the call sites
of actions and RDDs.
- clearDependencies() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- clearDependencies() - Method in class org.apache.spark.rdd.RDD
-
Clears the dependencies of this RDD.
- clearDependencies() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- clearDependencies() - Method in class org.apache.spark.rdd.UnionRDD
-
- clearFiles() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the job's list of files added by addFile so that they do not get downloaded to
any new nodes.
- clearFiles() - Method in class org.apache.spark.SparkContext
-
Clear the job's list of files added by addFile so that they do not get downloaded to
any new nodes.
- clearJars() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the job's list of JARs added by addJar so that they do not get downloaded to
any new nodes.
- clearJars() - Method in class org.apache.spark.SparkContext
-
Clear the job's list of JARs added by addJar so that they do not get downloaded to
any new nodes.
- clearJobGroup() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Clear the current thread's job group ID and its description.
- clearJobGroup() - Method in class org.apache.spark.SparkContext
-
Clear the current thread's job group ID and its description.
- clearThreshold() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
Clears the threshold so that predict will output raw prediction scores.
- clearThreshold() - Method in class org.apache.spark.mllib.classification.SVMModel
-
Clears the threshold so that predict will output raw prediction scores.
- clone() - Method in class org.apache.spark.SparkConf
-
Copy this object.
- clone() - Method in class org.apache.spark.sql.types.Decimal
-
- clone() - Method in class org.apache.spark.storage.StorageLevel
-
- clone() - Method in class org.apache.spark.util.random.BernoulliCellSampler
-
- clone() - Method in class org.apache.spark.util.random.BernoulliSampler
-
- clone() - Method in class org.apache.spark.util.random.PoissonSampler
-
- clone() - Method in interface org.apache.spark.util.random.RandomSampler
-
Return a copy of the RandomSampler object.
- cloneComplement() - Method in class org.apache.spark.util.random.BernoulliCellSampler
-
Return a sampler that is the complement of the range specified for the current sampler.
- close() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- close() - Method in class org.apache.spark.input.PortableDataStream
-
Closing the PortableDataStream is no longer needed.
- close() - Method in class org.apache.spark.io.SnappyOutputStreamWrapper
-
- close() - Method in class org.apache.spark.serializer.DeserializationStream
-
- close() - Method in class org.apache.spark.serializer.SerializationStream
-
- close() - Method in class org.apache.spark.sql.sources.OutputWriter
-
- close() - Method in class org.apache.spark.storage.BufferReleasingInputStream
-
- close() - Method in class org.apache.spark.storage.TimeTrackingOutputStream
-
- close() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
- close() - Method in class org.apache.spark.streaming.util.WriteAheadLog
-
Close this log and release any resources.
- closeLogWriter(int) - Method in class org.apache.spark.scheduler.JobLogger
-
Close the log file and clean the stage relationship in stageIdToJobId.
- closureSerializer() - Method in class org.apache.spark.SparkEnv
-
- cls() - Method in class org.apache.spark.util.MethodIdentifier
-
- clsTag() - Method in interface org.apache.spark.sql.Encoder
-
A ClassTag that can be used to construct an Array to contain a collection of `T`.
- cluster() - Method in class org.apache.spark.mllib.clustering.PowerIterationClustering.Assignment
-
- clusterCenters() - Method in class org.apache.spark.ml.clustering.KMeansModel
-
- clusterCenters() - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
-
Leaf cluster centers.
- clusterCenters() - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
- clusterCenters() - Method in class org.apache.spark.mllib.clustering.StreamingKMeansModel
-
- clusterWeights() - Method in class org.apache.spark.mllib.clustering.StreamingKMeansModel
-
- cn() - Method in class org.apache.spark.mllib.feature.VocabWord
-
- coalesce(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int, boolean, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD that is reduced into numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame that has exactly numPartitions partitions.
- coalesce(int) - Method in class org.apache.spark.sql.Dataset
-
Returns a new Dataset that has exactly numPartitions partitions.
- coalesce(Column...) - Static method in class org.apache.spark.sql.functions
-
Returns the first column that is not null, or null if all inputs are null.
- coalesce(Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Returns the first column that is not null, or null if all inputs are null.
- code() - Method in class org.apache.spark.mllib.feature.VocabWord
-
- codeLen() - Method in class org.apache.spark.mllib.feature.VocabWord
-
- coefficients() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- coefficients() - Method in class org.apache.spark.ml.regression.AFTSurvivalRegressionModel
-
- coefficients() - Method in class org.apache.spark.ml.regression.LinearRegressionModel
-
- coefficientStandardErrors() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
-
Standard error of estimated coefficients and intercept.
- cogroup(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2 or other3, return a resulting RDD that
contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2 or other3, return a resulting RDD that
contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
- cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
For each key k in this or other1 or other2 or other3, return a resulting RDD that
contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2 or other3, return a resulting RDD that
contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- cogroup(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
- cogroup(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
- cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
For each key k in this or other1 or other2 or other3, return a resulting RDD that
contains a tuple with the list of values for that key in this, other1, other2 and other3.
- cogroup(GroupedDataset<K, U>, Function3<K, Iterator<V>, Iterator<U>, TraversableOnce<R>>, Encoder<R>) - Method in class org.apache.spark.sql.GroupedDataset
-
Applies the given function to each group of cogrouped data.
- cogroup(GroupedDataset<K, U>, CoGroupFunction<K, V, U, R>, Encoder<R>) - Method in class org.apache.spark.sql.GroupedDataset
-
Applies the given function to each group of cogrouped data.
- cogroup(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- cogroup(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
- CoGroupedRDD<K> - Class in org.apache.spark.rdd
-
:: DeveloperApi ::
An RDD that cogroups its parents.
- CoGroupedRDD(Seq<RDD<? extends Product2<K, ?>>>, Partitioner, ClassTag<K>) - Constructor for class org.apache.spark.rdd.CoGroupedRDD
-
- CoGroupFunction<K,V1,V2,R> - Interface in org.apache.spark.api.java.function
-
A function that returns zero or more output records from each grouping key and its values
from two Datasets.
- col(String) - Method in class org.apache.spark.sql.DataFrame
-
Selects a column based on the column name and returns it as a Column.
- col(String) - Static method in class org.apache.spark.sql.functions
-
Returns a Column based on the given column name.
- collect() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an array that contains all of the elements in this RDD.
- collect() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- collect() - Method in class org.apache.spark.rdd.RDD
-
Return an array that contains all of the elements in this RDD.
- collect(PartialFunction<T, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return an RDD that contains all matching values by applying f.
- collect() - Method in class org.apache.spark.sql.DataFrame
-
Returns an array that contains all of the Rows in this DataFrame.
- collect() - Method in class org.apache.spark.sql.Dataset
-
Returns an array that contains all the elements in this Dataset.
- collect_list(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns a list of objects with duplicates.
- collect_list(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns a list of objects with duplicates.
- collect_set(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns a set of objects with duplicate elements eliminated.
- collect_set(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns a set of objects with duplicate elements eliminated.
- collectAsList() - Method in class org.apache.spark.sql.DataFrame
-
Returns a Java list that contains all of the Rows in this DataFrame.
- collectAsList() - Method in class org.apache.spark.sql.Dataset
-
Returns a Java list that contains all the elements in this Dataset.
- collectAsMap() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return the key-value pairs in this RDD to the master as a Map.
- collectAsMap() - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return the key-value pairs in this RDD to the master as a Map.
- collectAsync() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
The asynchronous version of collect, which returns a future for retrieving an array
containing all of the elements in this RDD.
- collectAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Returns a future for retrieving all elements of this RDD.
- collectEdges(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
-
Returns an RDD that contains for each vertex v its local edges,
i.e., the edges that are incident on v, in the user-specified direction.
- collectNeighborIds(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
-
Collect the neighbor vertex ids for each vertex.
- collectNeighbors(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
-
Collect the neighbor vertex attributes for each vertex.
- collectPartitions(int[]) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an array that contains all of the elements in a specific partition of this RDD.
- collectToPython() - Method in class org.apache.spark.sql.DataFrame
-
- colPtrs() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
-
- colsPerBlock() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
-
- colStats(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Computes column-wise summary statistics for the input RDD[Vector].
- Column - Class in org.apache.spark.sql
-
:: Experimental ::
A column that will be computed based on the data in a DataFrame.
- Column(Expression) - Constructor for class org.apache.spark.sql.Column
-
- Column(String) - Constructor for class org.apache.spark.sql.Column
-
- column(String) - Static method in class org.apache.spark.sql.functions
-
Returns a Column based on the given column name.
- ColumnName - Class in org.apache.spark.sql
-
:: Experimental ::
A convenience class used for constructing schemas.
- ColumnName(String) - Constructor for class org.apache.spark.sql.ColumnName
-
- ColumnPruner - Class in org.apache.spark.ml.feature
-
Utility transformer for removing temporary columns from a DataFrame.
- ColumnPruner(Set<String>) - Constructor for class org.apache.spark.ml.feature.ColumnPruner
-
- columns() - Method in class org.apache.spark.sql.DataFrame
-
Returns all column names as an array.
- columnSimilarities() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
- columnSimilarities() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Compute all cosine similarities between columns of this matrix using the brute-force
approach of computing normalized dot products.
- columnSimilarities(double) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Compute similarities between columns of this matrix using a sampling approach.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Generic function to combine the elements for each key using a custom set of aggregation
functions.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Generic function to combine the elements for each key using a custom set of aggregation
functions.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Simplified version of combineByKey that hash-partitions the output RDD and uses map-side
aggregation.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Simplified version of combineByKey that hash-partitions the resulting RDD using the existing
partitioner/parallelism level and using map-side aggregation.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Generic function to combine the elements for each key using a custom set of aggregation
functions.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Simplified version of combineByKeyWithClassTag that hash-partitions the output RDD.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Combine elements of each key in DStream's RDDs using custom functions.
- combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Combine elements of each key in DStream's RDDs using custom functions.
- combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, ClassTag<C>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Combine elements of each key in DStream's RDDs using custom functions.
- combineByKeyWithClassTag(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer, ClassTag<C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
:: Experimental ::
Generic function to combine the elements for each key using a custom set of aggregation
functions.
- combineByKeyWithClassTag(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, int, ClassTag<C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
:: Experimental ::
Simplified version of combineByKeyWithClassTag that hash-partitions the output RDD.
- combineByKeyWithClassTag(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, ClassTag<C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
:: Experimental ::
Simplified version of combineByKeyWithClassTag that hash-partitions the resulting RDD using the
existing partitioner/parallelism level.
- combineCombinersByKey(Iterator<Product2<K, C>>) - Method in class org.apache.spark.Aggregator
-
- combineCombinersByKey(Iterator<Product2<K, C>>, TaskContext) - Method in class org.apache.spark.Aggregator
-
- combinerClassName() - Method in class org.apache.spark.ShuffleDependency
-
- combineValuesByKey(Iterator<Product2<K, V>>) - Method in class org.apache.spark.Aggregator
-
- combineValuesByKey(Iterator<Product2<K, V>>, TaskContext) - Method in class org.apache.spark.Aggregator
-
- compare(PartitionGroup, PartitionGroup) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- compare(Option<PartitionGroup>, Option<PartitionGroup>) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- compare(Decimal) - Method in class org.apache.spark.sql.types.Decimal
-
- compare(RDDInfo) - Method in class org.apache.spark.storage.RDDInfo
-
- compareTo(SparkShutdownHook) - Method in class org.apache.spark.util.SparkShutdownHook
-
- completed() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
-
- completedJobs() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- completedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- completedTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- completionTime() - Method in class org.apache.spark.scheduler.StageInfo
-
Time when all tasks in the stage completed or when the stage was cancelled.
- completionTime() - Method in class org.apache.spark.status.api.v1.JobData
-
- ComplexFutureAction<T> - Class in org.apache.spark
-
A FutureAction for actions that could trigger multiple Spark jobs.
- ComplexFutureAction() - Constructor for class org.apache.spark.ComplexFutureAction
-
- compressed() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Returns a vector in either dense or sparse format, whichever uses less storage.
- compressedInputStream(InputStream) - Method in interface org.apache.spark.io.CompressionCodec
-
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.LZ4CompressionCodec
-
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
-
- compressedInputStream(InputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
-
- compressedOutputStream(OutputStream) - Method in interface org.apache.spark.io.CompressionCodec
-
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.LZ4CompressionCodec
-
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
-
- compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
-
- CompressionCodec - Interface in org.apache.spark.io
-
:: DeveloperApi ::
CompressionCodec allows customizing which compression implementation
is used in block storage.
- compute(Partition, TaskContext) - Method in class org.apache.spark.api.r.BaseRRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.graphx.EdgeRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.graphx.VertexRDD
-
Provides the RDD[(VertexId, VD)] equivalent output.
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
-
Compute the gradient and loss given the features of a single data point.
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
-
Compute the gradient and loss given the features of a single data point,
add the gradient to a provided vector to avoid creating new objects, and return loss.
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.L1Updater
-
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
-
- compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.LogisticGradient
-
- compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LogisticGradient
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SimpleUpdater
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SquaredL2Updater
-
- compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.Updater
-
Compute an updated value for weights given the gradient, stepSize, iteration number and
regularization parameter.
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.HadoopRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.JdbcRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.PartitionPruningRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.RDD
-
:: DeveloperApi ::
Implemented by subclasses to compute a given partition.
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.ShuffledRDD
-
- compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.UnionRDD
-
- compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Generate an RDD for the given duration.
- compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Method that generates an RDD for the given Duration.
- compute(Time) - Method in class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- compute(Time) - Method in class org.apache.spark.streaming.dstream.DStream
-
Method that generates an RDD for the given time.
- compute(Time) - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
- computeColumnSummaryStatistics() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes column-wise summary statistics.
- computeCost(DataFrame) - Method in class org.apache.spark.ml.clustering.KMeansModel
-
Return the K-means cost (sum of squared distances of points to their nearest center) for this
model on the given data.
- computeCost(Vector) - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
-
Computes the squared distance between the input point and the cluster center it belongs to.
- computeCost(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
-
Computes the sum of squared distances between the input points and their corresponding cluster
centers.
- computeCost(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
-
Java-friendly version of computeCost().
- computeCost(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
Return the K-means cost (sum of squared distances of points to their nearest center) for this
model on the given data.
- computeCovariance() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the covariance matrix, treating each row as an observation.
- computeError(org.apache.spark.mllib.tree.model.TreeEnsembleModel, RDD<LabeledPoint>) - Method in interface org.apache.spark.mllib.tree.loss.Loss
-
Method to calculate error of the base learner for the gradient boosting calculation.
- computeError(double, double) - Method in interface org.apache.spark.mllib.tree.loss.Loss
-
Method to calculate loss when the predictions are already known.
- computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
- computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the Gramian matrix A^T A.
- computeInitialPredictionAndError(RDD<LabeledPoint>, double, DecisionTreeModel, Loss) - Static method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
:: DeveloperApi ::
Compute the initial predictions and errors for a dataset for the first
iteration of gradient boosting.
- computePreferredLocations(Seq<InputFormatInfo>) - Static method in class org.apache.spark.scheduler.InputFormatInfo
-
Computes the preferred locations based on input(s) and returns a location-to-block map.
- computePrincipalComponents(int) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes the top k principal components.
- computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
-
- computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
-
Computes singular value decomposition of this matrix.
- concat(Column...) - Static method in class org.apache.spark.sql.functions
-
Concatenates multiple input string columns together into a single string column.
- concat(Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Concatenates multiple input string columns together into a single string column.
- concat_ws(String, Column...) - Static method in class org.apache.spark.sql.functions
-
Concatenates multiple input string columns together into a single string column,
using the given separator.
- concat_ws(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Concatenates multiple input string columns together into a single string column,
using the given separator.
- conf() - Method in class org.apache.spark.SparkEnv
-
- conf() - Method in class org.apache.spark.sql.hive.HiveContext
-
- conf() - Method in class org.apache.spark.sql.SQLContext
-
- conf() - Method in class org.apache.spark.streaming.StreamingContext
-
- confidence() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
-
Returns the confidence of the rule.
- confidence() - Method in class org.apache.spark.partial.BoundedDouble
-
- configuration() - Method in class org.apache.spark.scheduler.InputFormatInfo
-
- CONFIGURATION_INSTANTIATION_LOCK() - Static method in class org.apache.spark.rdd.HadoopRDD
-
Configuration's constructor is not thread-safe (see SPARK-1097 and HADOOP-10456).
- CONFIGURATION_INSTANTIATION_LOCK() - Static method in class org.apache.spark.rdd.NewHadoopRDD
-
Configuration's constructor is not thread-safe (see SPARK-1097 and HADOOP-10456).
- configure() - Method in class org.apache.spark.sql.hive.HiveContext
-
Overridden by child classes that need to set configuration before the client init.
- confusionMatrix() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns the confusion matrix: predicted classes are in columns,
ordered by ascending class label, as in "labels".
- connectedComponents() - Method in class org.apache.spark.graphx.GraphOps
-
Compute the connected component membership of each vertex and return a graph with the vertex
value containing the lowest vertex id in the connected component containing that vertex.
- ConnectedComponents - Class in org.apache.spark.graphx.lib
-
Connected components algorithm.
- ConnectedComponents() - Constructor for class org.apache.spark.graphx.lib.ConnectedComponents
-
- consequent() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
-
- ConstantInputDStream<T> - Class in org.apache.spark.streaming.dstream
-
An input stream that always returns the same RDD on each timestep.
- ConstantInputDStream(StreamingContext, RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.ConstantInputDStream
-
- contains(Param<?>) - Method in class org.apache.spark.ml.param.ParamMap
-
Checks whether a parameter is explicitly specified.
- contains(String) - Method in class org.apache.spark.SparkConf
-
Does the configuration contain a given parameter?
- contains(Object) - Method in class org.apache.spark.sql.Column
-
Expression that is true if the column contains the other element.
- contains(String) - Method in class org.apache.spark.sql.types.Metadata
-
Tests whether this Metadata contains a binding for a key.
- containsBlock(BlockId) - Method in class org.apache.spark.storage.StorageStatus
-
Return whether the given block is stored in this block manager in O(1) time.
- containsCachedMetadata(String) - Static method in class org.apache.spark.rdd.HadoopRDD
-
- containsNull() - Method in class org.apache.spark.sql.types.ArrayType
-
- context() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
- context() - Method in class org.apache.spark.InterruptibleIterator
-
- context(SQLContext) - Method in class org.apache.spark.ml.util.MLReader
-
- context(SQLContext) - Method in class org.apache.spark.ml.util.MLWriter
-
- context() - Method in class org.apache.spark.rdd.RDD
-
- context() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- context() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return the StreamingContext associated with this DStream.
- Continuous() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
-
- ContinuousSplit - Class in org.apache.spark.ml.tree
-
:: DeveloperApi ::
Split which tests a continuous feature.
- conv(Column, int, int) - Static method in class org.apache.spark.sql.functions
-
Convert a number in a string column from one base to another.
- CONVERT_CTAS() - Static method in class org.apache.spark.sql.hive.HiveContext
-
- CONVERT_METASTORE_PARQUET() - Static method in class org.apache.spark.sql.hive.HiveContext
-
- CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING() - Static method in class org.apache.spark.sql.hive.HiveContext
-
- convertCTAS() - Method in class org.apache.spark.sql.hive.HiveContext
-
When true, a table created by a Hive CTAS statement (no USING clause) will be
converted to a data source table, using the data source set by spark.sql.sources.default.
- convertMetastoreParquet() - Method in class org.apache.spark.sql.hive.HiveContext
-
When true, enables an experimental feature where metastore tables that use the parquet SerDe
are automatically converted to use the Spark SQL parquet table scan, instead of the Hive
SerDe.
- convertMetastoreParquetWithSchemaMerging() - Method in class org.apache.spark.sql.hive.HiveContext
-
When true, also tries to merge possibly different but compatible Parquet schemas in different
Parquet data files.
- convertToCanonicalEdges(Function2<ED, ED, ED>) - Method in class org.apache.spark.graphx.GraphOps
-
Convert bi-directional edges into uni-directional ones.
- CoordinateMatrix - Class in org.apache.spark.mllib.linalg.distributed
-
- CoordinateMatrix(RDD<MatrixEntry>, long, long) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
- CoordinateMatrix(RDD<MatrixEntry>) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.GBTClassificationModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.LogisticRegression
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.NaiveBayes
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.NaiveBayesModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.OneVsRest
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.OneVsRestModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
-
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.DistributedLDAModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.KMeans
-
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.KMeansModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.LDA
-
- copy(ParamMap) - Method in class org.apache.spark.ml.clustering.LocalLDAModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.Estimator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.Evaluator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Binarizer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Bucketizer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.ChiSqSelector
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.ChiSqSelectorModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.ColumnPruner
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.CountVectorizer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.CountVectorizerModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.HashingTF
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.IDF
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.IDFModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.IndexToString
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Interaction
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.MinMaxScaler
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.MinMaxScalerModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.OneHotEncoder
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.PCA
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.PCAModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.PolynomialExpansion
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.QuantileDiscretizer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.RFormula
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.RFormulaModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.SQLTransformer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StandardScaler
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StandardScalerModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StringIndexer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.StringIndexerModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Tokenizer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorAssembler
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorAttributeRewriter
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorIndexer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorIndexerModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- copy(ParamMap) - Method in class org.apache.spark.ml.feature.Word2VecModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.Model
-
- copy() - Method in class org.apache.spark.ml.param.ParamMap
-
Creates a copy of this param map.
- copy(ParamMap) - Method in interface org.apache.spark.ml.param.Params
-
Creates a copy of this instance with the same UID and some extra params.
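For illustration, a minimal Scala sketch of this contract (the regParam value is arbitrary; the copy shares the original's UID and leaves the original unmodified):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.param.ParamMap

    val lr = new LogisticRegression()
    // New instance with the same UID, with regParam overridden via the extra ParamMap.
    val lowReg = lr.copy(ParamMap(lr.regParam -> 0.01))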
- copy(ParamMap) - Method in class org.apache.spark.ml.Pipeline
-
- copy(ParamMap) - Method in class org.apache.spark.ml.PipelineModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.PipelineStage
-
- copy(ParamMap) - Method in class org.apache.spark.ml.Predictor
-
- copy(ParamMap) - Method in class org.apache.spark.ml.recommendation.ALS
-
- copy(ParamMap) - Method in class org.apache.spark.ml.recommendation.ALSModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.AFTSurvivalRegression
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.AFTSurvivalRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.GBTRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.LinearRegression
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.LinearRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
-
- copy(ParamMap) - Method in class org.apache.spark.ml.Transformer
-
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.CrossValidator
-
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
-
- copy(ParamMap) - Method in class org.apache.spark.ml.tuning.TrainValidationSplitModel
-
- copy(ParamMap) - Method in class org.apache.spark.ml.UnaryTransformer
-
- copy() - Method in class org.apache.spark.mllib.linalg.DenseMatrix
-
- copy() - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- copy() - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Get a deep copy of the matrix.
- copy() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
-
- copy() - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- copy() - Method in interface org.apache.spark.mllib.linalg.Vector
-
Makes a deep copy of this vector.
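A one-line sketch of the deep-copy guarantee (values shown are arbitrary):

    import org.apache.spark.mllib.linalg.Vectors

    val v = Vectors.dense(1.0, 2.0, 3.0)
    val w = v.copy  // w owns its own value array; it shares no storage with v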
- copy() - Method in class org.apache.spark.mllib.random.ExponentialGenerator
-
- copy() - Method in class org.apache.spark.mllib.random.GammaGenerator
-
- copy() - Method in class org.apache.spark.mllib.random.LogNormalGenerator
-
- copy() - Method in class org.apache.spark.mllib.random.PoissonGenerator
-
- copy() - Method in interface org.apache.spark.mllib.random.RandomDataGenerator
-
Returns a copy of the RandomDataGenerator with a new instance of the RNG object used in the
class, when applicable, for non-locking concurrent usage.
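A sketch of the intended usage, assuming a PoissonGenerator (any RandomDataGenerator works the same way): give each partition or thread its own copy, so draws never contend on a shared RNG.

    import org.apache.spark.mllib.random.PoissonGenerator

    val gen = new PoissonGenerator(5.0)
    val local = gen.copy()  // independent RNG instance for this partition/thread
    local.setSeed(42L)
    val draw = local.nextValue()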
- copy() - Method in class org.apache.spark.mllib.random.StandardNormalGenerator
-
- copy() - Method in class org.apache.spark.mllib.random.UniformGenerator
-
- copy() - Method in class org.apache.spark.mllib.random.WeibullGenerator
-
- copy() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
Returns a shallow copy of this instance.
- copy() - Method in interface org.apache.spark.sql.Row
-
Make a copy of the current Row object.
- copy() - Method in class org.apache.spark.util.StatCounter
-
Clone this StatCounter.
- copyValues(T, ParamMap) - Method in interface org.apache.spark.ml.param.Params
-
Copies param values from this instance to another instance for params shared by them.
- coresGranted() - Method in class org.apache.spark.status.api.v1.ApplicationInfo
-
- coresPerExecutor() - Method in class org.apache.spark.status.api.v1.ApplicationInfo
-
- corr(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Compute the Pearson correlation matrix for the input RDD of Vectors.
- corr(RDD<Vector>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Compute the correlation matrix for the input RDD of Vectors using the specified method.
- corr(RDD<Object>, RDD<Object>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Compute the Pearson correlation for the input RDDs.
- corr(JavaRDD<Double>, JavaRDD<Double>) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Java-friendly version of corr().
- corr(RDD<Object>, RDD<Object>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Compute the correlation for the input RDDs using the specified method.
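For illustration, a minimal Scala sketch of the two RDD[Double] variants (sc is an assumed SparkContext; the data is arbitrary):

    import org.apache.spark.mllib.stat.Statistics

    val x = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))
    val y = sc.parallelize(Seq(2.0, 4.0, 6.0, 8.0))
    val pearson  = Statistics.corr(x, y)              // default method: "pearson"
    val spearman = Statistics.corr(x, y, "spearman")  // rank-based alternative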
- corr(JavaRDD<Double>, JavaRDD<Double>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
-
Java-friendly version of corr().
- corr(String, String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Calculates the correlation of two columns of a DataFrame.
- corr(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Calculates the Pearson Correlation Coefficient of two columns of a DataFrame.
- corr(Column, Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
- corr(String, String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
- cos(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the cosine of the given value.
- cos(String) - Static method in class org.apache.spark.sql.functions
-
Computes the cosine of the given column.
- cosh(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the hyperbolic cosine of the given value.
- cosh(String) - Static method in class org.apache.spark.sql.functions
-
Computes the hyperbolic cosine of the given column.
- count() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the number of elements in the RDD.
- count() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
The number of edges in the RDD.
- count() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
The number of vertices in the RDD.
- count() - Method in class org.apache.spark.ml.regression.AFTAggregator
-
- count() - Method in class org.apache.spark.ml.regression.LeastSquaresAggregator
-
- count() - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
-
Sample size.
- count() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
-
Sample size.
- count() - Method in class org.apache.spark.rdd.RDD
-
Return the number of elements in the RDD.
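For example (sc is an assumed SparkContext; count() triggers a job and returns a Long):

    val n = sc.parallelize(1 to 100).count()  // 100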
- count() - Method in class org.apache.spark.sql.DataFrame
-
- count() - Method in class org.apache.spark.sql.Dataset
-
Returns the number of elements in the Dataset.
- count(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of items in a group.
- count(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of items in a group.
- count() - Method in class org.apache.spark.sql.GroupedData
-
Count the number of rows for each group.
- count() - Method in class org.apache.spark.sql.GroupedDataset
-
Returns a Dataset that contains a tuple with each key and the number of items present
for that key.
- count() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by counting each RDD
of this DStream.
- count() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by counting each RDD
of this DStream.
- count() - Method in class org.apache.spark.streaming.kafka.OffsetRange
-
Number of messages this OffsetRange refers to.
- count() - Method in class org.apache.spark.util.StatCounter
-
- countApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
- countApprox(long, double) - Method in class org.apache.spark.rdd.RDD
-
Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.
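A hedged sketch of the trade-off (rdd is an assumed RDD): give up after 10 seconds and accept a 90%-confidence estimate instead of an exact count.

    // Returns a PartialResult[BoundedDouble]: an estimate with low/high bounds.
    val partial = rdd.countApprox(timeout = 10000L, confidence = 0.90)
    println(partial.initialValue)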
- countApproxDistinct(double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return approximate number of distinct elements in the RDD.
- countApproxDistinct(int, int) - Method in class org.apache.spark.rdd.RDD
-
Return approximate number of distinct elements in the RDD.
- countApproxDistinct(double) - Method in class org.apache.spark.rdd.RDD
-
Return approximate number of distinct elements in the RDD.
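For illustration (rdd is an assumed RDD): relativeSD trades accuracy against the memory used by the underlying HyperLogLog sketch, so a smaller value costs more memory but tightens the estimate.

    val approxDistinct = rdd.countApproxDistinct(relativeSD = 0.05)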
- countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(int, int, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countApproxDistinctByKey(double) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Return approximate number of distinct values for each key in this RDD.
- countAsync() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
The asynchronous version of count, which returns a future for counting the number of
elements in this RDD.
- countAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Returns a future for counting the number of elements in the RDD.
- countByKey() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Count the number of elements for each key, and return the result to the master as a Map.
- countByKey() - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Count the number of elements for each key, collecting the results to a local Map.
- countByKeyApprox(long) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByKeyApprox(long, double) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByKeyApprox(long, double) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Approximate version of countByKey that can return a partial result if it does
not finish within a timeout.
- countByValue() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the count of each unique value in this RDD as a map of (value, count) pairs.
- countByValue(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return the count of each unique value in this RDD as a local map of (value, count) pairs.
- countByValue() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValue(int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValue(int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD contains the counts of each distinct value in
each RDD of this DStream.
- countByValueAndWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueAndWindow(Duration, Duration, int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueAndWindow(Duration, Duration, int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD contains the count of distinct elements in
RDDs in a sliding window over this DStream.
- countByValueApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
(Experimental) Approximate version of countByValue().
- countByValueApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
(Experimental) Approximate version of countByValue().
- countByValueApprox(long, double, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Approximate version of countByValue().
- countByWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD has a single element generated by counting the number
of elements in a window over this DStream.
- countByWindow(Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD has a single element generated by counting the number
of elements in a sliding window over this DStream.
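A short sketch (lines is an assumed DStream): count elements over a 30-second window that slides every 10 seconds; both durations must be multiples of the batch interval.

    import org.apache.spark.streaming.Seconds

    val windowedCounts = lines.countByWindow(Seconds(30), Seconds(10))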
- countDistinct(Column, Column...) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of distinct items in a group.
- countDistinct(String, String...) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of distinct items in a group.
- countDistinct(Column, Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of distinct items in a group.
- countDistinct(String, Seq<String>) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the number of distinct items in a group.
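For illustration, a sketch combining count and countDistinct in one aggregation (df and the column names are assumptions, not part of the entries above):

    import org.apache.spark.sql.functions.{count, countDistinct}

    df.groupBy("department")
      .agg(count("*").as("rows"), countDistinct("name").as("people"))
      .show()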
- countTowardsTaskFailures() - Method in class org.apache.spark.ExecutorLostFailure
-
- countTowardsTaskFailures() - Method in class org.apache.spark.TaskCommitDenied
-
If a task failed because its attempt to commit was denied, do not count this failure
towards failing the stage.
- countTowardsTaskFailures() - Method in interface org.apache.spark.TaskFailedReason
-
Whether this task failure should be counted towards the maximum number of times the task is
allowed to fail before the stage is aborted.
- CountVectorizer - Class in org.apache.spark.ml.feature
-
:: Experimental ::
Extracts a vocabulary from document collections and generates a CountVectorizerModel.
- CountVectorizer(String) - Constructor for class org.apache.spark.ml.feature.CountVectorizer
-
- CountVectorizer() - Constructor for class org.apache.spark.ml.feature.CountVectorizer
-
- CountVectorizerModel - Class in org.apache.spark.ml.feature
-
:: Experimental ::
Converts a text document to a sparse vector of token counts.
- CountVectorizerModel(String, String[]) - Constructor for class org.apache.spark.ml.feature.CountVectorizerModel
-
- CountVectorizerModel(String[]) - Constructor for class org.apache.spark.ml.feature.CountVectorizerModel
-
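A minimal end-to-end sketch of the estimator/model pair (sqlContext is an assumed SQLContext; the toy corpus is arbitrary):

    import org.apache.spark.ml.feature.CountVectorizer

    val docs = sqlContext.createDataFrame(Seq(
      (0, Array("a", "b", "c")),
      (1, Array("a", "b", "b", "c", "a"))
    )).toDF("id", "words")

    val cvModel = new CountVectorizer()
      .setInputCol("words")
      .setOutputCol("features")
      .setVocabSize(3)
      .fit(docs)

    cvModel.transform(docs).select("features").show()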
- cov(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Calculate the sample covariance of two numerical columns of a DataFrame.
- crc32(Column) - Static method in class org.apache.spark.sql.functions
-
Calculates the cyclic redundancy check value (CRC32) of a binary column and
returns the value as a bigint.
- CreatableRelationProvider - Interface in org.apache.spark.sql.sources
-
- create(boolean, boolean, boolean, int) - Static method in class org.apache.spark.api.java.StorageLevels
-
Deprecated.
- create(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.api.java.StorageLevels
-
Create a new StorageLevel object.
- create(JavaSparkContext, JdbcRDD.ConnectionFactory, String, long, long, int, Function<ResultSet, T>) - Static method in class org.apache.spark.rdd.JdbcRDD
-
Create an RDD that executes an SQL query on a JDBC connection and reads results.
- create(JavaSparkContext, JdbcRDD.ConnectionFactory, String, long, long, int) - Static method in class org.apache.spark.rdd.JdbcRDD
-
Create an RDD that executes an SQL query on a JDBC connection and reads results.
- create(RDD<T>, Function1<Object, Object>) - Static method in class org.apache.spark.rdd.PartitionPruningRDD
-
Create a PartitionPruningRDD.
- create(Object...) - Static method in class org.apache.spark.sql.RowFactory
-
Create a Row from the given arguments.
- create() - Method in interface org.apache.spark.streaming.api.java.JavaStreamingContextFactory
-
- create(String, int) - Static method in class org.apache.spark.streaming.kafka.Broker
-
- create(String, int, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
-
- create(TopicAndPartition, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
-
- createArrayType(DataType) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates an ArrayType by specifying the data type of elements (elementType).
- createArrayType(DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates an ArrayType by specifying the data type of elements (elementType) and
whether the array contains null values (containsNull).
- createCombiner() - Method in class org.apache.spark.Aggregator
-
- createDataFrame(RDD<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(Seq<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(RDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(JavaRDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(List<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(RDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(JavaRDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
-
- createDataFrame(List<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
-
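For illustration, a sketch of the RDD[Row] plus StructType overload (sc and sqlContext are assumed; names and values are arbitrary):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val rows = sc.parallelize(Seq(Row("alice", 1), Row("bob", 2)))
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("id", IntegerType, nullable = false)))
    val people = sqlContext.createDataFrame(rows, schema)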
- createDataset(Seq<T>, Encoder<T>) - Method in class org.apache.spark.sql.SQLContext
-
- createDataset(RDD<T>, Encoder<T>) - Method in class org.apache.spark.sql.SQLContext
-
- createDataset(List<T>, Encoder<T>) - Method in class org.apache.spark.sql.SQLContext
-
- createDecimalType(int, int) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a DecimalType by specifying the precision and scale.
- createDecimalType() - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a DecimalType with default precision and scale, which are 10 and 0.
- createDirectStream(StreamingContext, Map<String, String>, Map<TopicAndPartition, Object>, Function1<MessageAndMetadata<K, V>, R>, ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>, ClassTag<R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that directly pulls messages from Kafka Brokers
without using any receiver.
- createDirectStream(StreamingContext, Map<String, String>, Set<String>, ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that directly pulls messages from Kafka Brokers
without using any receiver.
- createDirectStream(JavaStreamingContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Class<R>, Map<String, String>, Map<TopicAndPartition, Long>, Function<MessageAndMetadata<K, V>, R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that directly pulls messages from Kafka Brokers
without using any receiver.
- createDirectStream(JavaStreamingContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Map<String, String>, Set<String>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that directly pulls messages from Kafka Brokers
without using any receiver.
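A minimal sketch of the receiver-less variant (ssc is an assumed StreamingContext; the broker address and topic name are placeholders):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))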
- createExternalTable(String, String) - Method in class org.apache.spark.sql.SQLContext
-
- createExternalTable(String, String, String) - Method in class org.apache.spark.sql.SQLContext
-
- createExternalTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
- createExternalTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
- createExternalTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
- createExternalTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
-
- createJDBCTable(String, String, boolean) - Method in class org.apache.spark.sql.DataFrame
-
Deprecated.
As of 1.4.0, replaced by write().jdbc(). This will be removed in Spark 2.0.
- createLogDir() - Method in class org.apache.spark.scheduler.JobLogger
-
Create a folder for log files; the folder's name is the creation time of the JobLogger.
- createLogWriter(int) - Method in class org.apache.spark.scheduler.JobLogger
-
Create a log file for one job.
- createMapType(DataType, DataType) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a MapType by specifying the data type of keys (keyType) and values (valueType).
- createMapType(DataType, DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a MapType by specifying the data type of keys (keyType), the data type of
values (valueType), and whether values contain any null value (valueContainsNull).
- createModel(Vector, double) - Method in class org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
-
- createModel(Vector, double) - Method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
-
- createModel(Vector, double) - Method in class org.apache.spark.mllib.classification.SVMWithSGD
-
- createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
Create a model given the weights and intercept.
- createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.LassoWithSGD
-
- createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
-
- createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
-
- createPollingStream(StreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(StreamingContext, Seq<InetSocketAddress>, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(StreamingContext, Seq<InetSocketAddress>, StorageLevel, int, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, String, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, InetSocketAddress[], StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createPollingStream(JavaStreamingContext, InetSocketAddress[], StorageLevel, int, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
- createRDD(SparkContext, Map<String, String>, OffsetRange[], ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an RDD from Kafka using offset ranges for each topic and partition.
- createRDD(SparkContext, Map<String, String>, OffsetRange[], Map<TopicAndPartition, Broker>, Function1<MessageAndMetadata<K, V>, R>, ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>, ClassTag<R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an RDD from Kafka using offset ranges for each topic and partition.
- createRDD(JavaSparkContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Map<String, String>, OffsetRange[]) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an RDD from Kafka using offset ranges for each topic and partition.
- createRDD(JavaSparkContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Class<R>, Map<String, String>, OffsetRange[], Map<TopicAndPartition, Broker>, Function<MessageAndMetadata<K, V>, R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an RDD from Kafka using offset ranges for each topic and partition.
- createRDDFromArray(JavaSparkContext, byte[][]) - Static method in class org.apache.spark.api.r.RRDD
-
Create an RRDD given a sequence of byte arrays.
- createRDDWithLocalProperties(Time, boolean, Function0<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Wrap a body of code such that the call site and operation scope
information are passed to the RDDs created in this body properly.
- createRelation(SQLContext, Map<String, String>) - Method in class org.apache.spark.ml.source.libsvm.DefaultSource
-
- createRelation(SQLContext, SaveMode, Map<String, String>, DataFrame) - Method in interface org.apache.spark.sql.sources.CreatableRelationProvider
-
Creates a relation with the given parameters based on the contents of the given
DataFrame.
- createRelation(SQLContext, String[], Option<StructType>, Option<StructType>, Map<String, String>) - Method in interface org.apache.spark.sql.sources.HadoopFsRelationProvider
-
Returns a new base relation with the given parameters, a user defined schema, and a list of
partition columns.
- createRelation(SQLContext, Map<String, String>) - Method in interface org.apache.spark.sql.sources.RelationProvider
-
Returns a new base relation with the given parameters.
- createRelation(SQLContext, Map<String, String>, StructType) - Method in interface org.apache.spark.sql.sources.SchemaRelationProvider
-
Returns a new base relation with the given parameters and user defined schema.
- createRWorker(int) - Static method in class org.apache.spark.api.r.RRDD
-
ProcessBuilder used to launch worker R processes.
- createSparkContext(String, String, String, String[], Map<Object, Object>, Map<Object, Object>) - Static method in class org.apache.spark.api.r.RRDD
-
- createStream(StreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Create an input stream from a Flume source.
- createStream(StreamingContext, String, int, StorageLevel, boolean) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Create an input stream from a Flume source.
- createStream(JavaStreamingContext, String, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream from a Flume source.
- createStream(JavaStreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream from a Flume source.
- createStream(JavaStreamingContext, String, int, StorageLevel, boolean) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
-
Creates an input stream from a Flume source.
- createStream(StreamingContext, String, String, Map<String, Object>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from Kafka Brokers.
- createStream(StreamingContext, Map<String, String>, Map<String, Object>, StorageLevel, ClassTag<K>, ClassTag<V>, ClassTag<U>, ClassTag<T>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from Kafka Brokers.
- createStream(JavaStreamingContext, String, String, Map<String, Integer>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from Kafka Brokers.
- createStream(JavaStreamingContext, String, String, Map<String, Integer>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from Kafka Brokers.
- createStream(JavaStreamingContext, Class<K>, Class<V>, Class<U>, Class<T>, Map<String, String>, Map<String, Integer>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
-
Create an input stream that pulls messages from Kafka Brokers.
- createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function1<Record, T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function1<Record, T>, String, String, ClassTag<T>) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(StreamingContext, String, String, Duration, InitialPositionInStream, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function<Record, T>, Class<T>) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function<Record, T>, Class<T>, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(JavaStreamingContext, String, String, Duration, InitialPositionInStream, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
-
Create an input stream that pulls messages from a Kinesis stream.
- createStream(JavaStreamingContext, String, String, String, String, int, Duration, StorageLevel, String, String) - Method in class org.apache.spark.streaming.kinesis.KinesisUtilsPythonHelper
-
- createStream(StreamingContext, String, String, StorageLevel) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(JavaStreamingContext, String, String) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(JavaStreamingContext, String, String, StorageLevel) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
-
Create an input stream that receives messages pushed by an MQTT publisher.
- createStream(StreamingContext, Option<Authorization>, Seq<String>, StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, String[]) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, String[], StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter using Twitter4J's default
OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey,
twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
twitter4j.oauth.accessTokenSecret.
- createStream(JavaStreamingContext, Authorization) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext, Authorization, String[]) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(JavaStreamingContext, Authorization, String[], StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
-
Create an input stream that returns tweets received from Twitter.
- createStream(StreamingContext, String, Subscribe, Function1<Seq<ByteString>, Iterator<T>>, StorageLevel, SupervisorStrategy, ClassTag<T>) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a ZeroMQ publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>, StorageLevel, SupervisorStrategy) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a ZeroMQ publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>, StorageLevel) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a ZeroMQ publisher.
- createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
-
Create an input stream that receives messages pushed by a ZeroMQ publisher.
- createStructField(String, DataType, boolean, Metadata) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a StructField by specifying the name (name), data type (dataType), and
whether values of this field can be null values (nullable).
- createStructField(String, DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a StructField with empty metadata.
- createStructType(List<StructField>) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a StructType with the given list of StructFields (fields).
- createStructType(StructField[]) - Static method in class org.apache.spark.sql.types.DataTypes
-
Creates a StructType with the given StructField array (fields).
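For illustration, the factory methods composed into a small schema (field names and types are arbitrary):

    import org.apache.spark.sql.types.DataTypes

    val schema = DataTypes.createStructType(Array(
      DataTypes.createStructField("name", DataTypes.StringType, false),
      DataTypes.createStructField("score", DataTypes.createDecimalType(10, 2), true)))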
- createTransformFunc() - Method in class org.apache.spark.ml.feature.DCT
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.ElementwiseProduct
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.NGram
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.Normalizer
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.PolynomialExpansion
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- createTransformFunc() - Method in class org.apache.spark.ml.feature.Tokenizer
-
- createTransformFunc() - Method in class org.apache.spark.ml.UnaryTransformer
-
Creates the transform function using the given param map.
- creationSite() - Method in class org.apache.spark.rdd.RDD
-
User code that created this RDD (e.g. textFile, parallelize).
- creationSite() - Method in class org.apache.spark.streaming.dstream.DStream
-
- crosstab(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Computes a pair-wise frequency table of the given columns.
- CrossValidator - Class in org.apache.spark.ml.tuning
-
:: Experimental ::
K-fold cross validation.
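A hedged end-to-end sketch of k-fold tuning (training is an assumed DataFrame of labeled data; the grid values are arbitrary):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

    val lr = new LogisticRegression()
    val grid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.1, 0.01)).build()
    val cv = new CrossValidator()
      .setEstimator(lr)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(grid)
      .setNumFolds(3)
    val cvModel = cv.fit(training)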
- CrossValidator(String) - Constructor for class org.apache.spark.ml.tuning.CrossValidator
-
- CrossValidator() - Constructor for class org.apache.spark.ml.tuning.CrossValidator
-
- CrossValidatorModel - Class in org.apache.spark.ml.tuning
-
:: Experimental ::
Model from k-fold cross validation.
- cube(Column...) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional cube for the current DataFrame using the specified columns,
so we can run aggregation on them.
- cube(String, String...) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional cube for the current DataFrame using the specified columns,
so we can run aggregation on them.
- cube(Seq<Column>) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional cube for the current DataFrame using the specified columns,
so we can run aggregation on them.
- cube(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
Create a multi-dimensional cube for the current DataFrame using the specified columns,
so we can run aggregation on them.
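For illustration (df and the column names are assumed): cube aggregates every combination of the grouping columns, including the grand total where both are null.

    import org.apache.spark.sql.functions.sum

    df.cube("city", "year").agg(sum("amount")).show()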
- cume_dist() - Static method in class org.apache.spark.sql.functions
-
Window function: returns the cumulative distribution of values within a window partition,
i.e. the fraction of rows that are below the current row.
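A short sketch of cume_dist over a window (df and the column names are assumed):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.cume_dist

    val w = Window.partitionBy("dept").orderBy("salary")
    df.withColumn("pct_below", cume_dist().over(w)).show()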
- cumeDist() - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.6.0, replaced by cume_dist. This will be removed in Spark 2.0.
- current_date() - Static method in class org.apache.spark.sql.functions
-
Returns the current date as a date column.
- current_timestamp() - Static method in class org.apache.spark.sql.functions
-
Returns the current timestamp as a timestamp column.
- currentAttemptId() - Method in interface org.apache.spark.SparkStageInfo
-
- currentAttemptId() - Method in class org.apache.spark.SparkStageInfoImpl
-
- currPrefLocs(Partition) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- databaseTypeDefinition() - Method in class org.apache.spark.sql.jdbc.JdbcType
-
- dataDistribution() - Method in class org.apache.spark.status.api.v1.RDDStorageInfo
-
- DataFrame - Class in org.apache.spark.sql
-
:: Experimental ::
A distributed collection of data organized into named columns.
- DataFrame(SQLContext, LogicalPlan) - Constructor for class org.apache.spark.sql.DataFrame
-
A constructor that automatically analyzes the logical plan.
- DataFrameHolder - Class in org.apache.spark.sql
-
A container for a DataFrame, used for implicit conversions.
- DataFrameNaFunctions - Class in org.apache.spark.sql
-
:: Experimental ::
Functionality for working with missing data in DataFrames.
- DataFrameReader - Class in org.apache.spark.sql
-
:: Experimental ::
Interface used to load a DataFrame from external storage systems (e.g. file systems,
key-value stores, etc.).
- DataFrameStatFunctions - Class in org.apache.spark.sql
-
:: Experimental ::
Statistic functions for DataFrames.
- DataFrameWriter - Class in org.apache.spark.sql
-
:: Experimental ::
Interface used to write a DataFrame to external storage systems (e.g. file systems,
key-value stores, etc.).
- dataSchema() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
-
Specifies schema of actual data files.
- Dataset<T> - Class in org.apache.spark.sql
-
:: Experimental ::
A Dataset is a strongly typed collection of objects that can be transformed in parallel
using functional or relational operations.
- DatasetHolder<T> - Class in org.apache.spark.sql
-
A container for a Dataset, used for implicit conversions.
- DataSourceRegister - Interface in org.apache.spark.sql.sources
-
::DeveloperApi::
Data sources should implement this trait so that they can register an alias to their data source.
- dataStream() - Method in class org.apache.spark.api.r.BaseRRDD
-
- dataType() - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
- DataType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The base type of all Spark SQL data types.
- DataType() - Constructor for class org.apache.spark.sql.types.DataType
-
- dataType() - Method in class org.apache.spark.sql.types.StructField
-
- dataType() - Method in class org.apache.spark.sql.UserDefinedFunction
-
- DataTypes - Class in org.apache.spark.sql.types
-
To get or create a specific data type, users should use the singleton objects and factory
methods provided by this class.
- DataTypes() - Constructor for class org.apache.spark.sql.types.DataTypes
-
- DataValidators - Class in org.apache.spark.mllib.util
-
:: DeveloperApi ::
A collection of methods used to validate data before applying ML algorithms.
- DataValidators() - Constructor for class org.apache.spark.mllib.util.DataValidators
-
- date() - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField of type date.
- DATE() - Static method in class org.apache.spark.sql.Encoders
-
An encoder for nullable date type.
- date_add(Column, int) - Static method in class org.apache.spark.sql.functions
-
Returns the date that is days days after start.
- date_format(Column, String) - Static method in class org.apache.spark.sql.functions
-
Converts a date/timestamp/string to a value of string in the format specified by the date
format given by the second argument.
- date_sub(Column, int) - Static method in class org.apache.spark.sql.functions
-
Returns the date that is days days before start.
- datediff(Column, Column) - Static method in class org.apache.spark.sql.functions
-
Returns the number of days from start to end.
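For illustration, the date arithmetic functions side by side (df and its date columns are assumed):

    import org.apache.spark.sql.functions.{current_date, date_add, datediff}

    df.select(
      date_add(current_date(), 7),                // one week from today
      datediff(df("end_date"), df("start_date"))  // whole days between the two
    ).show()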
- DateType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the DateType object.
- DateType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
A date type, supporting "0001-01-01" through "9999-12-31".
- dayofmonth(Column) - Static method in class org.apache.spark.sql.functions
-
Extracts the day of the month as an integer from a given date/timestamp/string.
- dayofyear(Column) - Static method in class org.apache.spark.sql.functions
-
Extracts the day of the year as an integer from a given date/timestamp/string.
- DB2Dialect - Class in org.apache.spark.sql.jdbc
-
- DB2Dialect() - Constructor for class org.apache.spark.sql.jdbc.DB2Dialect
-
- DCT - Class in org.apache.spark.ml.feature
-
:: Experimental ::
A feature transformer that takes the 1D discrete cosine transform of a real vector.
- DCT(String) - Constructor for class org.apache.spark.ml.feature.DCT
-
- DCT() - Constructor for class org.apache.spark.ml.feature.DCT
-
- ddlParser() - Method in class org.apache.spark.sql.SQLContext
-
- decayFactor() - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
-
- decimal() - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField of type decimal.
- decimal(int, int) - Method in class org.apache.spark.sql.ColumnName
-
Creates a new StructField of type decimal.
- DECIMAL() - Static method in class org.apache.spark.sql.Encoders
-
An encoder for nullable decimal type.
- Decimal - Class in org.apache.spark.sql.types
-
A mutable implementation of BigDecimal that can hold a Long if values are small enough.
- Decimal() - Constructor for class org.apache.spark.sql.types.Decimal
-
- DecimalType - Class in org.apache.spark.sql.types
-
- DecimalType(int, int) - Constructor for class org.apache.spark.sql.types.DecimalType
-
- DecimalType(int) - Constructor for class org.apache.spark.sql.types.DecimalType
-
- DecimalType() - Constructor for class org.apache.spark.sql.types.DecimalType
-
- DecimalType(Option<PrecisionInfo>) - Constructor for class org.apache.spark.sql.types.DecimalType
-
- DecisionTree - Class in org.apache.spark.mllib.tree
-
A class which implements a decision tree learning algorithm for classification and regression.
- DecisionTree(Strategy) - Constructor for class org.apache.spark.mllib.tree.DecisionTree
-
- DecisionTreeClassificationModel - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Decision tree model for classification.
- DecisionTreeClassifier - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Decision tree learning algorithm for classification.
- DecisionTreeClassifier(String) - Constructor for class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- DecisionTreeClassifier() - Constructor for class org.apache.spark.ml.classification.DecisionTreeClassifier
-
- DecisionTreeModel - Class in org.apache.spark.mllib.tree.model
-
Decision tree model for classification or regression.
- DecisionTreeModel(Node, Enumeration.Value) - Constructor for class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- DecisionTreeRegressionModel - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Decision tree model for regression.
- DecisionTreeRegressor - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Decision tree learning algorithm for regression.
- DecisionTreeRegressor(String) - Constructor for class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- DecisionTreeRegressor() - Constructor for class org.apache.spark.ml.regression.DecisionTreeRegressor
-
- decode(Column, String) - Static method in class org.apache.spark.sql.functions
-
Computes the first argument into a string from a binary using the provided character set
(one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
- decodeLabel(Vector) - Static method in class org.apache.spark.ml.classification.LabelConverter
-
Converts a vector to a label.
- defaultAttr() - Static method in class org.apache.spark.ml.attribute.BinaryAttribute
-
The default binary attribute.
- defaultAttr() - Static method in class org.apache.spark.ml.attribute.NominalAttribute
-
The default nominal attribute.
- defaultAttr() - Static method in class org.apache.spark.ml.attribute.NumericAttribute
-
The default numeric attribute.
- defaultClassLoader() - Method in class org.apache.spark.serializer.Serializer
-
Default ClassLoader to use in deserialization.
- defaultCopy(ParamMap) - Method in interface org.apache.spark.ml.param.Params
-
Default implementation of copy with extra params.
- defaultMinPartitions() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Default min number of partitions for Hadoop RDDs when not given by user
- defaultMinPartitions() - Method in class org.apache.spark.SparkContext
-
Default min number of partitions for Hadoop RDDs when not given by user
Notice that we use math.min so the "defaultMinPartitions" cannot be higher than 2.
- defaultMinSplits() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- defaultMinSplits() - Method in class org.apache.spark.SparkContext
-
Default min number of partitions for Hadoop RDDs when not given by user
- defaultParallelism() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Default level of parallelism to use when not given by user (e.g. parallelize and makeRDD).
- defaultParallelism() - Method in class org.apache.spark.SparkContext
-
Default level of parallelism to use when not given by user (e.g. parallelize and makeRDD).
- defaultParamMap() - Method in interface org.apache.spark.ml.param.Params
-
Internal param map for default values.
- defaultParams(String) - Static method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- defaultParams(Enumeration.Value) - Static method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- defaultPartitioner(RDD<?>, Seq<RDD<?>>) - Static method in class org.apache.spark.Partitioner
-
Choose a partitioner to use for a cogroup-like operation between a number of RDDs.
- defaultSize() - Method in class org.apache.spark.sql.types.ArrayType
-
The default size of a value of the ArrayType is 100 * the default size of the element type.
- defaultSize() - Method in class org.apache.spark.sql.types.BinaryType
-
The default size of a value of the BinaryType is 4096 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.BooleanType
-
The default size of a value of the BooleanType is 1 byte.
- defaultSize() - Method in class org.apache.spark.sql.types.ByteType
-
The default size of a value of the ByteType is 1 byte.
- defaultSize() - Method in class org.apache.spark.sql.types.CalendarIntervalType
-
- defaultSize() - Method in class org.apache.spark.sql.types.DataType
-
The default size of a value of this data type, used internally for size estimation.
- defaultSize() - Method in class org.apache.spark.sql.types.DateType
-
The default size of a value of the DateType is 4 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.DecimalType
-
The default size of a value of the DecimalType is 4096 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.DoubleType
-
The default size of a value of the DoubleType is 8 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.FloatType
-
The default size of a value of the FloatType is 4 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.IntegerType
-
The default size of a value of the IntegerType is 4 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.LongType
-
The default size of a value of the LongType is 8 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.MapType
-
The default size of a value of the MapType is
100 * (the default size of the key type + the default size of the value type).
- defaultSize() - Method in class org.apache.spark.sql.types.NullType
-
- defaultSize() - Method in class org.apache.spark.sql.types.ShortType
-
The default size of a value of the ShortType is 2 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.StringType
-
The default size of a value of the StringType is 4096 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.StructType
-
The default size of a value of the StructType is the total default sizes of all field types.
- defaultSize() - Method in class org.apache.spark.sql.types.TimestampType
-
The default size of a value of the TimestampType is 8 bytes.
- defaultSize() - Method in class org.apache.spark.sql.types.UserDefinedType
-
The default size of a value of the UserDefinedType is 4096 bytes.
- DefaultSource - Class in org.apache.spark.ml.source.libsvm
-
The libsvm package implements the Spark SQL data source API for loading LIBSVM data as a
DataFrame.
- DefaultSource() - Constructor for class org.apache.spark.ml.source.libsvm.DefaultSource
-
- defaultStategy(Enumeration.Value) - Static method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- defaultStrategy(String) - Static method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- defaultStrategy(Enumeration.Value) - Static method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- defaultStrategy() - Static method in class org.apache.spark.streaming.receiver.ActorSupervisorStrategy
-
- degree() - Method in class org.apache.spark.ml.feature.PolynomialExpansion
-
The polynomial degree to expand, which should be >= 1.
- degrees() - Method in class org.apache.spark.graphx.GraphOps
-
The degree of each vertex in the graph.
- degreesOfFreedom() - Method in class org.apache.spark.mllib.stat.test.ChiSqTestResult
-
- degreesOfFreedom() - Method in class org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult
-
- degreesOfFreedom() - Method in interface org.apache.spark.mllib.stat.test.TestResult
-
Returns the degree(s) of freedom of the hypothesis test.
- delegate() - Method in class org.apache.spark.InterruptibleIterator
-
- dense(int, int, double[]) - Static method in class org.apache.spark.mllib.linalg.Matrices
-
Creates a column-major dense matrix.
- dense(double, double...) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a dense vector from its values.
- dense(double, Seq<Object>) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a dense vector from its values.
- dense(double[]) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Creates a dense vector from a double array.
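For illustration (values are arbitrary; note the column-major layout for matrices):

    import org.apache.spark.mllib.linalg.{Matrices, Vectors}

    val v = Vectors.dense(1.0, 0.0, 3.0)
    // The array fills column 1 with (1.0, 2.0), then column 2 with (3.0, 4.0).
    val m = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))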
- dense_rank() - Static method in class org.apache.spark.sql.functions
-
Window function: returns the rank of rows within a window partition, without any gaps.
- DenseMatrix - Class in org.apache.spark.mllib.linalg
-
Column-major dense matrix.
- DenseMatrix(int, int, double[], boolean) - Constructor for class org.apache.spark.mllib.linalg.DenseMatrix
-
- DenseMatrix(int, int, double[]) - Constructor for class org.apache.spark.mllib.linalg.DenseMatrix
-
Column-major dense matrix.
- denseRank() - Static method in class org.apache.spark.sql.functions
-
Deprecated.
As of 1.6.0, replaced by dense_rank. This will be removed in Spark 2.0.
- DenseVector - Class in org.apache.spark.mllib.linalg
-
A dense vector represented by a value array.
- DenseVector(double[]) - Constructor for class org.apache.spark.mllib.linalg.DenseVector
-
- dependencies() - Method in class org.apache.spark.rdd.RDD
-
Get the list of dependencies of this RDD, taking into account whether the
RDD is checkpointed or not.
- dependencies() - Method in class org.apache.spark.streaming.dstream.DStream
-
List of parent DStreams on which this DStream depends.
- dependencies() - Method in class org.apache.spark.streaming.dstream.InputDStream
-
- Dependency<T> - Class in org.apache.spark
-
:: DeveloperApi ::
Base class for dependencies.
- Dependency() - Constructor for class org.apache.spark.Dependency
-
- depth() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
Get depth of tree.
- DerbyDialect - Class in org.apache.spark.sql.jdbc
-
- DerbyDialect() - Constructor for class org.apache.spark.sql.jdbc.DerbyDialect
-
- desc() - Method in class org.apache.spark.sql.Column
-
Returns an ordering used in sorting.
- desc(String) - Static method in class org.apache.spark.sql.functions
-
Returns a sort expression based on the descending order of the column.
- desc() - Method in class org.apache.spark.util.MethodIdentifier
-
- describe(String...) - Method in class org.apache.spark.sql.DataFrame
-
Computes statistics for numeric columns, including count, mean, stddev, min, and max.
- describe(Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
Computes statistics for numeric columns, including count, mean, stddev, min, and max.
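For example (df is a hypothetical DataFrame with numeric columns; describe is meant for interactive exploration rather than programmatic use):

    // One row per statistic: count, mean, stddev, min, max.
    df.describe("age", "height").show()
    // With no arguments, all numeric columns are summarized.
    df.describe().show()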
- describeTopics(int) - Method in class org.apache.spark.ml.clustering.LDAModel
-
Return the topics described by their top-weighted terms.
- describeTopics() - Method in class org.apache.spark.ml.clustering.LDAModel
-
- describeTopics(int) - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- describeTopics(int) - Method in class org.apache.spark.mllib.clustering.LDAModel
-
Return the topics described by weighted terms.
- describeTopics() - Method in class org.apache.spark.mllib.clustering.LDAModel
-
Return the topics described by weighted terms.
- describeTopics(int) - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- description() - Method in class org.apache.spark.ExceptionFailure
-
- description() - Method in class org.apache.spark.status.api.v1.JobData
-
- description() - Method in class org.apache.spark.storage.StorageLevel
-
- description() - Method in class org.apache.spark.streaming.scheduler.OutputOperationInfo
-
- DeserializationStream - Class in org.apache.spark.serializer
-
:: DeveloperApi ::
A stream for reading serialized objects.
- DeserializationStream() - Constructor for class org.apache.spark.serializer.DeserializationStream
-
- deserialize(Object) - Method in class org.apache.spark.mllib.linalg.VectorUDT
-
- deserialize(ByteBuffer, ClassLoader, ClassTag<T>) - Method in class org.apache.spark.serializer.DummySerializerInstance
-
- deserialize(ByteBuffer, ClassTag<T>) - Method in class org.apache.spark.serializer.DummySerializerInstance
-
- deserialize(ByteBuffer, ClassTag<T>) - Method in class org.apache.spark.serializer.SerializerInstance
-
- deserialize(ByteBuffer, ClassLoader, ClassTag<T>) - Method in class org.apache.spark.serializer.SerializerInstance
-
- deserialize(Object) - Method in class org.apache.spark.sql.types.UserDefinedType
-
Convert a SQL datum to the user type
- deserialized() - Method in class org.apache.spark.storage.MemoryEntry
-
- deserialized() - Method in class org.apache.spark.storage.StorageLevel
-
- deserializeStream(InputStream) - Method in class org.apache.spark.serializer.DummySerializerInstance
-
- deserializeStream(InputStream) - Method in class org.apache.spark.serializer.SerializerInstance
-
- destroy() - Method in class org.apache.spark.broadcast.Broadcast
-
Destroy all data and metadata related to this broadcast variable.
- details() - Method in class org.apache.spark.scheduler.StageInfo
-
- details() - Method in class org.apache.spark.status.api.v1.StageData
-
- determineBounds(ArrayBuffer<Tuple2<K, Object>>, int, Ordering<K>, ClassTag<K>) - Static method in class org.apache.spark.RangePartitioner
-
Determines the bounds for range partitioning from candidates with weights indicating how many
items each represents.
- deterministic() - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Returns true iff this function is deterministic, i.e. it always returns the same output for the same input.
- DeveloperApi - Annotation Type in org.apache.spark.annotation
-
A lower-level, unstable API intended for developers.
- devianceResiduals() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
-
The weighted residuals, the usual residuals rescaled by
the square root of the instance weights.
- diag(Vector) - Static method in class org.apache.spark.mllib.linalg.DenseMatrix
-
Generate a diagonal matrix in DenseMatrix format from the supplied values.
- diag(Vector) - Static method in class org.apache.spark.mllib.linalg.Matrices
-
Generate a diagonal matrix in Matrix format from the supplied values.
- dialectClassName() - Method in class org.apache.spark.sql.SQLContext
-
- diff(RDD<Tuple2<Object, VD>>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- diff(VertexRDD<VD>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- diff(RDD<Tuple2<Object, VD>>) - Method in class org.apache.spark.graphx.VertexRDD
-
For each vertex present in both this and other, diff returns only those vertices with
differing values; for values that are different, keeps the values from other.
- diff(VertexRDD<VD>) - Method in class org.apache.spark.graphx.VertexRDD
-
For each vertex present in both this and other, diff returns only those vertices with
differing values; for values that are different, keeps the values from other.
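A small sketch of diff's keep-the-other-value behavior, assuming a SparkContext sc as in spark-shell:

    import org.apache.spark.graphx.VertexRDD

    val a = VertexRDD(sc.parallelize(Seq((1L, "x"), (2L, "y"))))
    val b = VertexRDD(sc.parallelize(Seq((1L, "x"), (2L, "z"))))

    // Only vertex 2 differs; the result keeps b's value: (2L, "z").
    val changed = a.diff(b)
    changed.collect()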
- disableOutputSpecValidation() - Static method in class org.apache.spark.rdd.PairRDDFunctions
-
- disconnect() - Method in interface org.apache.spark.launcher.SparkAppHandle
-
Disconnects the handle from the application, without stopping it.
- DISK_ONLY - Static variable in class org.apache.spark.api.java.StorageLevels
-
- DISK_ONLY() - Static method in class org.apache.spark.storage.StorageLevel
-
- DISK_ONLY_2 - Static variable in class org.apache.spark.api.java.StorageLevels
-
- DISK_ONLY_2() - Static method in class org.apache.spark.storage.StorageLevel
-
- diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.ExecutorStageSummary
-
- diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.StageData
-
- diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.TaskMetricDistributions
-
- diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.TaskMetrics
-
- diskSize() - Method in class org.apache.spark.storage.BlockStatus
-
- diskSize() - Method in class org.apache.spark.storage.BlockUpdatedInfo
-
- diskSize() - Method in class org.apache.spark.storage.RDDInfo
-
- diskUsed() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- diskUsed() - Method in class org.apache.spark.status.api.v1.RDDDataDistribution
-
- diskUsed() - Method in class org.apache.spark.status.api.v1.RDDPartitionInfo
-
- diskUsed() - Method in class org.apache.spark.status.api.v1.RDDStorageInfo
-
- diskUsed() - Method in class org.apache.spark.storage.StorageStatus
-
Return the disk space used by this block manager.
- diskUsedByRdd(int) - Method in class org.apache.spark.storage.StorageStatus
-
Return the disk space used by the given RDD in this block manager in O(1) time.
- dist(Vector) - Method in class org.apache.spark.util.Vector
-
- distinct() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct() - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct(int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct() - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct(int) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct(int, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct() - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD containing the distinct elements in this RDD.
- distinct() - Method in class org.apache.spark.sql.DataFrame
-
- distinct() - Method in class org.apache.spark.sql.Dataset
-
Returns a new Dataset that contains only the unique elements of this Dataset.
- distinct(Column...) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Creates a Column for this UDAF using the distinct values of the given Columns
as input arguments.
- distinct(Seq<Column>) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
-
Creates a Column for this UDAF using the distinct values of the given Columns
as input arguments.
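A quick sketch contrasting the RDD and DataFrame flavors (sc and a hypothetical DataFrame df assumed):

    // RDD: distinct() shuffles to deduplicate; the Int argument sets the
    // number of output partitions.
    val nums = sc.parallelize(Seq(1, 2, 2, 3, 3))
    nums.distinct().collect()   // Array(1, 2, 3), order not guaranteed
    nums.distinct(2)

    // DataFrame: removes fully duplicated rows.
    df.distinct()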
- DistributedLDAModel - Class in org.apache.spark.ml.clustering
-
:: Experimental ::
- DistributedLDAModel - Class in org.apache.spark.mllib.clustering
-
- DistributedMatrix - Interface in org.apache.spark.mllib.linalg.distributed
-
Represents a distributively stored matrix backed by one or more RDDs.
- div(Duration) - Method in class org.apache.spark.streaming.Duration
-
- divide(Object) - Method in class org.apache.spark.sql.Column
-
Divides this expression by another expression.
- divide(double) - Method in class org.apache.spark.util.Vector
-
- doc() - Method in class org.apache.spark.ml.param.Param
-
- docConcentration() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- docConcentration() - Method in class org.apache.spark.mllib.clustering.EMLDAOptimizer
-
- docConcentration() - Method in class org.apache.spark.mllib.clustering.LDAModel
-
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
- docConcentration() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- doDestroy(boolean) - Method in class org.apache.spark.broadcast.Broadcast
-
Actually destroy all data and metadata related to this broadcast variable.
- dot(Vector) - Method in class org.apache.spark.util.Vector
-
- DOUBLE() - Static method in class org.apache.spark.sql.Encoders
-
An encoder for nullable double type.
- doubleAccumulator(double) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator double variable, which tasks can "add" values to using the
add method.
- doubleAccumulator(double, String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Create an Accumulator double variable, which tasks can "add" values to using the
add method.
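A minimal accumulator sketch, assuming a SparkContext sc; tasks may only add to an accumulator, while only the driver may read its value:

    // Named accumulators also show up in the Spark UI.
    val badRecords = sc.accumulator(0.0, "bad records")
    sc.parallelize(Seq("1", "oops", "3")).foreach { s =>
      if (scala.util.Try(s.toDouble).isFailure) badRecords += 1.0
    }
    println(badRecords.value)   // 1.0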
- DoubleArrayParam - Class in org.apache.spark.ml.param
-
:: DeveloperApi ::
Specialized version of Param[Array[Double]] for Java.
- DoubleArrayParam(Params, String, String, Function1<double[], Object>) - Constructor for class org.apache.spark.ml.param.DoubleArrayParam
-
- DoubleArrayParam(Params, String, String) - Constructor for class org.apache.spark.ml.param.DoubleArrayParam
-
- DoubleDecimal() - Static method in class org.apache.spark.sql.types.DecimalType
-
- DoubleFlatMapFunction<T> - Interface in org.apache.spark.api.java.function
-
A function that returns zero or more records of type Double from each input record.
- DoubleFunction<T> - Interface in org.apache.spark.api.java.function
-
A function that returns Doubles, and can be used to construct DoubleRDDs.
- DoubleParam - Class in org.apache.spark.ml.param
-
:: DeveloperApi ::
Specialized version of Param[Double] for Java.
- DoubleParam(String, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.DoubleParam
-
- DoubleParam(String, String, String) - Constructor for class org.apache.spark.ml.param.DoubleParam
-
- DoubleParam(Identifiable, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.DoubleParam
-
- DoubleParam(Identifiable, String, String) - Constructor for class org.apache.spark.ml.param.DoubleParam
-
- DoubleRDDFunctions - Class in org.apache.spark.rdd
-
Extra functions available on RDDs of Doubles through an implicit conversion.
- DoubleRDDFunctions(RDD<Object>) - Constructor for class org.apache.spark.rdd.DoubleRDDFunctions
-
- doubleRDDToDoubleRDDFunctions(RDD<Object>) - Static method in class org.apache.spark.rdd.RDD
-
- doubleRDDToDoubleRDDFunctions(RDD<Object>) - Static method in class org.apache.spark.SparkContext
-
- doubleToDoubleWritable(double) - Static method in class org.apache.spark.SparkContext
-
- doubleToMultiplier(double) - Static method in class org.apache.spark.util.Vector
-
- DoubleType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the DoubleType object.
- DoubleType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type representing Double values.
- doubleWritableConverter() - Static method in class org.apache.spark.SparkContext
-
- doUnpersist(boolean) - Method in class org.apache.spark.broadcast.Broadcast
-
Actually unpersist the broadcasted value on the executors.
- DRIVER_EXTRA_CLASSPATH - Static variable in class org.apache.spark.launcher.SparkLauncher
-
Configuration key for the driver class path.
- DRIVER_EXTRA_JAVA_OPTIONS - Static variable in class org.apache.spark.launcher.SparkLauncher
-
Configuration key for the driver VM options.
- DRIVER_EXTRA_LIBRARY_PATH - Static variable in class org.apache.spark.launcher.SparkLauncher
-
Configuration key for the driver native library path.
- DRIVER_IDENTIFIER() - Static method in class org.apache.spark.SparkContext
-
Executor id for the driver.
- DRIVER_MEMORY - Static variable in class org.apache.spark.launcher.SparkLauncher
-
Configuration key for the driver memory.
- driverActorSystemName() - Static method in class org.apache.spark.SparkEnv
-
- driverLogs() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
-
- drop(String) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame with a column dropped.
- drop(Column) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame with a column dropped.
- drop() - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing any null or NaN values.
- drop(String) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing null or NaN values.
- drop(String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing any null or NaN values
in the specified columns.
- drop(Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values
in the specified columns.
- drop(String, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing null or NaN values
in the specified columns.
- drop(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values
in the specified columns.
- drop(int) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing less than minNonNulls
non-null and non-NaN values.
- drop(int, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that drops rows containing less than minNonNulls
non-null and non-NaN values in the specified columns.
- drop(int, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that drops rows containing less than
minNonNulls non-null and non-NaN values in the specified columns.
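An illustrative sketch of the na.drop variants; df and its column names are hypothetical:

    // Drop rows with any null or NaN value in any column.
    df.na.drop()
    // Only consider the "age" column.
    df.na.drop(Array("age"))
    // Keep rows with at least 2 non-null, non-NaN values among these columns.
    df.na.drop(2, Array("age", "height"))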
- dropDuplicates() - Method in class org.apache.spark.sql.DataFrame
-
- dropDuplicates(Seq<String>) - Method in class org.apache.spark.sql.DataFrame
-
(Scala-specific) Returns a new DataFrame with duplicate rows removed, considering only
the subset of columns.
- dropDuplicates(String[]) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new DataFrame with duplicate rows removed, considering only
the subset of columns.
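For instance, again with a hypothetical df:

    // The no-argument form removes fully duplicated rows; the others
    // deduplicate on a subset of columns.
    df.dropDuplicates()
    df.dropDuplicates(Seq("name"))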
- dropLast() - Method in class org.apache.spark.ml.feature.OneHotEncoder
-
Whether to drop the last category in the encoded vector (default: true)
- dropTempTable(String) - Method in class org.apache.spark.sql.SQLContext
-
- Dst - Static variable in class org.apache.spark.graphx.TripletFields
-
Expose the destination and edge fields but not the source field.
- dstAttr() - Method in class org.apache.spark.graphx.EdgeContext
-
The vertex attribute of the edge's destination vertex.
- dstAttr() - Method in class org.apache.spark.graphx.EdgeTriplet
-
The destination vertex attribute
- dstAttr() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- dstId() - Method in class org.apache.spark.graphx.Edge
-
- dstId() - Method in class org.apache.spark.graphx.EdgeContext
-
The vertex id of the edge's destination vertex.
- dstId() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
-
- dstream() - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
- dstream() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
- dstream() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- DStream<T> - Class in org.apache.spark.streaming.dstream
-
A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous
sequence of RDDs (of the same type) representing a continuous stream of data (see
org.apache.spark.rdd.RDD in the Spark core documentation for more details on RDDs).
- DStream(StreamingContext, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.DStream
-
- dtypes() - Method in class org.apache.spark.sql.DataFrame
-
Returns all column names and their data types as an array.
- DummySerializerInstance - Class in org.apache.spark.serializer
-
Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
- duration() - Method in class org.apache.spark.scheduler.TaskInfo
-
- Duration - Class in org.apache.spark.streaming
-
- Duration(long) - Constructor for class org.apache.spark.streaming.Duration
-
- duration() - Method in class org.apache.spark.streaming.scheduler.OutputOperationInfo
-
Return the duration of this output operation.
- Durations - Class in org.apache.spark.streaming
-
- Durations() - Constructor for class org.apache.spark.streaming.Durations
-
- f() - Method in class org.apache.spark.sql.UserDefinedFunction
-
- f1Measure() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns the document-based f1-measure, averaged over the number of documents
- f1Measure(double) - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
-
Returns f1-measure for a given label (category)
- factorial(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the factorial of the given value.
- failed() - Method in class org.apache.spark.scheduler.TaskInfo
-
- failedJobs() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- failedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
-
- failedTasks() - Method in class org.apache.spark.status.api.v1.ExecutorStageSummary
-
- failedTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
-
- failureReason() - Method in class org.apache.spark.scheduler.StageInfo
-
If the stage failed, the reason why.
- failureReason() - Method in class org.apache.spark.streaming.scheduler.OutputOperationInfo
-
- FAIR() - Static method in class org.apache.spark.scheduler.SchedulingMode
-
- falsePositiveRate(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns false positive rate for a given label (category)
- feature() - Method in class org.apache.spark.mllib.tree.model.Split
-
- featureImportances() - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
Estimate of the importance of each feature.
- featureImportances() - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
Estimate of the importance of each feature.
- featureIndex() - Method in class org.apache.spark.ml.tree.CategoricalSplit
-
- featureIndex() - Method in class org.apache.spark.ml.tree.ContinuousSplit
-
- featureIndex() - Method in interface org.apache.spark.ml.tree.Split
-
Index of feature which this split tests
- features() - Method in class org.apache.spark.mllib.regression.LabeledPoint
-
- featuresCol() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
- featuresCol() - Method in interface org.apache.spark.ml.classification.LogisticRegressionSummary
-
Field in "predictions" which gives the features of each instance as a vector.
- featuresCol() - Method in class org.apache.spark.ml.regression.LinearRegressionTrainingSummary
-
- featuresDataType() - Method in class org.apache.spark.ml.PredictionModel
-
Returns the SQL DataType corresponding to the FeaturesType type parameter.
- FeatureType - Class in org.apache.spark.mllib.tree.configuration
-
Enum to describe whether a feature is "continuous" or "categorical"
- FeatureType() - Constructor for class org.apache.spark.mllib.tree.configuration.FeatureType
-
- featureType() - Method in class org.apache.spark.mllib.tree.model.Split
-
- FetchFailed - Class in org.apache.spark
-
:: DeveloperApi ::
Task failed to fetch shuffle data from a remote node.
- FetchFailed(BlockManagerId, int, int, int, String) - Constructor for class org.apache.spark.FetchFailed
-
- fetchPct() - Method in class org.apache.spark.scheduler.RuntimePercentage
-
- fetchWaitTime() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetricDistributions
-
- fetchWaitTime() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetrics
-
- field() - Method in class org.apache.spark.storage.BroadcastBlockId
-
- fieldIndex(String) - Method in interface org.apache.spark.sql.Row
-
Returns the index of a given field name.
- fieldIndex(String) - Method in class org.apache.spark.sql.types.StructType
-
Returns the index of a given field.
- fieldNames() - Method in class org.apache.spark.sql.types.StructType
-
Returns all field names in an array.
- fields() - Method in class org.apache.spark.sql.types.StructType
-
- FIFO() - Static method in class org.apache.spark.scheduler.SchedulingMode
-
- files() - Method in class org.apache.spark.SparkContext
-
- fileStream(String, Class<K>, Class<V>, Class<F>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, Class<K>, Class<V>, Class<F>, Function<Path, Boolean>, boolean) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, Class<K>, Class<V>, Class<F>, Function<Path, Boolean>, boolean, Configuration) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, Function1<Path, Object>, boolean, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fileStream(String, Function1<Path, Object>, boolean, Configuration, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
-
Create an input stream that monitors a Hadoop-compatible filesystem
for new files and reads them using the given key-value types and input format.
- fill(double) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
- fill(String) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that replaces null values in string columns with value.
- fill(double, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
- fill(double, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified
numeric columns.
- fill(String, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that replaces null values in specified string columns.
- fill(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that replaces null values in
specified string columns.
- fill(Map<String, Object>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
Returns a new DataFrame that replaces null values.
- fill(Map<String, Object>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
-
(Scala-specific) Returns a new DataFrame that replaces null values.
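A sketch of the fill variants; df and the column names are hypothetical:

    df.na.fill(0.0)                               // all numeric columns
    df.na.fill("unknown", Seq("name"))            // selected string columns
    df.na.fill(Map("age" -> 0, "name" -> "n/a"))  // per-column replacements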
- filter(Function<Double, Boolean>) - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function<T, Boolean>) - Method in class org.apache.spark.api.java.JavaRDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Function1<Graph<VD, ED>, Graph<VD2, ED2>>, Function1<EdgeTriplet<VD2, ED2>, Object>, Function2<Object, VD2, Object>, ClassTag<VD2>, ClassTag<ED2>) - Method in class org.apache.spark.graphx.GraphOps
-
Filter the graph by computing some values to filter on, and applying the predicates.
- filter(Function1<EdgeTriplet<VD, ED>, Object>, Function2<Object, VD, Object>) - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- filter(Function1<Tuple2<Object, VD>, Object>) - Method in class org.apache.spark.graphx.VertexRDD
-
Restricts the vertex set to the set of vertices satisfying the given predicate.
- filter(Params) - Method in class org.apache.spark.ml.param.ParamMap
-
Filters this param map for the given parent.
- filter(Function1<T, Object>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD containing only the elements that satisfy a predicate.
- filter(Column) - Method in class org.apache.spark.sql.DataFrame
-
Filters rows using the given condition.
- filter(String) - Method in class org.apache.spark.sql.DataFrame
-
Filters rows using the given SQL expression.
- filter(Function1<T, Object>) - Method in class org.apache.spark.sql.Dataset
-
(Scala-specific) Returns a new Dataset that only contains elements where func returns true.
- filter(FilterFunction<T>) - Method in class org.apache.spark.sql.Dataset
-
(Java-specific) Returns a new Dataset that only contains elements where func returns true.
- Filter - Class in org.apache.spark.sql.sources
-
A filter predicate for data sources.
- Filter() - Constructor for class org.apache.spark.sql.sources.Filter
-
- filter(Function<T, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaDStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filter(Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filter(Function1<T, Object>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream containing only the elements that satisfy a predicate.
- filterByRange(K, K) - Method in class org.apache.spark.rdd.OrderedRDDFunctions
-
Returns an RDD containing only the elements in the inclusive range lower to upper.
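A short sketch of filter and filterByRange (sc assumed; filterByRange is most effective on an RDD sorted with a RangePartitioner, where it can skip whole partitions):

    val evens = sc.parallelize(1 to 10).filter(_ % 2 == 0)

    val sorted = sc.parallelize(Seq(3 -> "c", 1 -> "a", 2 -> "b")).sortByKey()
    val slice  = sorted.filterByRange(1, 2)   // keys 1 through 2, inclusive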
- FilterFunction<T> - Interface in org.apache.spark.api.java.function
-
Base interface for a function used in Dataset's filter function.
- filterWith(Function1<Object, A>, Function2<T, A, Object>) - Method in class org.apache.spark.rdd.RDD
-
Filters this RDD with p, where p takes an additional parameter of type A.
- findSplitsBins(RDD<LabeledPoint>, org.apache.spark.mllib.tree.impl.DecisionTreeMetadata) - Static method in class org.apache.spark.mllib.tree.DecisionTree
-
Returns splits and bins for decision tree calculation.
- findSynonyms(String, int) - Method in class org.apache.spark.ml.feature.Word2VecModel
-
Find "num" number of words closest in similarity to the given word.
- findSynonyms(Vector, int) - Method in class org.apache.spark.ml.feature.Word2VecModel
-
Find "num" number of words closest to similarity to the given vector representation
of the word.
- findSynonyms(String, int) - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
- findSynonyms(Vector, int) - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
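For example, with a hypothetical trained mllib Word2VecModel named model (the mllib variant returns (word, cosine similarity) pairs; the ml variant returns a DataFrame):

    val synonyms = model.findSynonyms("spark", 5)
    synonyms.foreach { case (word, sim) => println(s"$word: $sim") }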
- finish(B) - Method in class org.apache.spark.sql.expressions.Aggregator
-
Transform the output of the reduction.
- finished() - Method in class org.apache.spark.scheduler.TaskInfo
-
- finishTime() - Method in class org.apache.spark.scheduler.TaskInfo
-
The time when the task has completed successfully (including the time to remotely fetch
results, if necessary).
- first() - Method in class org.apache.spark.api.java.JavaDoubleRDD
-
- first() - Method in class org.apache.spark.api.java.JavaPairRDD
-
- first() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the first element in this RDD.
- first() - Method in class org.apache.spark.rdd.RDD
-
Return the first element in this RDD.
- first() - Method in class org.apache.spark.sql.DataFrame
-
Returns the first row.
- first() - Method in class org.apache.spark.sql.Dataset
-
Returns the first element in this Dataset.
- first(Column) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the first value in a group.
- first(String) - Static method in class org.apache.spark.sql.functions
-
Aggregate function: returns the first value of a column in a group.
- firstParent(ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Returns the first parent RDD
- fit(DataFrame) - Method in class org.apache.spark.ml.classification.OneVsRest
-
- fit(DataFrame) - Method in class org.apache.spark.ml.clustering.KMeans
-
- fit(DataFrame) - Method in class org.apache.spark.ml.clustering.LDA
-
- fit(DataFrame, ParamPair<?>, ParamPair<?>...) - Method in class org.apache.spark.ml.Estimator
-
Fits a single model to the input data with optional parameters.
- fit(DataFrame, ParamPair<?>, Seq<ParamPair<?>>) - Method in class org.apache.spark.ml.Estimator
-
Fits a single model to the input data with optional parameters.
- fit(DataFrame, ParamMap) - Method in class org.apache.spark.ml.Estimator
-
Fits a single model to the input data with provided parameter map.
- fit(DataFrame) - Method in class org.apache.spark.ml.Estimator
-
Fits a model to the input data.
- fit(DataFrame, ParamMap[]) - Method in class org.apache.spark.ml.Estimator
-
Fits multiple models to the input data with multiple sets of parameters.
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.ChiSqSelector
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.CountVectorizer
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.IDF
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.MinMaxScaler
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.PCA
-
Computes a PCAModel that contains the principal components of the input vectors.
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.QuantileDiscretizer
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.RFormula
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.StandardScaler
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.StringIndexer
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.VectorIndexer
-
- fit(DataFrame) - Method in class org.apache.spark.ml.feature.Word2Vec
-
- fit(DataFrame) - Method in class org.apache.spark.ml.Pipeline
-
Fits the pipeline to the input dataset with additional parameters.
- fit(DataFrame) - Method in class org.apache.spark.ml.Predictor
-
- fit(DataFrame) - Method in class org.apache.spark.ml.recommendation.ALS
-
- fit(DataFrame) - Method in class org.apache.spark.ml.regression.AFTSurvivalRegression
-
- fit(DataFrame) - Method in class org.apache.spark.ml.regression.IsotonicRegression
-
- fit(DataFrame) - Method in class org.apache.spark.ml.tuning.CrossValidator
-
- fit(DataFrame) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
-
- fit(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.feature.ChiSqSelector
-
- fit(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.IDF
-
Computes the inverse document frequency.
- fit(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.feature.IDF
-
Computes the inverse document frequency.
- fit(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.PCA
-
Computes a PCAModel that contains the principal components of the input vectors.
- fit(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.feature.PCA
-
Java-friendly version of fit()
- fit(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.StandardScaler
-
Computes the mean and variance and stores as a model to be used for later scaling.
- fit(RDD<S>) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
- fit(JavaRDD<S>) - Method in class org.apache.spark.mllib.feature.Word2Vec
-
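A minimal fit/transform sketch using the mllib StandardScaler (sc assumed; the data are illustrative):

    import org.apache.spark.mllib.feature.StandardScaler
    import org.apache.spark.mllib.linalg.Vectors

    val data = sc.parallelize(Seq(Vectors.dense(1.0, 10.0), Vectors.dense(3.0, 30.0)))
    // fit() computes column means and variances; transform() applies them.
    val model  = new StandardScaler(withMean = true, withStd = true).fit(data)
    val scaled = model.transform(data)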
- flatMap(FlatMapFunction<T, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMap(Function1<T, TraversableOnce<U>>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMap(Function1<Row, TraversableOnce<R>>, ClassTag<R>) - Method in class org.apache.spark.sql.DataFrame
-
Returns a new RDD by first applying a function to all rows of this DataFrame,
and then flattening the results.
- flatMap(Function1<T, TraversableOnce<U>>, Encoder<U>) - Method in class org.apache.spark.sql.Dataset
-
(Scala-specific) Returns a new Dataset by first applying a function to all elements
of this Dataset, and then flattening the results.
- flatMap(FlatMapFunction<T, U>, Encoder<U>) - Method in class org.apache.spark.sql.Dataset
-
(Java-specific) Returns a new Dataset by first applying a function to all elements
of this Dataset, and then flattening the results.
- flatMap(FlatMapFunction<T, U>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
- flatMap(Function1<T, Traversable<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
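The classic word-count-style sketch (sc assumed):

    // One input line can yield several output words; the results are flattened.
    val words = sc.parallelize(Seq("to be", "or not")).flatMap(_.split(" "))
    words.collect()   // Array(to, be, or, not)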
- FlatMapFunction<T,R> - Interface in org.apache.spark.api.java.function
-
A function that returns zero or more output records from each input record.
- FlatMapFunction2<T1,T2,R> - Interface in org.apache.spark.api.java.function
-
A function that takes two inputs and returns zero or more output records.
- flatMapGroups(Function2<K, Iterator<V>, TraversableOnce<U>>, Encoder<U>) - Method in class org.apache.spark.sql.GroupedDataset
-
Applies the given function to each group of data.
- flatMapGroups(FlatMapGroupsFunction<K, V, U>, Encoder<U>) - Method in class org.apache.spark.sql.GroupedDataset
-
Applies the given function to each group of data.
- FlatMapGroupsFunction<K,V,R> - Interface in org.apache.spark.api.java.function
-
A function that returns zero or more output records from each grouping key and its values.
- flatMapToDouble(DoubleFlatMapFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMapToPair(PairFlatMapFunction<T, K2, V2>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return a new RDD by first applying a function to all elements of this
RDD, and then flattening the results.
- flatMapToPair(PairFlatMapFunction<T, K2, V2>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream by applying a function to all elements of this DStream,
and then flattening the results
- flatMapValues(Function<V, Iterable<U>>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Pass each value in the key-value pair RDD through a flatMap function without changing the
keys; this also retains the original RDD's partitioning.
- flatMapValues(Function1<V, TraversableOnce<U>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Pass each value in the key-value pair RDD through a flatMap function without changing the
keys; this also retains the original RDD's partitioning.
- flatMapValues(Function<V, Iterable<U>>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying a flatmap function to the values of each key-value pair
in 'this' DStream without changing the keys.
- flatMapValues(Function1<V, TraversableOnce<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying a flatmap function to the values of each key-value pair
in 'this' DStream without changing the keys.
- flatMapWith(Function1<Object, A>, boolean, Function2<T, A, Seq<U>>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
-
FlatMaps f over this RDD, where f takes an additional parameter of type A.
- FLOAT() - Static method in class org.apache.spark.sql.Encoders
-
An encoder for nullable float type.
- FloatDecimal() - Static method in class org.apache.spark.sql.types.DecimalType
-
- FloatParam - Class in org.apache.spark.ml.param
-
:: DeveloperApi ::
Specialized version of Param[Float] for Java.
- FloatParam(String, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.FloatParam
-
- FloatParam(String, String, String) - Constructor for class org.apache.spark.ml.param.FloatParam
-
- FloatParam(Identifiable, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.FloatParam
-
- FloatParam(Identifiable, String, String) - Constructor for class org.apache.spark.ml.param.FloatParam
-
- floatToFloatWritable(float) - Static method in class org.apache.spark.SparkContext
-
- FloatType - Static variable in class org.apache.spark.sql.types.DataTypes
-
Gets the FloatType object.
- FloatType - Class in org.apache.spark.sql.types
-
:: DeveloperApi ::
The data type representing Float values.
- floatWritableConverter() - Static method in class org.apache.spark.SparkContext
-
- floor(Column) - Static method in class org.apache.spark.sql.functions
-
Computes the floor of the given value.
- floor(String) - Static method in class org.apache.spark.sql.functions
-
Computes the floor of the given column.
- floor() - Method in class org.apache.spark.sql.types.Decimal
-
- floor(Duration) - Method in class org.apache.spark.streaming.Time
-
- floor(Duration, Time) - Method in class org.apache.spark.streaming.Time
-
- FlumeUtils - Class in org.apache.spark.streaming.flume
-
- FlumeUtils() - Constructor for class org.apache.spark.streaming.flume.FlumeUtils
-
- flush() - Method in class org.apache.spark.io.SnappyOutputStreamWrapper
-
- flush() - Method in class org.apache.spark.serializer.SerializationStream
-
- flush() - Method in class org.apache.spark.storage.TimeTrackingOutputStream
-
- fMeasure(double, double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns f-measure for a given label (category)
- fMeasure(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns f1-measure for a given label (category)
- fMeasure() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
-
Returns f-measure (equal to precision and recall, since precision equals recall).
- fMeasureByThreshold() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
-
Returns the (threshold, F-Measure) curve with beta = 1.0, as a dataframe with two fields.
- fMeasureByThreshold(double) - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, F-Measure) curve.
- fMeasureByThreshold() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-
Returns the (threshold, F-Measure) curve with beta = 1.0.
- fold(T, Function2<T, T, T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative and commutative function and a neutral "zero value".
- fold(T, Function2<T, T, T>) - Method in class org.apache.spark.rdd.RDD
-
Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative and commutative function and a neutral "zero value".
- foldByKey(V, Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g ., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, int, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g ., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Merge the values for each key using an associative function and a neutral "zero value"
which may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, int, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
- foldByKey(V, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Merge the values for each key using an associative function and a neutral "zero value" which
may be added to the result an arbitrary number of times, and must not change the result
(e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
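For example (sc assumed; 0 is a safe zero value for addition because merging it in any number of times leaves sums unchanged):

    val pairs = sc.parallelize(Seq("a" -> 1, "a" -> 2, "b" -> 3))
    pairs.foldByKey(0)(_ + _).collect()   // Array((a,3), (b,3))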
- foreach(VoidFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Applies a function f to all elements of this RDD.
- foreach(Function1<T, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies a function f to all elements of this RDD.
- foreach(Function1<Row, BoxedUnit>) - Method in class org.apache.spark.sql.DataFrame
-
Applies a function f to all rows.
- foreach(Function1<T, BoxedUnit>) - Method in class org.apache.spark.sql.Dataset
-
(Scala-specific) Runs func on each element of this Dataset.
- foreach(ForeachFunction<T>) - Method in class org.apache.spark.sql.Dataset
-
(Java-specific) Runs func on each element of this Dataset.
- foreach(Function<R, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Deprecated.
As of release 0.9.0, replaced by foreachRDD
- foreach(Function2<R, Time, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Deprecated.
As of release 0.9.0, replaced by foreachRDD
- foreach(Function1<RDD<T>, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Deprecated.
As of 0.9.0, replaced by foreachRDD.
- foreach(Function2<RDD<T>, Time, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Deprecated.
As of 0.9.0, replaced by foreachRDD.
- foreachActive(Function2<Object, Object, BoxedUnit>) - Method in class org.apache.spark.mllib.linalg.DenseVector
-
- foreachActive(Function3<Object, Object, Object, BoxedUnit>) - Method in interface org.apache.spark.mllib.linalg.Matrix
-
Applies a function f to all the active elements of dense and sparse matrix.
- foreachActive(Function2<Object, Object, BoxedUnit>) - Method in class org.apache.spark.mllib.linalg.SparseVector
-
- foreachActive(Function2<Object, Object, BoxedUnit>) - Method in interface org.apache.spark.mllib.linalg.Vector
-
Applies a function f to all the active elements of dense and sparse vector.
- foreachAsync(VoidFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
The asynchronous version of the foreach action, which applies a function f
to all the elements of this RDD.
- foreachAsync(Function1<T, BoxedUnit>) - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Applies a function f to all elements of this RDD.
- ForeachFunction<T> - Interface in org.apache.spark.api.java.function
-
Base interface for a function used in Dataset's foreach function.
- foreachPartition(VoidFunction<Iterator<T>>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Applies a function f to each partition of this RDD.
- foreachPartition(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies a function f to each partition of this RDD.
- foreachPartition(Function1<Iterator<Row>, BoxedUnit>) - Method in class org.apache.spark.sql.DataFrame
-
Applies a function f to each partition of this DataFrame.
- foreachPartition(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.sql.Dataset
-
(Scala-specific) Runs func on each partition of this Dataset.
- foreachPartition(ForeachPartitionFunction<T>) - Method in class org.apache.spark.sql.Dataset
-
(Java-specific) Runs func on each partition of this Dataset.
- foreachPartitionAsync(VoidFunction<Iterator<T>>) - Method in interface org.apache.spark.api.java.JavaRDDLike
-
The asynchronous version of the foreachPartition action, which applies a function f
to each partition of this RDD.
- foreachPartitionAsync(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.rdd.AsyncRDDActions
-
Applies a function f to each partition of this RDD.
- ForeachPartitionFunction<T> - Interface in org.apache.spark.api.java.function
-
Base interface for a function used in Dataset's foreachPartition function.
- foreachRDD(Function<R, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Deprecated.
As of release 1.6.0, replaced by foreachRDD(JVoidFunction)
- foreachRDD(Function2<R, Time, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Deprecated.
As of release 1.6.0, replaced by foreachRDD(JVoidFunction2)
- foreachRDD(VoidFunction<R>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Apply a function to each RDD in this DStream.
- foreachRDD(VoidFunction2<R, Time>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Apply a function to each RDD in this DStream.
- foreachRDD(Function1<RDD<T>, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
- foreachRDD(Function2<RDD<T>, Time, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
-
Apply a function to each RDD in this DStream.
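An illustrative sketch against a hypothetical DStream stream (Scala API; the two-argument form also receives the batch time):

    stream.foreachRDD { rdd =>
      println(s"records in batch: ${rdd.count()}")
    }
    stream.foreachRDD { (rdd, time) =>
      println(s"batch at $time: ${rdd.count()} records")
    }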
- foreachWith(Function1<Object, A>, Function2<T, A, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
-
Applies f to each element of this RDD, where f takes an additional parameter of type A.
- format(String) - Method in class org.apache.spark.sql.DataFrameReader
-
Specifies the input data source format.
- format(String) - Method in class org.apache.spark.sql.DataFrameWriter
-
Specifies the underlying output data source.
- format_number(Column, int) - Static method in class org.apache.spark.sql.functions
-
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places,
and returns the result as a string column.
- format_string(String, Column...) - Static method in class org.apache.spark.sql.functions
-
Formats the arguments in printf-style and returns the result as a string column.
- format_string(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
-
Formats the arguments in printf-style and returns the result as a string column.
- formatVersion() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
- formatVersion() - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
-
- formatVersion() - Method in class org.apache.spark.mllib.classification.SVMModel
-
- formatVersion() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- formatVersion() - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
- formatVersion() - Method in class org.apache.spark.mllib.clustering.KMeansModel
-
- formatVersion() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- formatVersion() - Method in class org.apache.spark.mllib.clustering.PowerIterationClusteringModel
-
- formatVersion() - Method in class org.apache.spark.mllib.feature.ChiSqSelectorModel
-
- formatVersion() - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
- formatVersion() - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
-
- formatVersion() - Method in class org.apache.spark.mllib.regression.IsotonicRegressionModel
-
- formatVersion() - Method in class org.apache.spark.mllib.regression.LassoModel
-
- formatVersion() - Method in class org.apache.spark.mllib.regression.LinearRegressionModel
-
- formatVersion() - Method in class org.apache.spark.mllib.regression.RidgeRegressionModel
-
- formatVersion() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
-
- formatVersion() - Method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
- formatVersion() - Method in class org.apache.spark.mllib.tree.model.RandomForestModel
-
- formatVersion() - Method in interface org.apache.spark.mllib.util.Saveable
-
Current version of model save/load format.
- formula() - Method in class org.apache.spark.ml.feature.RFormula
-
R formula parameter.
- FPGrowth - Class in org.apache.spark.mllib.fpm
-
A parallel FP-growth algorithm to mine frequent itemsets.
- FPGrowth() - Constructor for class org.apache.spark.mllib.fpm.FPGrowth
-
Constructs a default instance with default parameters {minSupport: 0.3, numPartitions: same
as the input data}.
- FPGrowth.FreqItemset<Item> - Class in org.apache.spark.mllib.fpm
-
Frequent itemset.
- FPGrowth.FreqItemset(Object, long) - Constructor for class org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
-
- FPGrowthModel<Item> - Class in org.apache.spark.mllib.fpm
-
Model trained by FPGrowth, which holds frequent itemsets.
- FPGrowthModel(RDD<FPGrowth.FreqItemset<Item>>, ClassTag<Item>) - Constructor for class org.apache.spark.mllib.fpm.FPGrowthModel
-
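A small end-to-end sketch (sc assumed; the transactions are illustrative):

    import org.apache.spark.mllib.fpm.FPGrowth

    val transactions = sc.parallelize(Seq(
      Array("a", "b", "c"),
      Array("a", "b"),
      Array("b", "c")))

    val model = new FPGrowth()
      .setMinSupport(0.5)
      .setNumPartitions(2)
      .run(transactions)

    model.freqItemsets.collect().foreach { itemset =>
      println(itemset.items.mkString("[", ",", "]") + " -> " + itemset.freq)
    }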
- fractional() - Method in class org.apache.spark.sql.types.DecimalType
-
- fractional() - Method in class org.apache.spark.sql.types.DoubleType
-
- fractional() - Method in class org.apache.spark.sql.types.FloatType
-
- freq() - Method in class org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
-
- freq() - Method in class org.apache.spark.mllib.fpm.PrefixSpan.FreqSequence
-
- freqItems(String[], double) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Finding frequent items for columns, possibly with false positives.
- freqItems(String[]) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
Finding frequent items for columns, possibly with false positives.
- freqItems(Seq<String>, double) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
(Scala-specific) Finding frequent items for columns, possibly with false positives.
- freqItems(Seq<String>) - Method in class org.apache.spark.sql.DataFrameStatFunctions
-
(Scala-specific) Finding frequent items for columns, possibly with false positives.
- freqItemsets() - Method in class org.apache.spark.mllib.fpm.FPGrowthModel
-
- freqSequences() - Method in class org.apache.spark.mllib.fpm.PrefixSpanModel
-
- from_unixtime(Column) - Static method in class org.apache.spark.sql.functions
-
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
representing the timestamp of that moment in the current system time zone in the given
format.
- from_unixtime(Column, String) - Static method in class org.apache.spark.sql.functions
-
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
representing the timestamp of that moment in the current system time zone in the given
format.
- from_utc_timestamp(Column, String) - Static method in class org.apache.spark.sql.functions
-
Assumes given timestamp is UTC and converts to given timezone.
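A sketch of these timestamp helpers; df is a hypothetical DataFrame with a long column "ts" of epoch seconds and a timestamp column "utc_ts":

    import org.apache.spark.sql.functions.{from_unixtime, from_utc_timestamp}

    df.select(from_unixtime(df("ts")))                 // default format
    df.select(from_unixtime(df("ts"), "yyyy-MM-dd"))   // custom pattern
    df.select(from_utc_timestamp(df("utc_ts"), "PST")) // shift UTC to a zone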
- fromAttributes(Seq<Attribute>) - Static method in class org.apache.spark.sql.types.StructType
-
- fromAvroFlumeEvent(AvroFlumeEvent) - Static method in class org.apache.spark.streaming.flume.SparkFlumeEvent
-
- fromCaseClassString(String) - Static method in class org.apache.spark.sql.types.DataType
-
Deprecated.
As of 1.2.0, replaced by DataType.fromJson()
- fromCOO(int, int, Iterable<Tuple3<Object, Object, Object>>) - Static method in class org.apache.spark.mllib.linalg.SparseMatrix
-
Generate a SparseMatrix from Coordinate List (COO) format.
- fromDStream(DStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaDStream
-
- fromEdgePartitions(RDD<Tuple2<Object, EdgePartition<ED, VD>>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
-
Create a graph from EdgePartitions, setting referenced vertices to `defaultVertexAttr`.
- fromEdges(RDD<Edge<ED>>, ClassTag<ED>, ClassTag<VD>) - Static method in class org.apache.spark.graphx.EdgeRDD
-
Creates an EdgeRDD from a set of edges.
- fromEdges(RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.Graph
-
Construct a graph from a collection of edges.
- fromEdges(EdgeRDD<?>, int, VD, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
-
Constructs a VertexRDD containing all vertices referred to in edges.
- fromEdgeTuples(RDD<Tuple2<Object, Object>>, VD, Option<PartitionStrategy>, StorageLevel, StorageLevel, ClassTag<VD>) - Static method in class org.apache.spark.graphx.Graph
-
Construct a graph from a collection of edges encoded as vertex id pairs.
- fromExistingRDDs(VertexRDD<VD>, EdgeRDD<ED>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
-
Create a graph from a VertexRDD and an EdgeRDD with the same replicated vertex type as the
vertices.
- fromInputDStream(InputDStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaInputDStream
-
- fromInputDStream(InputDStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairInputDStream
-
- fromJavaDStream(JavaDStream<Tuple2<K, V>>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- fromJavaRDD(JavaRDD<Tuple2<K, V>>) - Static method in class org.apache.spark.api.java.JavaPairRDD
-
Convert a JavaRDD of key-value pairs to JavaPairRDD.
- fromJson(String) - Static method in class org.apache.spark.mllib.linalg.Vectors
-
Parses the JSON representation of a vector into a Vector.
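A round-trip sketch in Scala; it assumes the matching toJson serializer on Vector from the same API version:

    import org.apache.spark.mllib.linalg.Vectors
    val v = Vectors.dense(1.0, 0.0, 3.0)
    // Serialize to JSON, then parse back; the result equals the original vector
    val parsed = Vectors.fromJson(v.toJson)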
- fromJson(String) - Static method in class org.apache.spark.sql.types.DataType
-
- fromJson(String) - Static method in class org.apache.spark.sql.types.Metadata
-
Creates a Metadata instance from JSON.
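A minimal Scala sketch pairing MetadataBuilder with Metadata.fromJson; the key name and value are illustrative:

    import org.apache.spark.sql.types.{Metadata, MetadataBuilder}
    val meta = new MetadataBuilder().putLong("maxLength", 128L).build()
    // Round-trip through the JSON representation
    Metadata.fromJson(meta.json).getLong("maxLength") // 128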
- fromName(String) - Static method in class org.apache.spark.ml.attribute.AttributeType
-
- fromOffset() - Method in class org.apache.spark.streaming.kafka.OffsetRange
-
- fromOld(DecisionTreeModel, DecisionTreeClassifier, Map<Object, Object>, int) - Static method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
-
(private[ml]) Convert a model from the old API
- fromOld(GradientBoostedTreesModel, GBTClassifier, Map<Object, Object>, int) - Static method in class org.apache.spark.ml.classification.GBTClassificationModel
-
(private[ml]) Convert a model from the old API
- fromOld(RandomForestModel, RandomForestClassifier, Map<Object, Object>, int, int) - Static method in class org.apache.spark.ml.classification.RandomForestClassificationModel
-
(private[ml]) Convert a model from the old API
- fromOld(DecisionTreeModel, DecisionTreeRegressor, Map<Object, Object>, int) - Static method in class org.apache.spark.ml.regression.DecisionTreeRegressionModel
-
(private[ml]) Convert a model from the old API
- fromOld(GradientBoostedTreesModel, GBTRegressor, Map<Object, Object>, int) - Static method in class org.apache.spark.ml.regression.GBTRegressionModel
-
(private[ml]) Convert a model from the old API
- fromOld(RandomForestModel, RandomForestRegressor, Map<Object, Object>, int) - Static method in class org.apache.spark.ml.regression.RandomForestRegressionModel
-
(private[ml]) Convert a model from the old API
- fromOld(Node, Map<Object, Object>) - Static method in class org.apache.spark.ml.tree.Node
-
Create a new Node from the old Node format, recursively creating child nodes as needed.
- fromPairDStream(DStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
- fromPairRDD(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.mllib.rdd.MLPairRDDFunctions
-
Implicit conversion from a pair RDD to MLPairRDDFunctions.
- fromRDD(RDD<Object>) - Static method in class org.apache.spark.api.java.JavaDoubleRDD
-
- fromRDD(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.api.java.JavaPairRDD
-
- fromRDD(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.api.java.JavaRDD
-
- fromRDD(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.mllib.rdd.RDDFunctions
-
Implicit conversion from an RDD to RDDFunctions.
- fromRdd(RDD<?>) - Static method in class org.apache.spark.storage.RDDInfo
-
- fromReceiverInputDStream(ReceiverInputDStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream
-
- fromReceiverInputDStream(ReceiverInputDStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
-
- fromSparkContext(SparkContext) - Static method in class org.apache.spark.api.java.JavaSparkContext
-
- fromStage(Stage, int, Option<Object>, Seq<Seq<TaskLocation>>) - Static method in class org.apache.spark.scheduler.StageInfo
-
Construct a StageInfo from a Stage.
- fromString(String) - Static method in enum org.apache.spark.JobExecutionStatus
-
- fromString(String) - Static method in class org.apache.spark.mllib.tree.loss.Losses
-
- fromString(String) - Static method in enum org.apache.spark.status.api.v1.ApplicationStatus
-
- fromString(String) - Static method in enum org.apache.spark.status.api.v1.StageStatus
-
- fromString(String) - Static method in enum org.apache.spark.status.api.v1.TaskSorting
-
- fromString(String) - Static method in class org.apache.spark.storage.StorageLevel
-
:: DeveloperApi ::
Return the StorageLevel object with the specified name.
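A minimal Scala sketch; rdd is an assumed existing RDD:

    import org.apache.spark.storage.StorageLevel
    // Resolve a storage level from its name, then persist with it
    val level = StorageLevel.fromString("MEMORY_AND_DISK_SER")
    rdd.persist(level)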
- fromStructField(StructField) - Static method in class org.apache.spark.ml.attribute.AttributeGroup
-
Creates an attribute group from a StructField instance.
- fullOuterJoin(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a full outer join of this and other.
- fullOuterJoin(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a full outer join of this and other.
- fullOuterJoin(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
-
Perform a full outer join of this and other.
- fullOuterJoin(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a full outer join of this and other.
- fullOuterJoin(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a full outer join of this and other.
- fullOuterJoin(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
-
Perform a full outer join of this and other.
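A minimal Scala sketch of fullOuterJoin on pair RDDs; sc is an assumed SparkContext, and result order may vary:

    val left  = sc.parallelize(Seq(("a", 1), ("b", 2)))
    val right = sc.parallelize(Seq(("b", 20), ("c", 30)))
    // Keys present on only one side pair with None for the missing value
    left.fullOuterJoin(right).collect()
    // ("a",(Some(1),None)), ("b",(Some(2),Some(20))), ("c",(None,Some(30)))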
- fullOuterJoin(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullOuterJoin(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullOuterJoin(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullOuterJoin(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullOuterJoin(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullOuterJoin(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
-
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
- fullStackTrace() - Method in class org.apache.spark.ExceptionFailure
-
- Function<T1,R> - Interface in org.apache.spark.api.java.function
-
Base interface for functions whose return types do not create special RDDs.
- function(Function4<Time, KeyType, Option<ValueType>, State<StateType>, Option<MappedType>>) - Static method in class org.apache.spark.streaming.StateSpec
-
- function(Function3<KeyType, Option<ValueType>, State<StateType>, MappedType>) - Static method in class org.apache.spark.streaming.StateSpec
-
- function(Function4<Time, KeyType, Optional<ValueType>, State<StateType>, Optional<MappedType>>) - Static method in class org.apache.spark.streaming.StateSpec
-
- function(Function3<KeyType, Optional<ValueType>, State<StateType>, MappedType>) - Static method in class org.apache.spark.streaming.StateSpec
-
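A minimal Scala sketch of the Scala-function variant of StateSpec.function, building a running count per key; the pair DStream (pairs) is assumed to exist:

    import org.apache.spark.streaming.{State, StateSpec}
    val spec = StateSpec.function((key: String, value: Option[Int], state: State[Int]) => {
      // Add the new value (if any) to the running sum kept for this key
      val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (key, sum)
    })
    val counts = pairs.mapWithState(spec) // pairs: DStream[(String, Int)]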
- Function0<R> - Interface in org.apache.spark.api.java.function
-
A zero-argument function that returns an R.
- Function2<T1,T2,R> - Interface in org.apache.spark.api.java.function
-
A two-argument function that takes arguments of type T1 and T2 and returns an R.
- Function3<T1,T2,T3,R> - Interface in org.apache.spark.api.java.function
-
A three-argument function that takes arguments of type T1, T2 and T3 and returns an R.
- Function4<T1,T2,T3,T4,R> - Interface in org.apache.spark.api.java.function
-
A four-argument function that takes arguments of type T1, T2, T3 and T4 and returns an R.
- functionRegistry() - Method in class org.apache.spark.sql.hive.HiveContext
-
- functionRegistry() - Method in class org.apache.spark.sql.SQLContext
-
- functions - Class in org.apache.spark.sql
-
- functions() - Constructor for class org.apache.spark.sql.functions
-
- FutureAction<T> - Interface in org.apache.spark
-
A future for the result of an action to support cancellation.
- futureExecutionContext() - Static method in class org.apache.spark.rdd.AsyncRDDActions
-
- gain() - Method in class org.apache.spark.ml.tree.InternalNode
-
- gain() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
-
- gamma1() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- gamma2() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- gamma6() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- gamma7() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
-
- GammaGenerator - Class in org.apache.spark.mllib.random
-
:: DeveloperApi ::
Generates i.i.d. samples from the gamma distribution with the given shape and scale.
- GammaGenerator(double, double) - Constructor for class org.apache.spark.mllib.random.GammaGenerator
-
- gammaJavaRDD(JavaSparkContext, double, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaJavaRDD(JavaSparkContext, double, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaJavaRDD(JavaSparkContext, double, double, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaJavaVectorRDD(JavaSparkContext, double, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaJavaVectorRDD(JavaSparkContext, double, double, long, int, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaJavaVectorRDD(JavaSparkContext, double, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
- gammaRDD(SparkContext, double, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD comprised of i.i.d.
samples from the gamma distribution with the input
shape and scale.
- gammaShape() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- gammaShape() - Method in class org.apache.spark.mllib.clustering.LDAModel
-
Shape parameter for random initialization of variational parameter gamma.
- gammaShape() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
-
- gammaVectorRDD(SparkContext, double, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
-
Generates an RDD[Vector] with vectors containing i.i.d.
samples drawn from the
gamma distribution with the input shape and scale.
- gaps() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
Indicates whether regex splits on gaps (true) or matches tokens (false).
- GaussianMixture - Class in org.apache.spark.mllib.clustering
-
This class performs expectation maximization for multivariate Gaussian
Mixture Models (GMMs).
- GaussianMixture() - Constructor for class org.apache.spark.mllib.clustering.GaussianMixture
-
Constructs a default instance.
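A minimal Scala sketch; sc is an assumed SparkContext and the two points are purely illustrative:

    import org.apache.spark.mllib.clustering.GaussianMixture
    import org.apache.spark.mllib.linalg.Vectors
    val data = sc.parallelize(Seq(Vectors.dense(0.0, 0.0), Vectors.dense(9.0, 9.0)))
    // Fit a 2-component mixture with expectation maximization
    val gmm = new GaussianMixture().setK(2).run(data)
    gmm.weights.zip(gmm.gaussians).foreach { case (w, g) => println(s"$w ${g.mu}") }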
- GaussianMixtureModel - Class in org.apache.spark.mllib.clustering
-
Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points
are drawn from each Gaussian i=1..k with probability w(i); mu(i) and sigma(i) are
the respective mean and covariance for each Gaussian distribution i=1..k.
- GaussianMixtureModel(double[], MultivariateGaussian[]) - Constructor for class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
- gaussians() - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
-
- GBTClassificationModel - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Gradient-Boosted Trees (GBTs)
model for classification.
- GBTClassificationModel(String, DecisionTreeRegressionModel[], double[]) - Constructor for class org.apache.spark.ml.classification.GBTClassificationModel
-
Construct a GBTClassificationModel
- GBTClassifier - Class in org.apache.spark.ml.classification
-
:: Experimental ::
Gradient-Boosted Trees (GBTs)
learning algorithm for classification.
- GBTClassifier(String) - Constructor for class org.apache.spark.ml.classification.GBTClassifier
-
- GBTClassifier() - Constructor for class org.apache.spark.ml.classification.GBTClassifier
-
- GBTRegressionModel - Class in org.apache.spark.ml.regression
-
:: Experimental ::
- GBTRegressionModel(String, DecisionTreeRegressionModel[], double[]) - Constructor for class org.apache.spark.ml.regression.GBTRegressionModel
-
Construct a GBTRegressionModel
- GBTRegressor - Class in org.apache.spark.ml.regression
-
:: Experimental ::
Gradient-Boosted Trees (GBTs)
learning algorithm for regression.
- GBTRegressor(String) - Constructor for class org.apache.spark.ml.regression.GBTRegressor
-
- GBTRegressor() - Constructor for class org.apache.spark.ml.regression.GBTRegressor
-
- GeneralizedLinearAlgorithm<M extends GeneralizedLinearModel> - Class in org.apache.spark.mllib.regression
-
:: DeveloperApi ::
GeneralizedLinearAlgorithm implements methods to train a Generalized Linear Model (GLM).
- GeneralizedLinearAlgorithm() - Constructor for class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
- GeneralizedLinearModel - Class in org.apache.spark.mllib.regression
-
:: DeveloperApi ::
GeneralizedLinearModel (GLM) represents a model trained using
GeneralizedLinearAlgorithm.
- GeneralizedLinearModel(Vector, double) - Constructor for class org.apache.spark.mllib.regression.GeneralizedLinearModel
-
- generateAssociationRules(double) - Method in class org.apache.spark.mllib.fpm.FPGrowthModel
-
Generates association rules for the Items in freqItemsets.
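A minimal Scala sketch; transactions (an RDD[Array[String]]) is assumed to exist:

    import org.apache.spark.mllib.fpm.FPGrowth
    val model = new FPGrowth().setMinSupport(0.3).run(transactions)
    // Emit rules whose confidence is at least 0.8
    model.generateAssociationRules(0.8).collect().foreach { rule =>
      println(s"${rule.antecedent.mkString(",")} => ${rule.consequent.mkString(",")}")
    }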
- generatedRDDs() - Method in class org.apache.spark.streaming.dstream.DStream
-
- generateKMeansRDD(SparkContext, int, int, int, double, int) - Static method in class org.apache.spark.mllib.util.KMeansDataGenerator
-
Generate an RDD containing test data for KMeans.
- generateLinearInput(double, double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
For compatibility, data generated without specifying the mean and variance will have zero mean and a variance of 1.0/3.0, since the original output range is [-1, 1] with a uniform distribution, and the variance of a uniform distribution is (b - a)^2 / 12, which here equals 1.0/3.0.
- generateLinearInput(double, double[], double[], double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
- generateLinearInput(double, double[], double[], double[], int, int, double, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
- generateLinearInputAsList(double, double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
Return a Java List of synthetic data randomly generated according to a multicollinear model.
- generateLinearRDD(SparkContext, int, int, double, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
-
Generate an RDD containing sample data for Linear Regression models - including Ridge, Lasso, and unregularized variants.
- generateLogisticRDD(SparkContext, int, int, double, int, double) - Static method in class org.apache.spark.mllib.util.LogisticRegressionDataGenerator
-
Generate an RDD containing test data for LogisticRegression.
- generateRandomEdges(int, int, int, long) - Static method in class org.apache.spark.graphx.util.GraphGenerators
-
- geq(Object) - Method in class org.apache.spark.sql.Column
-
Greater than or equal to an expression.
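A minimal Scala sketch; df is an assumed DataFrame with a numeric age column:

    import org.apache.spark.sql.functions.col
    // Keep rows where age >= 21
    val adults = df.filter(col("age").geq(21))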
- get() - Method in interface org.apache.spark.FutureAction
-
Blocks and returns the result of this job.
- get(Param<T>) - Method in class org.apache.spark.ml.param.ParamMap
-
Optionally returns the value associated with a param.
- get(Param<T>) - Method in interface org.apache.spark.ml.param.Params
-
- get(String) - Method in class org.apache.spark.SparkConf
-
Get a parameter; throws a NoSuchElementException if it's not set
- get(String, String) - Method in class org.apache.spark.SparkConf
-
Get a parameter, falling back to a default if not set
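A minimal Scala sketch of both get variants; the key values are illustrative:

    import org.apache.spark.SparkConf
    val conf = new SparkConf().set("spark.app.name", "demo")
    conf.get("spark.app.name")           // "demo"
    conf.get("spark.master", "local[*]") // not set, so the default is returned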
- get() - Static method in class org.apache.spark.SparkEnv
-
Returns the SparkEnv.
- get(String) - Static method in class org.apache.spark.SparkFiles
-
Get the absolute path of a file added through SparkContext.addFile().
- get(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i.
- get() - Method in class org.apache.spark.streaming.State
-
Get the state if it exists; otherwise throws java.util.NoSuchElementException.
- get() - Static method in class org.apache.spark.TaskContext
-
Return the currently active TaskContext.
- get_json_object(Column, String) - Static method in class org.apache.spark.sql.functions
-
Extracts a JSON object from a JSON string based on the specified JSON path, and returns the JSON string of the extracted object.
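A minimal Scala sketch; df is an assumed DataFrame whose string column json holds values such as {"a": {"b": 1}}:

    import org.apache.spark.sql.functions.get_json_object
    // Extract the nested field at $.a.b as a JSON string
    df.select(get_json_object(df("json"), "$.a.b"))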
- getActive() - Static method in class org.apache.spark.streaming.StreamingContext
-
:: Experimental ::
- getActiveJobIds() - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
-
Returns an array containing the ids of all active jobs.
- getActiveJobIds() - Method in class org.apache.spark.SparkStatusTracker
-
Returns an array containing the ids of all active jobs.
- getActiveOrCreate(Function0<StreamingContext>) - Static method in class org.apache.spark.streaming.StreamingContext
-
:: Experimental ::
- getActiveOrCreate(String, Function0<StreamingContext>, Configuration, boolean) - Static method in class org.apache.spark.streaming.StreamingContext
-
:: Experimental ::
- getActiveStageIds() - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
-
Returns an array containing the ids of all active stages.
- getActiveStageIds() - Method in class org.apache.spark.SparkStatusTracker
-
Returns an array containing the ids of all active stages.
- getAkkaConf() - Method in class org.apache.spark.SparkConf
-
Get all akka conf variables set on this SparkConf
- getAlgo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getAll() - Method in class org.apache.spark.SparkConf
-
Get all parameters as a list of pairs
- getAllConfs() - Method in class org.apache.spark.sql.SQLContext
-
Return all the configuration properties that have been set (i.e. not the defaults).
- getAllPools() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return pools for fair scheduler
- getAlpha() - Method in class org.apache.spark.mllib.clustering.LDA
-
Alias for getDocConcentration
- getAnyValAs(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value of a given fieldName.
- getAppId() - Method in interface org.apache.spark.launcher.SparkAppHandle
-
Returns the application ID, or null if not yet known.
- getAppId() - Method in class org.apache.spark.SparkConf
-
Returns the Spark application id, valid in the Driver after TaskScheduler registration and
from the start in the Executor.
- getAs(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i.
- getAs(String) - Method in interface org.apache.spark.sql.Row
-
Returns the value of a given fieldName.
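A minimal Scala sketch of positional Row access; the example values are illustrative:

    import org.apache.spark.sql.Row
    val row = Row("alice", 30)
    row.getString(0) // "alice"
    row.getInt(1)    // 30
    // getAs(fieldName) additionally needs a Row that carries a schema,
    // e.g. one obtained from a DataFrame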
- getAsymmetricAlpha() - Method in class org.apache.spark.mllib.clustering.LDA
-
Alias for getAsymmetricDocConcentration
- getAsymmetricDocConcentration() - Method in class org.apache.spark.mllib.clustering.LDA
-
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
- getAttr(String) - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Gets an attribute by its name.
- getAttr(int) - Method in class org.apache.spark.ml.attribute.AttributeGroup
-
Gets an attribute by its index.
- getAvroSchema() - Method in class org.apache.spark.SparkConf
-
Gets all the avro schemas in the configuration used in the generic Avro record serializer
- getBeta() - Method in class org.apache.spark.mllib.clustering.LDA
-
Alias for getTopicConcentration
- getBlock(BlockId) - Method in class org.apache.spark.storage.StorageStatus
-
Return the given block stored in this block manager in O(1) time.
- getBoolean(String, boolean) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a boolean, falling back to a default if not set
- getBoolean(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive boolean.
- getBoolean(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Boolean.
- getBooleanArray(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Boolean array.
- getByte(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive byte.
- getCachedBlockManagerId(BlockManagerId) - Static method in class org.apache.spark.storage.BlockManagerId
-
- getCachedMetadata(String) - Static method in class org.apache.spark.rdd.HadoopRDD
-
The three methods below are helpers for accessing the local map, a property of the SparkEnv of
the local process.
- getCaseSensitive() - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- getCatalystType(int, String, int, MetadataBuilder) - Method in class org.apache.spark.sql.jdbc.AggregatedDialect
-
- getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.DerbyDialect
-
- getCatalystType(int, String, int, MetadataBuilder) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
-
Get the custom datatype mapping for the given jdbc meta information.
- getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.MsSqlServerDialect
-
- getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.MySQLDialect
-
- getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.OracleDialect
-
- getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
-
- getCategoricalFeaturesInfo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getCheckpointDir() - Method in class org.apache.spark.api.java.JavaSparkContext
-
- getCheckpointDir() - Method in class org.apache.spark.SparkContext
-
- getCheckpointFile() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Gets the name of the file to which this RDD was checkpointed
- getCheckpointFile() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- getCheckpointFile() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- getCheckpointFile() - Method in class org.apache.spark.rdd.RDD
-
Gets the name of the directory to which this RDD was checkpointed.
- getCheckpointFiles() - Method in class org.apache.spark.graphx.Graph
-
Gets the name of the files to which this Graph was checkpointed.
- getCheckpointFiles() - Method in class org.apache.spark.graphx.impl.GraphImpl
-
- getCheckpointInterval() - Method in class org.apache.spark.mllib.clustering.LDA
-
Period (in iterations) between checkpoints.
- getCheckpointInterval() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getConf() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Return a copy of this JavaSparkContext's configuration.
- getConf() - Method in class org.apache.spark.rdd.HadoopRDD
-
- getConf() - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getConf() - Method in class org.apache.spark.SparkContext
-
Return a copy of this SparkContext's configuration.
- getConf(String) - Method in class org.apache.spark.sql.SQLContext
-
Return the value of Spark SQL configuration property for the given key.
- getConf(String, String) - Method in class org.apache.spark.sql.SQLContext
-
Return the value of Spark SQL configuration property for the given key.
- getConnection() - Method in interface org.apache.spark.rdd.JdbcRDD.ConnectionFactory
-
- getConvergenceTol() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Return the largest change in log-likelihood at which convergence is
considered to have occurred.
- getDate(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of date type as java.sql.Date.
- getDecimal(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of decimal type as java.math.BigDecimal.
- getDefault(Param<T>) - Method in interface org.apache.spark.ml.param.Params
-
Gets the default value of a parameter.
- getDegree() - Method in class org.apache.spark.ml.feature.PolynomialExpansion
-
- getDependencies() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- getDependencies() - Method in class org.apache.spark.rdd.RDD
-
Implemented by subclasses to return how this RDD depends on parent RDDs.
- getDependencies() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- getDependencies() - Method in class org.apache.spark.rdd.UnionRDD
-
- getDeprecatedConfig(String, SparkConf) - Static method in class org.apache.spark.SparkConf
-
Looks for available deprecated keys for the given config option, and returns the first value available.
- getDocConcentration() - Method in class org.apache.spark.mllib.clustering.LDA
-
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
- getDouble(String, double) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a double, falling back to a default if not set
- getDouble(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive double.
- getDouble(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Double.
- getDoubleArray(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Double array.
- getEpsilon() - Method in class org.apache.spark.mllib.clustering.KMeans
-
The distance threshold within which we consider centers to have converged.
- getExecutorEnv() - Method in class org.apache.spark.SparkConf
-
Get all executor environment variables set on this SparkConf
- getExecutorMemoryStatus() - Method in class org.apache.spark.SparkContext
-
Return a map from the slave to the max memory available for caching and the remaining
memory available for caching.
- getExecutorStorageStatus() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return information about blocks stored in all of the slaves
- getField(String) - Method in class org.apache.spark.sql.Column
-
An expression that gets a field by name in a StructType.
- getFinalValue() - Method in class org.apache.spark.partial.PartialResult
-
Blocking method to wait for and return the final value.
- getFloat(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive float.
- getFormula() - Method in class org.apache.spark.ml.feature.RFormula
-
- getGaps() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- getImpurity() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getIndices() - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- getInitializationMode() - Method in class org.apache.spark.mllib.clustering.KMeans
-
The initialization algorithm.
- getInitializationSteps() - Method in class org.apache.spark.mllib.clustering.KMeans
-
Number of steps for the k-means|| initialization mode
- getInitialModel() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Return the user supplied initial GMM, if supplied
- getInitialPositionInStream(int) - Method in class org.apache.spark.streaming.kinesis.KinesisUtilsPythonHelper
-
- getInputFormat(JobConf) - Method in class org.apache.spark.rdd.HadoopRDD
-
- getInt(String, int) - Method in class org.apache.spark.SparkConf
-
Get a parameter as an integer, falling back to a default if not set
- getInt(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive int.
- getInverse() - Method in class org.apache.spark.ml.feature.DCT
-
- getItem(Object) - Method in class org.apache.spark.sql.Column
-
An expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType.
- getJavaMap(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of array type as a Map.
- getJDBCType(DataType) - Method in class org.apache.spark.sql.jdbc.AggregatedDialect
-
- getJDBCType(DataType) - Static method in class org.apache.spark.sql.jdbc.DB2Dialect
-
- getJDBCType(DataType) - Static method in class org.apache.spark.sql.jdbc.DerbyDialect
-
- getJDBCType(DataType) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
-
Retrieve the jdbc / sql type for a given datatype.
- getJDBCType(DataType) - Static method in class org.apache.spark.sql.jdbc.MsSqlServerDialect
-
- getJDBCType(DataType) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
-
- getJobConf() - Method in class org.apache.spark.rdd.HadoopRDD
-
- getJobIdsForGroup(String) - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
-
Return a list of all known jobs in a particular job group.
- getJobIdsForGroup(String) - Method in class org.apache.spark.SparkStatusTracker
-
Return a list of all known jobs in a particular job group.
- getJobInfo(int) - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
-
Returns job information, or null if the job info could not be found or was garbage collected.
- getJobInfo(int) - Method in class org.apache.spark.SparkStatusTracker
-
Returns job information, or None if the job info could not be found or was garbage collected.
- getK() - Method in class org.apache.spark.mllib.clustering.BisectingKMeans
-
Gets the desired number of leaf clusters.
- getK() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Return the number of Gaussians in the mixture model
- getK() - Method in class org.apache.spark.mllib.clustering.KMeans
-
Number of clusters to create (k).
- getK() - Method in class org.apache.spark.mllib.clustering.LDA
-
Number of topics to infer.
- getKappa() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
Learning rate: exponential decay rate
- getLabels() - Method in class org.apache.spark.ml.feature.IndexToString
-
- getLambda() - Method in class org.apache.spark.mllib.classification.NaiveBayes
-
- getLDAModel(double[]) - Method in interface org.apache.spark.mllib.clustering.LDAOptimizer
-
- getLearningRate() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- getLeastGroupHash(String) - Method in class org.apache.spark.rdd.PartitionCoalescer
-
Sorts and gets the least element of the list associated with key in groupHash. The returned PartitionGroup is the least loaded of all groups that represent the machine "key".
- getList(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of array type as a List.
- getLocalProperty(String) - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get a local property set in this thread, or null if it is missing.
- getLocalProperty(String) - Method in class org.apache.spark.SparkContext
-
Get a local property set in this thread, or null if it is missing.
- getLong(String, long) - Method in class org.apache.spark.SparkConf
-
Get a parameter as a long, falling back to a default if not set
- getLong(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive long.
- getLong(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Long.
- getLongArray(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Long array.
- getLoss() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- getLossType() - Method in class org.apache.spark.ml.classification.GBTClassifier
-
- getLossType() - Method in class org.apache.spark.ml.regression.GBTRegressor
-
- getMap(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of map type as a Scala Map.
- getMap() - Method in class org.apache.spark.sql.types.MetadataBuilder
-
Returns the immutable version of this map.
- getMaxBins() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getMaxDepth() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getMaxIterations() - Method in class org.apache.spark.mllib.clustering.BisectingKMeans
-
Gets the max number of k-means iterations to split clusters.
- getMaxIterations() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Return the maximum number of iterations to run
- getMaxIterations() - Method in class org.apache.spark.mllib.clustering.KMeans
-
Maximum number of iterations to run.
- getMaxIterations() - Method in class org.apache.spark.mllib.clustering.LDA
-
Maximum number of iterations for learning.
- getMaxLocalProjDBSize() - Method in class org.apache.spark.mllib.fpm.PrefixSpan
-
Gets the maximum number of items allowed in a projected database before local processing.
- getMaxMemoryInMB() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getMaxPatternLength() - Method in class org.apache.spark.mllib.fpm.PrefixSpan
-
Gets the maximal pattern length (i.e. the length of the longest sequential pattern considered).
- getMessage() - Method in exception org.apache.spark.sql.AnalysisException
-
- getMetadata(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Metadata.
- getMetadataArray(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a Metadata array.
- getMetricName() - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
-
- getMetricName() - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
-
- getMetricName() - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
-
- getMetricsSources(String) - Method in class org.apache.spark.TaskContext
-
::DeveloperApi::
Returns all metrics sources with the given name which are associated with the instance
which runs the task.
- getMinDivisibleClusterSize() - Method in class org.apache.spark.mllib.clustering.BisectingKMeans
-
Gets the minimum number of points (if >= 1.0) or the minimum proportion of points (if < 1.0) of a divisible cluster.
- getMiniBatchFraction() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
Mini-batch fraction, which sets the fraction of documents sampled and used in each iteration.
- getMinInfoGain() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getMinInstancesPerNode() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getMinSupport() - Method in class org.apache.spark.mllib.fpm.PrefixSpan
-
Get the minimal support (i.e. the frequency of occurrence before a pattern is considered frequent).
- getMinTokenLength() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- getModel() - Method in class org.apache.spark.ml.clustering.DistributedLDAModel
-
- getModel() - Method in class org.apache.spark.ml.clustering.LDAModel
-
Returns the underlying spark.mllib model, which may be local or distributed.
- getModel() - Method in class org.apache.spark.ml.clustering.LocalLDAModel
-
- getModelType() - Method in class org.apache.spark.mllib.classification.NaiveBayes
-
- getN() - Method in class org.apache.spark.ml.feature.NGram
-
- getNames() - Method in class org.apache.spark.ml.feature.VectorSlicer
-
- getNode(int, Node) - Static method in class org.apache.spark.mllib.tree.model.Node
-
Traces down from a root node to get the node with the given node index.
- getNumClasses() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getNumFeatures() - Method in class org.apache.spark.ml.feature.HashingTF
-
- getNumFeatures() - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
-
The dimension of training features.
- getNumIterations() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- getNumPartitions() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return the number of partitions in this RDD.
- getNumPartitions() - Method in class org.apache.spark.rdd.RDD
-
Returns the number of partitions of this RDD.
- getNumValues() - Method in class org.apache.spark.ml.attribute.NominalAttribute
-
Get the number of values, either from numValues or from values.
- getOldDataset(DataFrame, String) - Static method in class org.apache.spark.ml.clustering.LDA
-
Get dataset for spark.mllib LDA
- getOptimizeDocConcentration() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
Indicates whether docConcentration (the Dirichlet parameter for the document-topic distribution) will be optimized during training.
- getOptimizer() - Method in class org.apache.spark.mllib.clustering.LDA
-
:: DeveloperApi ::
- getOption(String) - Method in class org.apache.spark.SparkConf
-
Get a parameter as an Option
- getOption() - Method in class org.apache.spark.streaming.State
-
Get the state as an Option.
- getOrCreate(SparkConf) - Static method in class org.apache.spark.SparkContext
-
This function may be used to get or instantiate a SparkContext and register it as a
singleton object.
- getOrCreate() - Static method in class org.apache.spark.SparkContext
-
This function may be used to get or instantiate a SparkContext and register it as a
singleton object.
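A minimal Scala sketch; the app name and master are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    val conf = new SparkConf().setAppName("demo").setMaster("local[*]")
    // Reuses the running SparkContext if one exists, otherwise creates it
    val sc = SparkContext.getOrCreate(conf)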
- getOrCreate(SparkContext) - Static method in class org.apache.spark.sql.SQLContext
-
Get the singleton SQLContext if it exists or create a new one using the given SparkContext.
- getOrCreate(String, JavaStreamingContextFactory) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Deprecated.
As of 1.4.0, replaced by getOrCreate without JavaStreamingContextFactory.
- getOrCreate(String, Configuration, JavaStreamingContextFactory) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Deprecated.
As of 1.4.0, replaced by getOrCreate without JavaStreamingContextFactory.
- getOrCreate(String, Configuration, JavaStreamingContextFactory, boolean) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Deprecated.
As of 1.4.0, replaced by getOrCreate without JavaStreamingContextFactory.
- getOrCreate(String, Function0<JavaStreamingContext>) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Function0<JavaStreamingContext>, Configuration) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Function0<JavaStreamingContext>, Configuration, boolean) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
- getOrCreate(String, Function0<StreamingContext>, Configuration, boolean) - Static method in class org.apache.spark.streaming.StreamingContext
-
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
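A minimal Scala sketch; conf is an assumed SparkConf and the checkpoint path is hypothetical:

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(conf, Seconds(1))
      ssc.checkpoint("/tmp/checkpoint") // hypothetical path
      ssc
    }
    // Recreates from checkpoint data if present, otherwise calls createContext
    val ssc = StreamingContext.getOrCreate("/tmp/checkpoint", createContext _)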
- getOrDefault(Param<T>) - Method in interface org.apache.spark.ml.param.Params
-
Gets the value of a param in the embedded param map or its default value.
- getOrElse(Param<T>, T) - Method in class org.apache.spark.ml.param.ParamMap
-
Returns the value associated with a param or a default value.
- getP() - Method in class org.apache.spark.ml.feature.Normalizer
-
- getParam(String) - Method in interface org.apache.spark.ml.param.Params
-
- getParents(int) - Method in class org.apache.spark.NarrowDependency
-
Get the parent partitions for a child partition.
- getParents(int) - Method in class org.apache.spark.OneToOneDependency
-
- getParents(int) - Method in class org.apache.spark.RangeDependency
-
- getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.CanonicalRandomVertexCut$
-
- getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.EdgePartition1D$
-
- getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.EdgePartition2D$
-
- getPartition(long, long, int) - Method in interface org.apache.spark.graphx.PartitionStrategy
-
Returns the partition number for a given edge.
- getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.RandomVertexCut$
-
- getPartition(Object) - Method in class org.apache.spark.HashPartitioner
-
- getPartition(Object) - Method in class org.apache.spark.Partitioner
-
- getPartition(Object) - Method in class org.apache.spark.RangePartitioner
-
- getPartitionId() - Static method in class org.apache.spark.TaskContext
-
Returns the partition id of the currently active TaskContext.
- getPartitions() - Method in class org.apache.spark.api.r.BaseRRDD
-
- getPartitions() - Method in class org.apache.spark.graphx.EdgeRDD
-
- getPartitions() - Method in class org.apache.spark.graphx.VertexRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.CoGroupedRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.HadoopRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.JdbcRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.PartitionCoalescer
-
- getPartitions() - Method in class org.apache.spark.rdd.PartitionPruningRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.RDD
-
Implemented by subclasses to return the set of partitions in this RDD.
- getPartitions() - Method in class org.apache.spark.rdd.ShuffledRDD
-
- getPartitions() - Method in class org.apache.spark.rdd.UnionRDD
-
- getPath() - Method in class org.apache.spark.input.PortableDataStream
-
- getPattern() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- getPersistentRDDs() - Method in class org.apache.spark.SparkContext
-
Returns an immutable map of RDDs that have marked themselves as persistent via a cache() call.
- getPoolForName(String) - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return the pool associated with the given name, if one exists
- getPreferredLocations(Partition) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.HadoopRDD
-
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.NewHadoopRDD
-
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.RDD
-
Optionally overridden by subclasses to specify placement preferences.
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.ShuffledRDD
-
- getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.UnionRDD
-
- getQuantileCalculationStrategy() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getRDDStorageInfo() - Method in class org.apache.spark.SparkContext
-
:: DeveloperApi ::
Return information about what RDDs are cached, if they are in mem or on disk, how much space
they take, etc.
- getReceiver() - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
-
Gets the receiver object that will be sent to the worker nodes
to receive data.
- getRootDirectory() - Static method in class org.apache.spark.SparkFiles
-
Get the root directory that contains files added through SparkContext.addFile().
- getRuns() - Method in class org.apache.spark.mllib.clustering.KMeans
-
:: Experimental ::
Number of runs of the algorithm to execute in parallel.
- getScalingVec() - Method in class org.apache.spark.ml.feature.ElementwiseProduct
-
- getSchedulingMode() - Method in class org.apache.spark.SparkContext
-
Return current scheduling mode
- getSchema(Class<?>) - Method in class org.apache.spark.sql.SQLContext
-
- getSeed() - Method in class org.apache.spark.mllib.clustering.BisectingKMeans
-
Gets the random seed.
- getSeed() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
-
Return the random seed
- getSeed() - Method in class org.apache.spark.mllib.clustering.KMeans
-
The random seed for cluster initialization.
- getSeed() - Method in class org.apache.spark.mllib.clustering.LDA
-
Random seed
- getSeq(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of array type as a Scala Seq.
- getSerializer(Serializer) - Static method in class org.apache.spark.serializer.Serializer
-
- getSerializer(Option<Serializer>) - Static method in class org.apache.spark.serializer.Serializer
-
- getShort(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a primitive short.
- getSizeAsBytes(String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as bytes; throws a NoSuchElementException if it's not set.
- getSizeAsBytes(String, String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as bytes, falling back to a default if not set.
- getSizeAsBytes(String, long) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as bytes, falling back to a default if not set.
- getSizeAsGb(String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Gibibytes; throws a NoSuchElementException if it's not set.
- getSizeAsGb(String, String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Gibibytes, falling back to a default if not set.
- getSizeAsKb(String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Kibibytes; throws a NoSuchElementException if it's not set.
- getSizeAsKb(String, String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Kibibytes, falling back to a default if not set.
- getSizeAsMb(String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Mebibytes; throws a NoSuchElementException if it's not set.
- getSizeAsMb(String, String) - Method in class org.apache.spark.SparkConf
-
Get a size parameter as Mebibytes, falling back to a default if not set.
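A minimal Scala sketch of the size getters; the key and values are illustrative:

    import org.apache.spark.SparkConf
    val conf = new SparkConf().set("spark.driver.maxResultSize", "2g")
    conf.getSizeAsBytes("spark.driver.maxResultSize")    // 2147483648
    conf.getSizeAsMb("spark.driver.maxResultSize", "1g") // 2048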
- getSparkHome() - Method in class org.apache.spark.api.java.JavaSparkContext
-
Get Spark's home location from either a value set through the constructor,
or the spark.home Java property, or the SPARK_HOME environment variable
(in that order of preference).
- getSplits() - Method in class org.apache.spark.ml.feature.Bucketizer
-
- getSQLDialect() - Method in class org.apache.spark.sql.hive.HiveContext
-
- getSQLDialect() - Method in class org.apache.spark.sql.SQLContext
-
- getStageInfo(int) - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
-
Returns stage information, or null if the stage info could not be found or was garbage collected.
- getStageInfo(int) - Method in class org.apache.spark.SparkStatusTracker
-
Returns stage information, or None if the stage info could not be found or was garbage collected.
- getStages() - Method in class org.apache.spark.ml.Pipeline
-
- getState() - Method in interface org.apache.spark.launcher.SparkAppHandle
-
Returns the current application state.
- getState() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
-
:: DeveloperApi ::
- getState() - Method in class org.apache.spark.streaming.StreamingContext
-
:: DeveloperApi ::
- getStatement() - Method in class org.apache.spark.ml.feature.SQLTransformer
-
- getStopWords() - Method in class org.apache.spark.ml.feature.StopWordsRemover
-
- getStorageLevel() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
- getStorageLevel() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
-
- getStorageLevel() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
-
- getStorageLevel() - Method in class org.apache.spark.rdd.RDD
-
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
- getString(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i as a String object.
- getString(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a String.
- getStringArray(String) - Method in class org.apache.spark.sql.types.Metadata
-
Gets a String array.
- getStruct(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of struct type as a Row object.
- getSubsamplingRate() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getTableExistsQuery(String) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
-
Get the SQL query that should be used to find if the given table exists.
- getTableExistsQuery(String) - Static method in class org.apache.spark.sql.jdbc.MySQLDialect
-
- getTableExistsQuery(String) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
-
- getTau0() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
-
A (positive) learning parameter that downweights early iterations.
- getThreadLocal() - Static method in class org.apache.spark.SparkEnv
-
Returns the ThreadLocal SparkEnv.
- getThreshold() - Method in class org.apache.spark.ml.classification.LogisticRegression
-
- getThreshold() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- getThreshold() - Method in class org.apache.spark.ml.feature.Binarizer
-
- getThreshold() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
-
Returns the threshold (if any) used for converting raw prediction scores into 0/1 predictions.
- getThreshold() - Method in class org.apache.spark.mllib.classification.SVMModel
-
Returns the threshold (if any) used for converting raw prediction scores into 0/1 predictions.
- getThresholds() - Method in class org.apache.spark.ml.classification.LogisticRegression
-
- getThresholds() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
-
- getTimeAsMs(String) - Method in class org.apache.spark.SparkConf
-
Get a time parameter as milliseconds; throws a NoSuchElementException if it's not set.
- getTimeAsMs(String, String) - Method in class org.apache.spark.SparkConf
-
Get a time parameter as milliseconds, falling back to a default if not set.
- getTimeAsSeconds(String) - Method in class org.apache.spark.SparkConf
-
Get a time parameter as seconds; throws a NoSuchElementException if it's not set.
- getTimeAsSeconds(String, String) - Method in class org.apache.spark.SparkConf
-
Get a time parameter as seconds, falling back to a default if not set.
- getTimestamp(int) - Method in interface org.apache.spark.sql.Row
-
Returns the value at position i of timestamp type as java.sql.Timestamp.
- gettingResult() - Method in class org.apache.spark.scheduler.TaskInfo
-
- gettingResultTime() - Method in class org.apache.spark.scheduler.TaskInfo
-
The time when the task started remotely getting the result.
- getToLowercase() - Method in class org.apache.spark.ml.feature.RegexTokenizer
-
- getTopicConcentration() - Method in class org.apache.spark.mllib.clustering.LDA
-
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics'
distributions over terms.
- getTreeStrategy() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- getUseNodeIdCache() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
-
- getValidationTol() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
-
- getValue() - Method in class org.apache.spark.broadcast.Broadcast
-
Actually get the broadcasted value.
- getValue(int) - Method in class org.apache.spark.ml.attribute.NominalAttribute
-
Gets a value given its index.
- getValuesMap(Seq<String>) - Method in interface org.apache.spark.sql.Row
-
Returns a Map(name -> value) for the requested fieldNames. For primitive types, if the value is null it returns the 'zero value' specific to that primitive (e.g. 0 for Int); use isNullAt to ensure that the value is not null.
- getVectors() - Method in class org.apache.spark.ml.feature.Word2VecModel
-
Returns a dataframe with two fields, "word" and "vector", with "word" being a String and "vector" the DenseVector that it is mapped to.
- getVectors() - Method in class org.apache.spark.mllib.feature.Word2VecModel
-
- Gini - Class in org.apache.spark.mllib.tree.impurity
-
:: Experimental ::
Class for calculating the
Gini impurity
during binary classification.
- Gini() - Constructor for class org.apache.spark.mllib.tree.impurity.Gini
-
- globalTopicTotals() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
-
- globalTopicTotals() - Method in class org.apache.spark.mllib.clustering.EMLDAOptimizer
-
Aggregate distributions over topics from all term vertices.
- glom() - Method in interface org.apache.spark.api.java.JavaRDDLike
-
Return an RDD created by coalescing all elements within each partition into an array.
- glom() - Method in class org.apache.spark.rdd.RDD
-
Return an RDD created by coalescing all elements within each partition into an array.
- glom() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
-
Return a new DStream in which each RDD is generated by applying glom() to each RDD of
this DStream.
- glom() - Method in class org.apache.spark.streaming.dstream.DStream
-
Return a new DStream in which each RDD is generated by applying glom() to each RDD of
this DStream.
- gradient() - Method in class org.apache.spark.ml.classification.LogisticAggregator
-
- gradient() - Method in class org.apache.spark.ml.regression.AFTAggregator
-
- gradient() - Method in class org.apache.spark.ml.regression.LeastSquaresAggregator
-
- Gradient - Class in org.apache.spark.mllib.optimization
-
:: DeveloperApi ::
Class used to compute the gradient for a loss function, given a single data point.
- Gradient() - Constructor for class org.apache.spark.mllib.optimization.Gradient
-
- gradient(double, double) - Static method in class org.apache.spark.mllib.tree.loss.AbsoluteError
-
Method to calculate the gradients for the gradient boosting calculation, using least absolute error.
- gradient(double, double) - Static method in class org.apache.spark.mllib.tree.loss.LogLoss
-
Method to calculate the loss gradients for the gradient boosting calculation for binary classification. The gradient with respect to F(x) is: - 4 y / (1 + exp(2 y F(x))).
- gradient(double, double) - Method in interface org.apache.spark.mllib.tree.loss.Loss
-
Method to calculate the gradients for the gradient boosting calculation.
- gradient(double, double) - Static method in class org.apache.spark.mllib.tree.loss.SquaredError
-
Method to calculate the gradients for the gradient boosting calculation, using least squares error.
- GradientBoostedTrees - Class in org.apache.spark.mllib.tree
-
A class that implements
Stochastic Gradient Boosting
for regression and binary classification.
- GradientBoostedTrees(BoostingStrategy) - Constructor for class org.apache.spark.mllib.tree.GradientBoostedTrees
-
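A minimal Scala sketch; trainingData (an RDD[LabeledPoint]) is assumed to exist:

    import org.apache.spark.mllib.tree.GradientBoostedTrees
    import org.apache.spark.mllib.tree.configuration.BoostingStrategy
    val boostingStrategy = BoostingStrategy.defaultParams("Regression")
    boostingStrategy.numIterations = 10
    // Train an ensemble of 10 regression trees with gradient boosting
    val model = GradientBoostedTrees.train(trainingData, boostingStrategy)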
- GradientBoostedTreesModel - Class in org.apache.spark.mllib.tree.model
-
Represents a gradient boosted trees model.
- GradientBoostedTreesModel(Enumeration.Value, DecisionTreeModel[], double[]) - Constructor for class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
-
- GradientDescent - Class in org.apache.spark.mllib.optimization
-
Class used to solve an optimization problem using Gradient Descent.