A

abs(Column) - Static method in class org.apache.spark.sql.functions
Computes the absolute value.
abs() - Method in class org.apache.spark.sql.types.Decimal
 
AbsoluteError - Class in org.apache.spark.mllib.tree.loss
:: DeveloperApi :: Class for absolute error loss calculation (for regression).
AbsoluteError() - Constructor for class org.apache.spark.mllib.tree.loss.AbsoluteError
 
accessTime() - Method in class org.apache.spark.sql.sources.HadoopFsRelation.FakeFileStatus
 
accId() - Method in class org.apache.spark.CleanAccum
 
Accumulable<R,T> - Class in org.apache.spark
A data type that can be accumulated, i.e. has a commutative and associative "add" operation, but where the result type, R, may be different from the element type being added, T.
Accumulable(R, AccumulableParam<R, T>, Option<String>) - Constructor for class org.apache.spark.Accumulable
 
Accumulable(R, AccumulableParam<R, T>) - Constructor for class org.apache.spark.Accumulable
 
accumulable(T, AccumulableParam<T, R>) - Method in class org.apache.spark.api.java.JavaSparkContext
Create an Accumulable shared variable of the given type, to which tasks can "add" values with add.
accumulable(T, String, AccumulableParam<T, R>) - Method in class org.apache.spark.api.java.JavaSparkContext
Create an Accumulable shared variable of the given type, to which tasks can "add" values with add.
accumulable(R, AccumulableParam<R, T>) - Method in class org.apache.spark.SparkContext
Create an Accumulable shared variable, to which tasks can add values with +=.
accumulable(R, String, AccumulableParam<R, T>) - Method in class org.apache.spark.SparkContext
Create an Accumulable shared variable, with a name for display in the Spark UI.
accumulableCollection(R, Function1<R, Growable<T>>, ClassTag<R>) - Method in class org.apache.spark.SparkContext
Create an accumulator from a "mutable collection" type.
AccumulableInfo - Class in org.apache.spark.scheduler
:: DeveloperApi :: Information about an Accumulable modified during a task or stage.
AccumulableInfo - Class in org.apache.spark.status.api.v1
 
AccumulableParam<R,T> - Interface in org.apache.spark
Helper object defining how to accumulate values of a particular type.
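The split between adding one element and merging two partial results can be illustrated without Spark. The following is a minimal plain-Java sketch, not the Spark API: the interface `ParamSketch`, the class `AccumulableParamSketch`, and the `run` helper are hypothetical names, and only the method names `addAccumulator` and `addInPlace` mirror the entries in this index. It assumes a result type (`Long`) that differs from the element type (`String`).

```java
import java.util.Arrays;
import java.util.List;

final class AccumulableParamSketch {
    // Hypothetical stand-in that mirrors two method names of AccumulableParam<R, T>.
    interface ParamSketch<R, T> {
        R addAccumulator(R acc, T element); // fold one element into a partial result
        R addInPlace(R left, R right);      // merge two partial results
    }

    // Result type Long (total length), element type String.
    // Both operations are commutative and associative.
    static final ParamSketch<Long, String> TOTAL_LENGTH = new ParamSketch<Long, String>() {
        public Long addAccumulator(Long acc, String element) {
            return acc + element.length();
        }
        public Long addInPlace(Long left, Long right) {
            return left + right;
        }
    };

    // Simulates two tasks accumulating independently, followed by a merge.
    static long run(List<String> task1, List<String> task2) {
        Long a = 0L;
        for (String s : task1) a = TOTAL_LENGTH.addAccumulator(a, s);
        Long b = 0L;
        for (String s : task2) b = TOTAL_LENGTH.addAccumulator(b, s);
        return TOTAL_LENGTH.addInPlace(a, b);
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "bb"), Arrays.asList("ccc")));
    }
}
```

Because both operations are commutative and associative, the merged total does not depend on which task finishes first.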
accumulables() - Method in class org.apache.spark.scheduler.StageInfo
Terminal values of accumulables updated during this stage.
accumulables() - Method in class org.apache.spark.scheduler.TaskInfo
Intermediate updates to accumulables during this task.
Accumulator<T> - Class in org.apache.spark
A simpler value of Accumulable where the result type being accumulated is the same as the type of the elements being merged.
Accumulator(T, AccumulatorParam<T>, Option<String>) - Constructor for class org.apache.spark.Accumulator
 
Accumulator(T, AccumulatorParam<T>) - Constructor for class org.apache.spark.Accumulator
 
accumulator(int) - Method in class org.apache.spark.api.java.JavaSparkContext
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
accumulator(int, String) - Method in class org.apache.spark.api.java.JavaSparkContext
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
accumulator(double) - Method in class org.apache.spark.api.java.JavaSparkContext
Create an Accumulator double variable, which tasks can "add" values to using the add method.
accumulator(double, String) - Method in class org.apache.spark.api.java.JavaSparkContext
Create an Accumulator double variable, which tasks can "add" values to using the add method.
accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
Create an Accumulator variable of a given type, which tasks can "add" values to using the add method.
accumulator(T, String, AccumulatorParam<T>) - Method in class org.apache.spark.api.java.JavaSparkContext
Create an Accumulator variable of a given type, which tasks can "add" values to using the add method.
accumulator(T, AccumulatorParam<T>) - Method in class org.apache.spark.SparkContext
Create an Accumulator variable of a given type, which tasks can "add" values to using the += method.
accumulator(T, String, AccumulatorParam<T>) - Method in class org.apache.spark.SparkContext
Create an Accumulator variable of a given type, with a name for display in the Spark UI.
AccumulatorParam<T> - Interface in org.apache.spark
A simpler version of AccumulableParam where the only data type you can add in is the same type as the accumulated value.
AccumulatorParam.DoubleAccumulatorParam$ - Class in org.apache.spark
 
AccumulatorParam.DoubleAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.DoubleAccumulatorParam$
 
AccumulatorParam.FloatAccumulatorParam$ - Class in org.apache.spark
 
AccumulatorParam.FloatAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.FloatAccumulatorParam$
 
AccumulatorParam.IntAccumulatorParam$ - Class in org.apache.spark
 
AccumulatorParam.IntAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.IntAccumulatorParam$
 
AccumulatorParam.LongAccumulatorParam$ - Class in org.apache.spark
 
AccumulatorParam.LongAccumulatorParam$() - Constructor for class org.apache.spark.AccumulatorParam.LongAccumulatorParam$
 
accumulatorUpdates() - Method in class org.apache.spark.status.api.v1.StageData
 
accumulatorUpdates() - Method in class org.apache.spark.status.api.v1.TaskData
 
accuracy() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
Returns accuracy
acos(Column) - Static method in class org.apache.spark.sql.functions
Computes the cosine inverse of the given value; the returned angle is in the range 0.0 through pi.
acos(String) - Static method in class org.apache.spark.sql.functions
Computes the cosine inverse of the given column; the returned angle is in the range 0.0 through pi.
active() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
 
activeJobs() - Method in class org.apache.spark.ui.jobs.JobProgressListener
 
activeStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
 
activeTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
 
ActorHelper - Interface in org.apache.spark.streaming.receiver
:: DeveloperApi :: A receiver trait to be mixed in with your Actor to gain access to the API for pushing received data into Spark Streaming for processing.
actorStream(Props, String, StorageLevel, SupervisorStrategy) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Create an input stream with any arbitrary user implemented actor receiver.
actorStream(Props, String, StorageLevel) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Create an input stream with any arbitrary user implemented actor receiver.
actorStream(Props, String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Create an input stream with any arbitrary user implemented actor receiver.
actorStream(Props, String, StorageLevel, SupervisorStrategy, ClassTag<T>) - Method in class org.apache.spark.streaming.StreamingContext
Create an input stream with any arbitrary user implemented actor receiver.
ActorSupervisorStrategy - Class in org.apache.spark.streaming.receiver
:: DeveloperApi :: A helper with a set of defaults for the supervisor strategy.
ActorSupervisorStrategy() - Constructor for class org.apache.spark.streaming.receiver.ActorSupervisorStrategy
 
actorSystem() - Method in class org.apache.spark.SparkEnv
 
add(T) - Method in class org.apache.spark.Accumulable
Add more data to this accumulator / accumulable
add(org.apache.spark.ml.feature.Instance) - Method in class org.apache.spark.ml.classification.LogisticAggregator
Add a new training instance to this LogisticAggregator, and update the loss and gradient of the objective function.
add(AFTPoint) - Method in class org.apache.spark.ml.regression.AFTAggregator
 
add(org.apache.spark.ml.feature.Instance) - Method in class org.apache.spark.ml.regression.LeastSquaresAggregator
Add a new training instance to this LeastSquaresAggregator, and update the loss and gradient of the objective function.
add(double[], MultivariateGaussian[], ExpectationSum, Vector<Object>) - Static method in class org.apache.spark.mllib.clustering.ExpectationSum
 
add(Vector) - Method in class org.apache.spark.mllib.feature.IDF.DocumentFrequencyAggregator
Adds a new document.
add(BlockMatrix) - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
Adds two block matrices together.
add(Vector) - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
Add a new sample to this summarizer, and update the statistical summary.
add(StructField) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new field.
add(String, DataType) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new nullable field with no metadata.
add(String, DataType, boolean) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new field with no metadata.
add(String, DataType, boolean, Metadata) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new field and specifying metadata.
add(String, String) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new nullable field with no metadata where the dataType is specified as a String.
add(String, String, boolean) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new field with no metadata where the dataType is specified as a String.
add(String, String, boolean, Metadata) - Method in class org.apache.spark.sql.types.StructType
Creates a new StructType by adding a new field and specifying metadata where the dataType is specified as a String.
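Each `add` overload above is described as creating a new StructType rather than modifying the receiver. A minimal plain-Java sketch of that copy-on-add pattern follows; it does not use Spark, and `SchemaSketch`, its string encoding of fields, and `fields()` are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SchemaSketch {
    private final List<String> fields;

    SchemaSketch() {
        this(Collections.<String>emptyList());
    }

    private SchemaSketch(List<String> fields) {
        this.fields = fields;
    }

    // Two-argument form: the new field is nullable.
    SchemaSketch add(String name, String dataType) {
        return add(name, dataType, true);
    }

    // Returns a new schema; the receiver is left unchanged.
    SchemaSketch add(String name, String dataType, boolean nullable) {
        List<String> copy = new ArrayList<>(fields);
        copy.add(name + ":" + dataType + (nullable ? "?" : ""));
        return new SchemaSketch(Collections.unmodifiableList(copy));
    }

    List<String> fields() {
        return fields;
    }

    public static void main(String[] args) {
        SchemaSketch base = new SchemaSketch();
        SchemaSketch schema = base.add("id", "int", false).add("name", "string");
        System.out.println(base.fields() + " " + schema.fields());
    }
}
```

Since every call returns a fresh object, calls can be chained while intermediate schemas remain safe to share.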
add(Vector) - Method in class org.apache.spark.util.Vector
 
add_months(Column, int) - Static method in class org.apache.spark.sql.functions
Returns the date that is numMonths after startDate.
addAccumulator(R, T) - Method in interface org.apache.spark.AccumulableParam
Add additional data to the accumulator value.
addAccumulator(T, T) - Method in interface org.apache.spark.AccumulatorParam
 
addAppArgs(String...) - Method in class org.apache.spark.launcher.SparkLauncher
Adds command line arguments for the application.
addedFiles() - Method in class org.apache.spark.SparkContext
 
addedJars() - Method in class org.apache.spark.SparkContext
 
addFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
Add a file to be downloaded with this Spark job on every node.
addFile(String) - Method in class org.apache.spark.launcher.SparkLauncher
Adds a file to be submitted with the application.
addFile(String) - Method in class org.apache.spark.SparkContext
Add a file to be downloaded with this Spark job on every node.
addFile(String, boolean) - Method in class org.apache.spark.SparkContext
Add a file to be downloaded with this Spark job on every node.
addGrid(Param<T>, Iterable<T>) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds a param with multiple values (overwrites if the input param exists).
addGrid(DoubleParam, double[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds a double param with multiple values.
addGrid(IntParam, int[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds an int param with multiple values.
addGrid(FloatParam, float[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds a float param with multiple values.
addGrid(LongParam, long[]) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds a long param with multiple values.
addGrid(BooleanParam) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Adds a boolean param with true and false.
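The grid that results from repeated `addGrid` calls is the Cartesian product of the value lists. The plain-Java sketch below illustrates that idea under stated assumptions: it does not use Spark, params are identified by plain strings, and `ParamGridSketch` and its `build` method are hypothetical stand-ins for the real builder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParamGridSketch {
    private final Map<String, List<Object>> grid = new LinkedHashMap<>();

    // Adds a param with multiple values; an existing entry for the param is overwritten.
    ParamGridSketch addGrid(String param, List<Object> values) {
        grid.put(param, new ArrayList<>(values));
        return this;
    }

    // Boolean form: the values are true and false.
    ParamGridSketch addGrid(String booleanParam) {
        return addGrid(booleanParam, Arrays.<Object>asList(true, false));
    }

    // Expands the grid into every combination of param values.
    List<Map<String, Object>> build() {
        List<Map<String, Object>> combos = new ArrayList<>();
        combos.add(new LinkedHashMap<String, Object>());
        for (Map.Entry<String, List<Object>> entry : grid.entrySet()) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> partial : combos) {
                for (Object value : entry.getValue()) {
                    Map<String, Object> extended = new LinkedHashMap<>(partial);
                    extended.put(entry.getKey(), value);
                    next.add(extended);
                }
            }
            combos = next;
        }
        return combos;
    }

    public static void main(String[] args) {
        System.out.println(new ParamGridSketch()
            .addGrid("regParam", Arrays.<Object>asList(0.1, 0.01))
            .addGrid("fitIntercept")
            .build());
    }
}
```

Two values for one param and a boolean param give four combinations; adding the same param again replaces its earlier values instead of multiplying the grid.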
addInPlace(R, R) - Method in interface org.apache.spark.AccumulableParam
Merge two accumulated values together.
addInPlace(double, double) - Method in class org.apache.spark.AccumulatorParam.DoubleAccumulatorParam$
 
addInPlace(float, float) - Method in class org.apache.spark.AccumulatorParam.FloatAccumulatorParam$
 
addInPlace(int, int) - Method in class org.apache.spark.AccumulatorParam.IntAccumulatorParam$
 
addInPlace(long, long) - Method in class org.apache.spark.AccumulatorParam.LongAccumulatorParam$
 
addInPlace(double, double) - Method in class org.apache.spark.SparkContext.DoubleAccumulatorParam$
 
addInPlace(float, float) - Method in class org.apache.spark.SparkContext.FloatAccumulatorParam$
 
addInPlace(int, int) - Method in class org.apache.spark.SparkContext.IntAccumulatorParam$
 
addInPlace(long, long) - Method in class org.apache.spark.SparkContext.LongAccumulatorParam$
 
addInPlace(Vector) - Method in class org.apache.spark.util.Vector
 
addInPlace(Vector, Vector) - Method in class org.apache.spark.util.Vector.VectorAccumParam$
 
addIntercept() - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
Whether to add intercept (default: false).
addJar(String) - Method in class org.apache.spark.api.java.JavaSparkContext
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
addJar(String) - Method in class org.apache.spark.launcher.SparkLauncher
Adds a jar file to be submitted with the application.
addJar(String) - Method in class org.apache.spark.SparkContext
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
addJar(String) - Method in class org.apache.spark.sql.hive.HiveContext
 
addJar(String) - Method in class org.apache.spark.sql.SQLContext
Add a jar to SQLContext
addListener(SparkAppHandle.Listener) - Method in interface org.apache.spark.launcher.SparkAppHandle
Adds a listener to be notified of changes to the handle's information.
addLocalConfiguration(String, int, int, int, JobConf) - Static method in class org.apache.spark.rdd.HadoopRDD
Add Hadoop configuration specific to a single partition and attempt.
addOnCompleteCallback(Function0<BoxedUnit>) - Method in class org.apache.spark.TaskContext
Adds a callback function to be executed on task completion.
addPartToPGroup(Partition, PartitionGroup) - Method in class org.apache.spark.rdd.PartitionCoalescer
 
addPyFile(String) - Method in class org.apache.spark.launcher.SparkLauncher
Adds a Python file / zip / egg to be submitted with the application.
address() - Method in class org.apache.spark.status.api.v1.RDDDataDistribution
 
addSparkArg(String) - Method in class org.apache.spark.launcher.SparkLauncher
Adds a no-value argument to the Spark invocation.
addSparkArg(String, String) - Method in class org.apache.spark.launcher.SparkLauncher
Adds an argument with a value to the Spark invocation.
addSparkListener(SparkListener) - Method in class org.apache.spark.SparkContext
:: DeveloperApi :: Register a listener to receive up-calls from events that happen during execution.
addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Add a StreamingListener object for receiving system events related to streaming.
addStreamingListener(StreamingListener) - Method in class org.apache.spark.streaming.StreamingContext
Add a StreamingListener object for receiving system events related to streaming.
addTaskCompletionListener(TaskCompletionListener) - Method in class org.apache.spark.TaskContext
Adds a (Java friendly) listener to be executed on task completion.
addTaskCompletionListener(Function1<TaskContext, BoxedUnit>) - Method in class org.apache.spark.TaskContext
Adds a listener in the form of a Scala closure to be executed on task completion.
AFTAggregator - Class in org.apache.spark.ml.regression
 
AFTAggregator(DenseVector<Object>, boolean) - Constructor for class org.apache.spark.ml.regression.AFTAggregator
 
AFTCostFun - Class in org.apache.spark.ml.regression
 
AFTCostFun(RDD<AFTPoint>, boolean) - Constructor for class org.apache.spark.ml.regression.AFTCostFun
 
AFTSurvivalRegression - Class in org.apache.spark.ml.regression
:: Experimental :: Fit a parametric survival regression model named accelerated failure time (AFT) model (https://en.wikipedia.org/wiki/Accelerated_failure_time_model) based on the Weibull distribution of the survival time.
AFTSurvivalRegression(String) - Constructor for class org.apache.spark.ml.regression.AFTSurvivalRegression
 
AFTSurvivalRegression() - Constructor for class org.apache.spark.ml.regression.AFTSurvivalRegression
 
AFTSurvivalRegressionModel - Class in org.apache.spark.ml.regression
:: Experimental :: Model produced by AFTSurvivalRegression.
agg(Column, Column...) - Method in class org.apache.spark.sql.DataFrame
Aggregates on the entire DataFrame without groups.
agg(Tuple2<String, String>, Seq<Tuple2<String, String>>) - Method in class org.apache.spark.sql.DataFrame
(Scala-specific) Aggregates on the entire DataFrame without groups.
agg(Map<String, String>) - Method in class org.apache.spark.sql.DataFrame
(Scala-specific) Aggregates on the entire DataFrame without groups.
agg(Map<String, String>) - Method in class org.apache.spark.sql.DataFrame
(Java-specific) Aggregates on the entire DataFrame without groups.
agg(Column, Seq<Column>) - Method in class org.apache.spark.sql.DataFrame
Aggregates on the entire DataFrame without groups.
agg(Column, Column...) - Method in class org.apache.spark.sql.GroupedData
Compute aggregates by specifying a series of aggregate columns.
agg(Tuple2<String, String>, Seq<Tuple2<String, String>>) - Method in class org.apache.spark.sql.GroupedData
(Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods.
agg(Map<String, String>) - Method in class org.apache.spark.sql.GroupedData
(Scala-specific) Compute aggregates by specifying a map from column name to aggregate methods.
agg(Map<String, String>) - Method in class org.apache.spark.sql.GroupedData
(Java-specific) Compute aggregates by specifying a map from column name to aggregate methods.
agg(Column, Seq<Column>) - Method in class org.apache.spark.sql.GroupedData
Compute aggregates by specifying a series of aggregate columns.
agg(TypedColumn<V, U1>) - Method in class org.apache.spark.sql.GroupedDataset
Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.
agg(TypedColumn<V, U1>, TypedColumn<V, U2>) - Method in class org.apache.spark.sql.GroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
agg(TypedColumn<V, U1>, TypedColumn<V, U2>, TypedColumn<V, U3>) - Method in class org.apache.spark.sql.GroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
agg(TypedColumn<V, U1>, TypedColumn<V, U2>, TypedColumn<V, U3>, TypedColumn<V, U4>) - Method in class org.apache.spark.sql.GroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
aggregate(U, Function2<U, T, U>, Function2<U, U, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral "zero value".
aggregate(U, Function2<U, T, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral "zero value".
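The two-level fold described for `aggregate` can be sketched in plain Java. This is an illustration of the semantics only, not Spark code: partitions are modelled as nested lists, `AggregateSketch` and `sumAndCount` are hypothetical names, and the zero value is assumed to be treated as immutable so that it can be reused for every partition.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

final class AggregateSketch {
    // seqOp folds elements within a partition; combOp merges the per-partition results.
    static <T, U> U aggregate(List<List<T>> partitions, U zero,
                              BiFunction<U, T, U> seqOp, BiFunction<U, U, U> combOp) {
        U result = zero;
        for (List<T> partition : partitions) {
            U partial = zero;
            for (T element : partition) {
                partial = seqOp.apply(partial, element);
            }
            result = combOp.apply(result, partial);
        }
        return result;
    }

    // Example: compute {sum, count} in one pass; the result type differs from the element type.
    static long[] sumAndCount(List<List<Integer>> partitions) {
        return AggregateSketch.<Integer, long[]>aggregate(
            partitions,
            new long[] {0L, 0L},
            (acc, x) -> new long[] {acc[0] + x, acc[1] + 1},
            (a, b) -> new long[] {a[0] + b[0], a[1] + b[1]});
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions =
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4, 5));
        long[] r = sumAndCount(partitions);
        System.out.println("sum=" + r[0] + " count=" + r[1]);
    }
}
```

The zero value must be neutral for both functions, because it seeds every partition as well as the final merge.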
aggregateByKey(U, Partitioner, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
Aggregate the values of each key, using given combine functions and a neutral "zero value".
aggregateByKey(U, int, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
Aggregate the values of each key, using given combine functions and a neutral "zero value".
aggregateByKey(U, Function2<U, V, U>, Function2<U, U, U>) - Method in class org.apache.spark.api.java.JavaPairRDD
Aggregate the values of each key, using given combine functions and a neutral "zero value".
aggregateByKey(U, Partitioner, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Aggregate the values of each key, using given combine functions and a neutral "zero value".
aggregateByKey(U, int, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Aggregate the values of each key, using given combine functions and a neutral "zero value".
aggregateByKey(U, Function2<U, V, U>, Function2<U, U, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Aggregate the values of each key, using given combine functions and a neutral "zero value".
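The per-key variant follows the same pattern, with one running result per key. The sketch below is plain Java rather than Spark: `AggregateByKeySketch`, `entry`, and `demo` are hypothetical names, partitions are modelled as nested lists of key-value entries, and the zero value is assumed to be immutable.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

final class AggregateByKeySketch {
    static <K extends Comparable<K>, V, U> Map<K, U> aggregateByKey(
            List<List<Map.Entry<K, V>>> partitions, U zero,
            BiFunction<U, V, U> seqOp, BiFunction<U, U, U> combOp) {
        Map<K, U> merged = new TreeMap<>();
        for (List<Map.Entry<K, V>> partition : partitions) {
            // Within a partition, fold each value into its key's running result.
            Map<K, U> local = new TreeMap<>();
            for (Map.Entry<K, V> kv : partition) {
                U current = local.containsKey(kv.getKey()) ? local.get(kv.getKey()) : zero;
                local.put(kv.getKey(), seqOp.apply(current, kv.getValue()));
            }
            // Across partitions, merge the per-key results.
            for (Map.Entry<K, U> e : local.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), combOp);
            }
        }
        return merged;
    }

    static Map.Entry<String, Integer> entry(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    // Example data: key "a" appears in both partitions, key "b" in one.
    static Map<String, Integer> demo() {
        List<List<Map.Entry<String, Integer>>> partitions = Arrays.asList(
            Arrays.asList(entry("a", 1), entry("b", 2)),
            Arrays.asList(entry("a", 3)));
        return AggregateByKeySketch.<String, Integer, Integer>aggregateByKey(
            partitions, 0, (acc, v) -> acc + v, (x, y) -> x + y);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```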
AggregatedDialect - Class in org.apache.spark.sql.jdbc
AggregatedDialect can unify multiple dialects into one virtual Dialect.
AggregatedDialect(List<JdbcDialect>) - Constructor for class org.apache.spark.sql.jdbc.AggregatedDialect
 
aggregateMessages(Function1<EdgeContext<VD, ED, A>, BoxedUnit>, Function2<A, A, A>, TripletFields, ClassTag<A>) - Method in class org.apache.spark.graphx.Graph
Aggregates values from the neighboring edges and vertices of each vertex.
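The send-then-merge pattern behind `aggregateMessages` can be illustrated with a small plain-Java sketch. It does not use GraphX: edges are modelled as `{src, dst}` pairs without attributes, and `AggregateMessagesSketch`, `Sender`, and `inDegrees` are hypothetical names. A send function is invoked once per edge and may message either endpoint; messages that arrive at the same vertex are combined by the merge function.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;

final class AggregateMessagesSketch {
    // Hypothetical, reduced stand-in for an edge context: it only sends messages.
    interface Sender<A> {
        void sendToSrc(A msg);
        void sendToDst(A msg);
    }

    static <A> Map<Long, A> aggregateMessages(long[][] edges,
            BiConsumer<long[], Sender<A>> sendMsg, BinaryOperator<A> mergeMsg) {
        Map<Long, A> inbox = new TreeMap<>();
        for (long[] edge : edges) {
            final long src = edge[0];
            final long dst = edge[1];
            sendMsg.accept(edge, new Sender<A>() {
                public void sendToSrc(A msg) { inbox.merge(src, msg, mergeMsg); }
                public void sendToDst(A msg) { inbox.merge(dst, msg, mergeMsg); }
            });
        }
        return inbox;
    }

    // Example: in-degree. Every edge sends 1 to its destination; messages are summed.
    static Map<Long, Integer> inDegrees(long[][] edges) {
        return AggregateMessagesSketch.<Integer>aggregateMessages(
            edges, (edge, ctx) -> ctx.sendToDst(1), Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(inDegrees(new long[][] {{1, 2}, {1, 3}, {2, 3}}));
    }
}
```

In this sketch a vertex that receives no message (vertex 1 in the example) does not appear in the result.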
aggregateMessagesWithActiveSet(Function1<EdgeContext<VD, ED, A>, BoxedUnit>, Function2<A, A, A>, TripletFields, Option<Tuple2<VertexRDD<?>, EdgeDirection>>, ClassTag<A>) - Method in class org.apache.spark.graphx.impl.GraphImpl
 
aggregateUsingIndex(RDD<Tuple2<Object, VD2>>, Function2<VD2, VD2, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
 
aggregateUsingIndex(RDD<Tuple2<Object, VD2>>, Function2<VD2, VD2, VD2>, ClassTag<VD2>) - Method in class org.apache.spark.graphx.VertexRDD
Aggregates vertices in messages that have the same ids using reduceFunc, returning a VertexRDD co-indexed with this.
AggregatingEdgeContext<VD,ED,A> - Class in org.apache.spark.graphx.impl
 
AggregatingEdgeContext(Function2<A, A, A>, Object, BitSet) - Constructor for class org.apache.spark.graphx.impl.AggregatingEdgeContext
 
Aggregator<K,V,C> - Class in org.apache.spark
:: DeveloperApi :: A set of functions used to aggregate data.
Aggregator(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Constructor for class org.apache.spark.Aggregator
 
aggregator() - Method in class org.apache.spark.ShuffleDependency
 
Aggregator<I,B,O> - Class in org.apache.spark.sql.expressions
A base class for user-defined aggregations, which can be used in DataFrame and Dataset operations to take all of the elements of a group and reduce them to a single value.
Aggregator() - Constructor for class org.apache.spark.sql.expressions.Aggregator
 
aggUntyped(Seq<TypedColumn<?, ?>>) - Method in class org.apache.spark.sql.GroupedDataset
Internal helper function for building typed aggregations that return tuples.
Algo - Class in org.apache.spark.mllib.tree.configuration
:: Experimental :: Enum to select the algorithm for the decision tree
Algo() - Constructor for class org.apache.spark.mllib.tree.configuration.Algo
 
algo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
algo() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
 
algo() - Method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
 
algo() - Method in class org.apache.spark.mllib.tree.model.RandomForestModel
 
algorithm() - Method in class org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
 
algorithm() - Method in class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
The algorithm to use for updating.
algorithm() - Method in class org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
 
alias(String) - Method in class org.apache.spark.sql.Column
Gives the column an alias.
alias(String) - Method in class org.apache.spark.sql.DataFrame
Returns a new DataFrame with an alias set.
alias(Symbol) - Method in class org.apache.spark.sql.DataFrame
(Scala-specific) Returns a new DataFrame with an alias set.
All - Static variable in class org.apache.spark.graphx.TripletFields
Expose all the fields (source, edge, and destination).
alpha() - Method in class org.apache.spark.mllib.random.WeibullGenerator
 
AlphaComponent - Annotation Type in org.apache.spark.annotation
A new component of Spark which may have unstable APIs.
ALS - Class in org.apache.spark.ml.recommendation
:: Experimental :: Alternating Least Squares (ALS) matrix factorization.
ALS(String) - Constructor for class org.apache.spark.ml.recommendation.ALS
 
ALS() - Constructor for class org.apache.spark.ml.recommendation.ALS
 
ALS - Class in org.apache.spark.mllib.recommendation
 
ALS() - Constructor for class org.apache.spark.mllib.recommendation.ALS
 
ALS.Rating<ID> - Class in org.apache.spark.ml.recommendation
:: DeveloperApi :: Rating class for better code readability.
ALS.Rating(ID, ID, float) - Constructor for class org.apache.spark.ml.recommendation.ALS.Rating
 
ALS.Rating$ - Class in org.apache.spark.ml.recommendation
 
ALS.Rating$() - Constructor for class org.apache.spark.ml.recommendation.ALS.Rating$
 
ALSModel - Class in org.apache.spark.ml.recommendation
:: Experimental :: Model fitted by ALS.
AnalysisException - Exception in org.apache.spark.sql
:: DeveloperApi :: Thrown when a query fails to analyze, usually because the query itself is invalid.
AnalysisException(String, Option<Object>, Option<Object>) - Constructor for exception org.apache.spark.sql.AnalysisException
 
analyze(String) - Method in class org.apache.spark.sql.hive.HiveContext
Analyzes the given table in the current database to generate statistics, which will be used in query optimizations.
analyzer() - Method in class org.apache.spark.sql.hive.HiveContext
 
analyzer() - Method in class org.apache.spark.sql.SQLContext
 
and(Column) - Method in class org.apache.spark.sql.Column
Boolean AND.
And - Class in org.apache.spark.sql.sources
A filter that evaluates to true iff both left and right evaluate to true.
And(Filter, Filter) - Constructor for class org.apache.spark.sql.sources.And
 
antecedent() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
 
ANY() - Static method in class org.apache.spark.scheduler.TaskLocality
 
anyNull() - Method in interface org.apache.spark.sql.Row
Returns true if there are any NULL values in this row.
appAttemptId() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
 
appendBias(Vector) - Static method in class org.apache.spark.mllib.util.MLUtils
Returns a new vector with 1.0 (bias) appended to the input vector.
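As a rough illustration of what appending a bias term means, the following plain-Java sketch works on a `double[]` instead of an MLlib Vector. `AppendBiasSketch` is a hypothetical name, and the sketch only covers the dense case.

```java
import java.util.Arrays;

final class AppendBiasSketch {
    // Returns a new array with 1.0 appended; the input array is not modified.
    static double[] appendBias(double[] vector) {
        double[] out = Arrays.copyOf(vector, vector.length + 1);
        out[vector.length] = 1.0;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(appendBias(new double[] {2.0, 3.0})));
    }
}
```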
appId() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
 
applicationAttemptId() - Method in class org.apache.spark.SparkContext
 
ApplicationAttemptInfo - Class in org.apache.spark.status.api.v1
 
applicationId() - Method in class org.apache.spark.SparkContext
A unique identifier for the Spark application.
ApplicationInfo - Class in org.apache.spark.status.api.v1
 
ApplicationStatus - Enum in org.apache.spark.status.api.v1
 
apply(RDD<Tuple2<Object, VD>>, RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.Graph
Construct a graph from a collection of vertices and edges with attributes.
apply(RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
Create a graph from edges, setting referenced vertices to `defaultVertexAttr`.
apply(RDD<Tuple2<Object, VD>>, RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
Create a graph from vertices and edges, setting missing vertices to `defaultVertexAttr`.
apply(VertexRDD<VD>, EdgeRDD<ED>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
Create a graph from a VertexRDD and an EdgeRDD with arbitrary replicated vertices.
apply(Graph<VD, ED>, A, int, EdgeDirection, Function3<Object, VD, A, VD>, Function1<EdgeTriplet<VD, ED>, Iterator<Tuple2<Object, A>>>, Function2<A, A, A>, ClassTag<VD>, ClassTag<ED>, ClassTag<A>) - Static method in class org.apache.spark.graphx.Pregel
Execute a Pregel-like iterative vertex-parallel abstraction.
apply(RDD<Tuple2<Object, VD>>, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
Constructs a standalone VertexRDD (one that is not set up for efficient joins with an EdgeRDD) from an RDD of vertex-attribute pairs.
apply(RDD<Tuple2<Object, VD>>, EdgeRDD<?>, VD, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
Constructs a VertexRDD from an RDD of vertex-attribute pairs.
apply(RDD<Tuple2<Object, VD>>, EdgeRDD<?>, VD, Function2<VD, VD, VD>, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
Constructs a VertexRDD from an RDD of vertex-attribute pairs.
apply(String) - Method in class org.apache.spark.ml.attribute.AttributeGroup
Gets an attribute by its name.
apply(int) - Method in class org.apache.spark.ml.attribute.AttributeGroup
Gets an attribute by its index.
apply(Param<T>) - Method in class org.apache.spark.ml.param.ParamMap
Gets the value of the input param or its default value if it does not exist.
apply(int, int) - Method in class org.apache.spark.mllib.linalg.DenseMatrix
 
apply(int) - Method in class org.apache.spark.mllib.linalg.DenseVector
 
apply(int, int) - Method in interface org.apache.spark.mllib.linalg.Matrix
Gets the (i, j)-th element.
apply(int, int) - Method in class org.apache.spark.mllib.linalg.SparseMatrix
 
apply(int) - Method in interface org.apache.spark.mllib.linalg.Vector
Gets the value of the ith element.
apply(int, Predict, double, boolean) - Static method in class org.apache.spark.mllib.tree.model.Node
Construct a node with nodeIndex, predict, impurity and isLeaf parameters.
apply(String) - Static method in class org.apache.spark.rdd.PartitionGroup
 
apply(long, String, Option<String>, String, boolean) - Static method in class org.apache.spark.scheduler.AccumulableInfo
 
apply(long, String, Option<String>, String) - Static method in class org.apache.spark.scheduler.AccumulableInfo
 
apply(long, String, String) - Static method in class org.apache.spark.scheduler.AccumulableInfo
 
apply(long, TaskMetrics) - Static method in class org.apache.spark.scheduler.RuntimePercentage
 
apply(Object) - Method in class org.apache.spark.sql.Column
Extracts a value or values from a complex type.
apply(String) - Method in class org.apache.spark.sql.DataFrame
Selects a column based on the column name and returns it as a Column.
apply(Column...) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
Creates a Column for this UDAF using given Columns as input arguments.
apply(Seq<Column>) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
Creates a Column for this UDAF using given Columns as input arguments.
apply(DataFrame, Seq<Expression>, GroupedData.GroupType) - Static method in class org.apache.spark.sql.GroupedData
 
apply(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i.
apply(DataType) - Static method in class org.apache.spark.sql.types.ArrayType
Construct an ArrayType object with the given element type.
apply(double) - Static method in class org.apache.spark.sql.types.Decimal
 
apply(long) - Static method in class org.apache.spark.sql.types.Decimal
 
apply(int) - Static method in class org.apache.spark.sql.types.Decimal
 
apply(BigDecimal) - Static method in class org.apache.spark.sql.types.Decimal
 
apply(BigDecimal) - Static method in class org.apache.spark.sql.types.Decimal
 
apply(BigDecimal, int, int) - Static method in class org.apache.spark.sql.types.Decimal
 
apply(BigDecimal, int, int) - Static method in class org.apache.spark.sql.types.Decimal
 
apply(long, int, int) - Static method in class org.apache.spark.sql.types.Decimal
 
apply(String) - Static method in class org.apache.spark.sql.types.Decimal
 
apply() - Static method in class org.apache.spark.sql.types.DecimalType
 
apply(Option<PrecisionInfo>) - Static method in class org.apache.spark.sql.types.DecimalType
 
apply(DataType, DataType) - Static method in class org.apache.spark.sql.types.MapType
Construct a MapType object with the given key type and value type.
apply(String) - Method in class org.apache.spark.sql.types.StructType
Extracts a StructField of the given name.
apply(Set<String>) - Method in class org.apache.spark.sql.types.StructType
Returns a StructType containing StructFields of the given names, preserving the original order of fields.
apply(int) - Method in class org.apache.spark.sql.types.StructType
 
apply(Seq<Column>) - Method in class org.apache.spark.sql.UserDefinedFunction
 
apply(String) - Static method in class org.apache.spark.storage.BlockId
Converts a BlockId "name" String back into a BlockId.
apply(String, String, int) - Static method in class org.apache.spark.storage.BlockManagerId
Returns a BlockManagerId for the given configuration.
apply(ObjectInput) - Static method in class org.apache.spark.storage.BlockManagerId
 
apply(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
:: DeveloperApi :: Create a new StorageLevel object without setting useOffHeap.
apply(boolean, boolean, boolean, int) - Static method in class org.apache.spark.storage.StorageLevel
:: DeveloperApi :: Create a new StorageLevel object.
apply(int, int) - Static method in class org.apache.spark.storage.StorageLevel
:: DeveloperApi :: Create a new StorageLevel object from its integer representation.
apply(ObjectInput) - Static method in class org.apache.spark.storage.StorageLevel
:: DeveloperApi :: Read StorageLevel object from ObjectInput stream.
apply(String, int) - Static method in class org.apache.spark.streaming.kafka.Broker
 
apply(String, int, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
 
apply(TopicAndPartition, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
 
apply(long) - Static method in class org.apache.spark.streaming.Milliseconds
 
apply(long) - Static method in class org.apache.spark.streaming.Minutes
 
apply(long) - Static method in class org.apache.spark.streaming.Seconds
 
apply(TraversableOnce<Object>) - Static method in class org.apache.spark.util.StatCounter
Build a StatCounter from a list of values.
apply(Seq<Object>) - Static method in class org.apache.spark.util.StatCounter
Build a StatCounter from a list of values passed as variable-length arguments.
apply(int) - Method in class org.apache.spark.util.Vector
 
applySchema(RDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
 
applySchema(JavaRDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
 
applySchema(RDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
 
applySchema(JavaRDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
 
applySchemaToPythonRDD(RDD<Object[]>, String) - Method in class org.apache.spark.sql.SQLContext
 
applySchemaToPythonRDD(RDD<Object[]>, StructType) - Method in class org.apache.spark.sql.SQLContext
 
appName() - Method in class org.apache.spark.api.java.JavaSparkContext
 
appName() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
 
appName() - Method in class org.apache.spark.SparkContext
 
approxCountDistinct(Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the approximate number of distinct items in a group.
approxCountDistinct(String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the approximate number of distinct items in a group.
approxCountDistinct(Column, double) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the approximate number of distinct items in a group.
approxCountDistinct(String, double) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the approximate number of distinct items in a group.
ApproxHist() - Static method in class org.apache.spark.mllib.tree.configuration.QuantileStrategy
 
areaUnderPR() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
Computes the area under the precision-recall curve.
areaUnderROC() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
Computes the area under the receiver operating characteristic (ROC) curve.
areaUnderROC() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
Computes the area under the receiver operating characteristic (ROC) curve.
argmax() - Method in class org.apache.spark.mllib.linalg.DenseVector
 
argmax() - Method in class org.apache.spark.mllib.linalg.SparseVector
 
argmax() - Method in interface org.apache.spark.mllib.linalg.Vector
Find the index of a maximal element.
arr() - Method in class org.apache.spark.rdd.PartitionGroup
 
array(DataType) - Method in class org.apache.spark.sql.ColumnName
Creates a new StructField of type array.
array(Column...) - Static method in class org.apache.spark.sql.functions
Creates a new array column.
array(String, String...) - Static method in class org.apache.spark.sql.functions
Creates a new array column.
array(Seq<Column>) - Static method in class org.apache.spark.sql.functions
Creates a new array column.
array(String, Seq<String>) - Static method in class org.apache.spark.sql.functions
Creates a new array column.
array_contains(Column, Object) - Static method in class org.apache.spark.sql.functions
Returns true if the array contains the value.
arrayLengthGt(double) - Static method in class org.apache.spark.ml.param.ParamValidators
Check that the array length is greater than lowerBound.
ArrayType - Class in org.apache.spark.sql.types
 
ArrayType(DataType, boolean) - Constructor for class org.apache.spark.sql.types.ArrayType
 
ArrayType() - Constructor for class org.apache.spark.sql.types.ArrayType
No-arg constructor for kryo.
as(Encoder<U>) - Method in class org.apache.spark.sql.Column
Provides a type hint about the expected return value of this column.
as(String) - Method in class org.apache.spark.sql.Column
Gives the column an alias.
as(Seq<String>) - Method in class org.apache.spark.sql.Column
(Scala-specific) Assigns the given aliases to the results of a table generating function.
as(String[]) - Method in class org.apache.spark.sql.Column
Assigns the given aliases to the results of a table generating function.
as(Symbol) - Method in class org.apache.spark.sql.Column
Gives the column an alias.
as(String, Metadata) - Method in class org.apache.spark.sql.Column
Gives the column an alias with metadata.
as(Encoder<U>) - Method in class org.apache.spark.sql.DataFrame
:: Experimental :: Converts this DataFrame to a strongly-typed Dataset containing objects of the specified type, U.
as(String) - Method in class org.apache.spark.sql.DataFrame
Returns a new DataFrame with an alias set.
as(Symbol) - Method in class org.apache.spark.sql.DataFrame
(Scala-specific) Returns a new DataFrame with an alias set.
as(Encoder<U>) - Method in class org.apache.spark.sql.Dataset
Returns a new Dataset where each record has been mapped onto the specified type.
as(String) - Method in class org.apache.spark.sql.Dataset
Applies a logical alias to this Dataset that can be used to disambiguate columns that have the same name after two Datasets have been joined.
asc() - Method in class org.apache.spark.sql.Column
Returns an ordering used in sorting.
asc(String) - Static method in class org.apache.spark.sql.functions
Returns a sort expression based on ascending order of the column.
ascii(Column) - Static method in class org.apache.spark.sql.functions
Computes the numeric value of the first character of the string column, and returns the result as an int column.
asin(Column) - Static method in class org.apache.spark.sql.functions
Computes the sine inverse of the given value; the returned angle is in the range -pi/2 through pi/2.
asin(String) - Static method in class org.apache.spark.sql.functions
Computes the sine inverse of the given column; the returned angle is in the range -pi/2 through pi/2.
asIntegral() - Method in class org.apache.spark.sql.types.DecimalType
 
asIntegral() - Method in class org.apache.spark.sql.types.DoubleType
 
asIntegral() - Method in class org.apache.spark.sql.types.FloatType
 
asIterator() - Method in class org.apache.spark.serializer.DeserializationStream
Read the elements of this stream through an iterator.
asJavaPairRDD() - Method in class org.apache.spark.api.r.PairwiseRRDD
 
asJavaRDD() - Method in class org.apache.spark.api.r.RRDD
 
asJavaRDD() - Method in class org.apache.spark.api.r.StringRRDD
 
asKeyValueIterator() - Method in class org.apache.spark.serializer.DeserializationStream
Read the elements of this stream through an iterator over key-value pairs.
AskPermissionToCommitOutput - Class in org.apache.spark.scheduler
 
AskPermissionToCommitOutput(int, int, int) - Constructor for class org.apache.spark.scheduler.AskPermissionToCommitOutput
 
askTimeout(SparkConf) - Static method in class org.apache.spark.util.RpcUtils
 
asRDDId() - Method in class org.apache.spark.storage.BlockId
 
assertValid() - Method in class org.apache.spark.broadcast.Broadcast
Check if this broadcast is valid.
assignments() - Method in class org.apache.spark.mllib.clustering.PowerIterationClusteringModel
 
AssociationRules - Class in org.apache.spark.mllib.fpm
:: Experimental ::
AssociationRules() - Constructor for class org.apache.spark.mllib.fpm.AssociationRules
Constructs a default instance with default parameters {minConfidence = 0.8}.
AssociationRules.Rule<Item> - Class in org.apache.spark.mllib.fpm
:: Experimental ::
AsyncRDDActions<T> - Class in org.apache.spark.rdd
A set of asynchronous RDD actions available through an implicit conversion.
AsyncRDDActions(RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.rdd.AsyncRDDActions
 
atan(Column) - Static method in class org.apache.spark.sql.functions
Computes the tangent inverse of the given value.
atan(String) - Static method in class org.apache.spark.sql.functions
Computes the tangent inverse of the given column.
atan2(Column, Column) - Static method in class org.apache.spark.sql.functions
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
atan2(Column, String) - Static method in class org.apache.spark.sql.functions
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
atan2(String, Column) - Static method in class org.apache.spark.sql.functions
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
atan2(String, String) - Static method in class org.apache.spark.sql.functions
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
atan2(Column, double) - Static method in class org.apache.spark.sql.functions
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
atan2(String, double) - Static method in class org.apache.spark.sql.functions
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
atan2(double, Column) - Static method in class org.apache.spark.sql.functions
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
atan2(double, String) - Static method in class org.apache.spark.sql.functions
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
attempt() - Method in class org.apache.spark.scheduler.TaskInfo
 
attempt() - Method in class org.apache.spark.status.api.v1.TaskData
 
attemptId() - Method in class org.apache.spark.scheduler.StageInfo
 
attemptId() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
 
attemptId() - Method in class org.apache.spark.status.api.v1.StageData
 
attemptId() - Method in class org.apache.spark.TaskContext
 
attemptNumber() - Method in class org.apache.spark.scheduler.AskPermissionToCommitOutput
 
attemptNumber() - Method in class org.apache.spark.scheduler.TaskInfo
 
attemptNumber() - Method in class org.apache.spark.TaskCommitDenied
 
attemptNumber() - Method in class org.apache.spark.TaskContext
How many times this task has been attempted.
attempts() - Method in class org.apache.spark.status.api.v1.ApplicationInfo
 
attr() - Method in class org.apache.spark.graphx.Edge
 
attr() - Method in class org.apache.spark.graphx.EdgeContext
The attribute associated with the edge.
attr() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
 
Attribute - Class in org.apache.spark.ml.attribute
:: DeveloperApi :: Abstract class for ML attributes.
Attribute() - Constructor for class org.apache.spark.ml.attribute.Attribute
 
attribute() - Method in class org.apache.spark.sql.sources.EqualNullSafe
 
attribute() - Method in class org.apache.spark.sql.sources.EqualTo
 
attribute() - Method in class org.apache.spark.sql.sources.GreaterThan
 
attribute() - Method in class org.apache.spark.sql.sources.GreaterThanOrEqual
 
attribute() - Method in class org.apache.spark.sql.sources.In
 
attribute() - Method in class org.apache.spark.sql.sources.IsNotNull
 
attribute() - Method in class org.apache.spark.sql.sources.IsNull
 
attribute() - Method in class org.apache.spark.sql.sources.LessThan
 
attribute() - Method in class org.apache.spark.sql.sources.LessThanOrEqual
 
attribute() - Method in class org.apache.spark.sql.sources.StringContains
 
attribute() - Method in class org.apache.spark.sql.sources.StringEndsWith
 
attribute() - Method in class org.apache.spark.sql.sources.StringStartsWith
 
AttributeGroup - Class in org.apache.spark.ml.attribute
:: DeveloperApi :: Attributes that describe a vector ML column.
AttributeGroup(String) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
Creates an attribute group without attribute info.
AttributeGroup(String, int) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
Creates an attribute group knowing only the number of attributes.
AttributeGroup(String, Attribute[]) - Constructor for class org.apache.spark.ml.attribute.AttributeGroup
Creates an attribute group with attributes.
attributes() - Method in class org.apache.spark.ml.attribute.AttributeGroup
Optional array of attributes.
AttributeType - Class in org.apache.spark.ml.attribute
:: DeveloperApi :: An enum-like type for attribute types: AttributeType$.Numeric, AttributeType$.Nominal, and AttributeType$.Binary.
AttributeType(String) - Constructor for class org.apache.spark.ml.attribute.AttributeType
 
attrType() - Method in class org.apache.spark.ml.attribute.Attribute
Attribute type.
attrType() - Method in class org.apache.spark.ml.attribute.BinaryAttribute
 
attrType() - Method in class org.apache.spark.ml.attribute.NominalAttribute
 
attrType() - Method in class org.apache.spark.ml.attribute.NumericAttribute
 
attrType() - Static method in class org.apache.spark.ml.attribute.UnresolvedAttribute
 
available() - Method in class org.apache.spark.storage.BufferReleasingInputStream
 
avg(Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the average of the values in a group.
avg(String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the average of the values in a group.
avg(String...) - Method in class org.apache.spark.sql.GroupedData
Compute the mean value for each numeric column for each group.
avg(Seq<String>) - Method in class org.apache.spark.sql.GroupedData
Compute the mean value for each numeric column for each group.
avgMetrics() - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
 
awaitTermination() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Wait for the execution to stop.
awaitTermination(long) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Deprecated.
As of 1.3.0, replaced by awaitTerminationOrTimeout(Long).
awaitTermination() - Method in class org.apache.spark.streaming.StreamingContext
Wait for the execution to stop.
awaitTermination(long) - Method in class org.apache.spark.streaming.StreamingContext
Deprecated.
As of 1.3.0, replaced by awaitTerminationOrTimeout(Long).
awaitTerminationOrTimeout(long) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Wait for the execution to stop.
awaitTerminationOrTimeout(long) - Method in class org.apache.spark.streaming.StreamingContext
Wait for the execution to stop.

B

base64(Column) - Static method in class org.apache.spark.sql.functions
Computes the BASE64 encoding of a binary column and returns it as a string column.
baseOn(ParamPair<?>...) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Sets the given parameters in this grid to fixed values.
baseOn(ParamMap) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Sets the given parameters in this grid to fixed values.
baseOn(Seq<ParamPair<?>>) - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Sets the given parameters in this grid to fixed values.
BaseRelation - Class in org.apache.spark.sql.sources
:: DeveloperApi :: Represents a collection of tuples with a known schema.
BaseRelation() - Constructor for class org.apache.spark.sql.sources.BaseRelation
 
baseRelationToDataFrame(BaseRelation) - Method in class org.apache.spark.sql.SQLContext
 
BaseRRDD<T,U> - Class in org.apache.spark.api.r
 
BaseRRDD(RDD<T>, int, byte[], String, String, byte[], Broadcast<Object>[], ClassTag<T>, ClassTag<U>) - Constructor for class org.apache.spark.api.r.BaseRRDD
 
baseScope() - Method in class org.apache.spark.streaming.dstream.DStream
The base scope associated with the operation that created this DStream.
baseScope() - Method in class org.apache.spark.streaming.dstream.InputDStream
The base scope associated with the operation that created this DStream.
BATCHES() - Static method in class org.apache.spark.mllib.clustering.StreamingKMeans
 
BatchInfo - Class in org.apache.spark.streaming.scheduler
:: DeveloperApi :: Class having information on completed batches.
BatchInfo(Time, Map<Object, StreamInputInfo>, long, Option<Object>, Option<Object>, Map<Object, OutputOperationInfo>) - Constructor for class org.apache.spark.streaming.scheduler.BatchInfo
 
batchInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerBatchCompleted
 
batchInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerBatchStarted
 
batchInfo() - Method in class org.apache.spark.streaming.scheduler.StreamingListenerBatchSubmitted
 
batchInfos() - Method in class org.apache.spark.streaming.scheduler.StatsReportListener
 
batchTime() - Method in class org.apache.spark.streaming.scheduler.BatchInfo
 
batchTime() - Method in class org.apache.spark.streaming.scheduler.OutputOperationInfo
 
bean(Class<T>) - Static method in class org.apache.spark.sql.Encoders
Creates an encoder for Java Bean of type T.
beforeFetch(Connection, Map<String, String>) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
Override connection specific properties to run before a select is made.
beforeFetch(Connection, Map<String, String>) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
 
Bernoulli() - Static method in class org.apache.spark.mllib.classification.NaiveBayes
 
BernoulliCellSampler<T> - Class in org.apache.spark.util.random
:: DeveloperApi :: A sampler based on Bernoulli trials for partitioning a data sequence.
BernoulliCellSampler(double, double, boolean) - Constructor for class org.apache.spark.util.random.BernoulliCellSampler
 
BernoulliSampler<T> - Class in org.apache.spark.util.random
:: DeveloperApi :: A sampler based on Bernoulli trials.
BernoulliSampler(double, ClassTag<T>) - Constructor for class org.apache.spark.util.random.BernoulliSampler
 
bestModel() - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
 
bestModel() - Method in class org.apache.spark.ml.tuning.TrainValidationSplitModel
 
beta() - Method in class org.apache.spark.mllib.random.WeibullGenerator
 
between(Object, Object) - Method in class org.apache.spark.sql.Column
True if the current column is between the lower bound and upper bound, inclusive.
bin(Column) - Static method in class org.apache.spark.sql.functions
An expression that returns the string representation of the binary value of the given long column.
bin(String) - Static method in class org.apache.spark.sql.functions
An expression that returns the string representation of the binary value of the given long column.
Binarizer - Class in org.apache.spark.ml.feature
:: Experimental :: Binarize a column of continuous features given a threshold.
Binarizer(String) - Constructor for class org.apache.spark.ml.feature.Binarizer
 
Binarizer() - Constructor for class org.apache.spark.ml.feature.Binarizer
 
Binary() - Static method in class org.apache.spark.ml.attribute.AttributeType
Binary type.
binary() - Method in class org.apache.spark.sql.ColumnName
Creates a new StructField of type binary.
BINARY() - Static method in class org.apache.spark.sql.Encoders
An encoder for arrays of bytes.
BinaryAttribute - Class in org.apache.spark.ml.attribute
:: DeveloperApi :: A binary attribute.
BinaryClassificationEvaluator - Class in org.apache.spark.ml.evaluation
:: Experimental :: Evaluator for binary classification, which expects two input columns: rawPrediction and label.
BinaryClassificationEvaluator(String) - Constructor for class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
 
BinaryClassificationEvaluator() - Constructor for class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
 
BinaryClassificationMetrics - Class in org.apache.spark.mllib.evaluation
Evaluator for binary classification.
BinaryClassificationMetrics(RDD<Tuple2<Object, Object>>, int) - Constructor for class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
 
BinaryClassificationMetrics(RDD<Tuple2<Object, Object>>) - Constructor for class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
Defaults numBins to 0.
binaryFiles(String, int) - Method in class org.apache.spark.api.java.JavaSparkContext
Read a directory of binary files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI as a byte array.
binaryFiles(String) - Method in class org.apache.spark.api.java.JavaSparkContext
Read a directory of binary files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI as a byte array.
binaryFiles(String, int) - Method in class org.apache.spark.SparkContext
Get an RDD for a Hadoop-readable dataset as a PortableDataStream for each file (useful for binary data).
binaryLabelValidator() - Static method in class org.apache.spark.mllib.util.DataValidators
Function to check if labels used for classification are either zero or one.
BinaryLogisticRegressionSummary - Class in org.apache.spark.ml.classification
:: Experimental :: Binary Logistic regression results for a given model.
BinaryLogisticRegressionTrainingSummary - Class in org.apache.spark.ml.classification
:: Experimental :: Logistic regression training results.
binaryRecords(String, int) - Method in class org.apache.spark.api.java.JavaSparkContext
Load data from a flat binary file, assuming the length of each record is constant.
binaryRecords(String, int, Configuration) - Method in class org.apache.spark.SparkContext
Load data from a flat binary file, assuming the length of each record is constant.
binaryRecordsStream(String, int) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as flat binary files with fixed record lengths, yielding byte arrays.
binaryRecordsStream(String, int) - Method in class org.apache.spark.streaming.StreamingContext
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as flat binary files, assuming a fixed length per record, generating one byte array per record.
BinarySample - Class in org.apache.spark.mllib.stat.test
Class that represents the group and value of a sample.
BinarySample(boolean, double) - Constructor for class org.apache.spark.mllib.stat.test.BinarySample
 
BinaryType - Class in org.apache.spark.sql.types
:: DeveloperApi :: The data type representing Array[Byte] values.
BinaryType - Static variable in class org.apache.spark.sql.types.DataTypes
Gets the BinaryType object.
BisectingKMeans - Class in org.apache.spark.mllib.clustering
A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" by Steinbach, Karypis, and Kumar, with modification to fit Spark.
BisectingKMeans() - Constructor for class org.apache.spark.mllib.clustering.BisectingKMeans
Constructs an instance with the default configuration.
BisectingKMeansModel - Class in org.apache.spark.mllib.clustering
Clustering model produced by BisectingKMeans.
bitwiseAND(Object) - Method in class org.apache.spark.sql.Column
Compute bitwise AND of this expression with another expression.
bitwiseNOT(Column) - Static method in class org.apache.spark.sql.functions
Computes bitwise NOT.
bitwiseOR(Object) - Method in class org.apache.spark.sql.Column
Compute bitwise OR of this expression with another expression.
bitwiseXOR(Object) - Method in class org.apache.spark.sql.Column
Compute bitwise XOR of this expression with another expression.
BlockId - Class in org.apache.spark.storage
:: DeveloperApi :: Identifies a particular Block of data, usually associated with a single file.
BlockId() - Constructor for class org.apache.spark.storage.BlockId
 
blockId() - Method in class org.apache.spark.storage.BlockUpdatedInfo
 
blockManager() - Method in class org.apache.spark.SparkEnv
 
blockManagerId() - Method in class org.apache.spark.scheduler.SparkListenerBlockManagerAdded
 
blockManagerId() - Method in class org.apache.spark.scheduler.SparkListenerBlockManagerRemoved
 
BlockManagerId - Class in org.apache.spark.storage
:: DeveloperApi :: This class represents a unique identifier for a BlockManager.
blockManagerId() - Method in class org.apache.spark.storage.BlockUpdatedInfo
 
blockManagerId() - Method in class org.apache.spark.storage.StorageStatus
 
blockManagerIdCache() - Static method in class org.apache.spark.storage.BlockManagerId
 
blockManagerIds() - Method in class org.apache.spark.ui.jobs.JobProgressListener
 
BlockMatrix - Class in org.apache.spark.mllib.linalg.distributed
Represents a distributed matrix in blocks of local matrices.
BlockMatrix(RDD<Tuple2<Tuple2<Object, Object>, Matrix>>, int, int, long, long) - Constructor for class org.apache.spark.mllib.linalg.distributed.BlockMatrix
 
BlockMatrix(RDD<Tuple2<Tuple2<Object, Object>, Matrix>>, int, int) - Constructor for class org.apache.spark.mllib.linalg.distributed.BlockMatrix
Alternate constructor for BlockMatrix without the input of the number of rows and columns.
blockName() - Method in class org.apache.spark.status.api.v1.RDDPartitionInfo
 
BlockNotFoundException - Exception in org.apache.spark.storage
 
BlockNotFoundException(String) - Constructor for exception org.apache.spark.storage.BlockNotFoundException
 
blockReplication() - Method in class org.apache.spark.sql.sources.HadoopFsRelation.FakeFileStatus
 
blocks() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
 
blocks() - Method in class org.apache.spark.storage.StorageStatus
Return the blocks stored in this block manager.
blockSize() - Method in class org.apache.spark.sql.sources.HadoopFsRelation.FakeFileStatus
 
BlockStatus - Class in org.apache.spark.storage
 
BlockStatus(StorageLevel, long, long, long) - Constructor for class org.apache.spark.storage.BlockStatus
 
blockTransferService() - Method in class org.apache.spark.SparkEnv
 
blockUpdatedInfo() - Method in class org.apache.spark.scheduler.SparkListenerBlockUpdated
 
BlockUpdatedInfo - Class in org.apache.spark.storage
:: DeveloperApi :: Stores information about a block status in a block manager.
BlockUpdatedInfo(BlockManagerId, BlockId, StorageLevel, long, long, long) - Constructor for class org.apache.spark.storage.BlockUpdatedInfo
 
bmAddress() - Method in class org.apache.spark.FetchFailed
 
BOOLEAN() - Static method in class org.apache.spark.sql.Encoders
An encoder for nullable boolean type.
BooleanParam - Class in org.apache.spark.ml.param
:: DeveloperApi :: Specialized version of Param[Boolean] for Java.
BooleanParam(String, String, String) - Constructor for class org.apache.spark.ml.param.BooleanParam
 
BooleanParam(Identifiable, String, String) - Constructor for class org.apache.spark.ml.param.BooleanParam
 
BooleanType - Class in org.apache.spark.sql.types
:: DeveloperApi :: The data type representing Boolean values.
BooleanType - Static variable in class org.apache.spark.sql.types.DataTypes
Gets the BooleanType object.
booleanWritableConverter() - Static method in class org.apache.spark.SparkContext
 
boolToBoolWritable(boolean) - Static method in class org.apache.spark.SparkContext
 
BoostingStrategy - Class in org.apache.spark.mllib.tree.configuration
Configuration options for GradientBoostedTrees.
BoostingStrategy(Strategy, Loss, int, double, double) - Constructor for class org.apache.spark.mllib.tree.configuration.BoostingStrategy
 
Both() - Static method in class org.apache.spark.graphx.EdgeDirection
Edges originating from *and* arriving at a vertex of interest.
boundaries() - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
Boundaries in increasing order for which predictions are known.
boundaries() - Method in class org.apache.spark.mllib.regression.IsotonicRegressionModel
 
BoundedDouble - Class in org.apache.spark.partial
A Double value with error bars and associated confidence.
BoundedDouble(double, double, double, double) - Constructor for class org.apache.spark.partial.BoundedDouble
 
boundTEncoder() - Method in class org.apache.spark.sql.Dataset
The encoder where the expressions used to construct an object from an input row have been bound to the ordinals of this Dataset's output schema.
broadcast(T) - Method in class org.apache.spark.api.java.JavaSparkContext
Broadcast a read-only variable to the cluster, returning a Broadcast object for reading it in distributed functions.
Broadcast<T> - Class in org.apache.spark.broadcast
A broadcast variable.
Broadcast(long, ClassTag<T>) - Constructor for class org.apache.spark.broadcast.Broadcast
 
broadcast(T, ClassTag<T>) - Method in class org.apache.spark.SparkContext
Broadcast a read-only variable to the cluster, returning a Broadcast object for reading it in distributed functions.
broadcast(DataFrame) - Static method in class org.apache.spark.sql.functions
Marks a DataFrame as small enough for use in broadcast joins.
BROADCAST() - Static method in class org.apache.spark.storage.BlockId
 
BroadcastBlockId - Class in org.apache.spark.storage
 
BroadcastBlockId(long, String) - Constructor for class org.apache.spark.storage.BroadcastBlockId
 
BroadcastFactory - Interface in org.apache.spark.broadcast
:: DeveloperApi :: An interface for all the broadcast implementations in Spark (to allow multiple broadcast implementations).
broadcastId() - Method in class org.apache.spark.CleanBroadcast
 
broadcastId() - Method in class org.apache.spark.storage.BroadcastBlockId
 
broadcastManager() - Method in class org.apache.spark.SparkEnv
 
Broker - Class in org.apache.spark.streaming.kafka
Represents the host and port info for a Kafka broker.
Bucketizer - Class in org.apache.spark.ml.feature
:: Experimental :: Bucketizer maps a column of continuous features to a column of feature buckets.
Bucketizer(String) - Constructor for class org.apache.spark.ml.feature.Bucketizer
 
Bucketizer() - Constructor for class org.apache.spark.ml.feature.Bucketizer
 
BufferReleasingInputStream - Class in org.apache.spark.storage
Helper class that ensures a ManagedBuffer is released upon InputStream.close().
BufferReleasingInputStream(InputStream, ShuffleBlockFetcherIterator) - Constructor for class org.apache.spark.storage.BufferReleasingInputStream
 
bufferSchema() - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
A StructType represents data types of values in the aggregation buffer.
build() - Method in class org.apache.spark.ml.tuning.ParamGridBuilder
Builds and returns all combinations of parameters specified by the param grid.
build(Node[]) - Method in class org.apache.spark.mllib.tree.model.Node
Build the left and right nodes if this node is not a leaf.
build() - Method in class org.apache.spark.sql.types.MetadataBuilder
Builds the Metadata instance.
buildFormattedString(DataType, String, StringBuilder) - Static method in class org.apache.spark.sql.types.DataType
 
buildJobStageDependencies(int, Seq<Object>) - Method in class org.apache.spark.scheduler.JobLogger
Build up the maps that represent stage-job relationships.
buildScan(Seq<Attribute>, Seq<Expression>) - Method in interface org.apache.spark.sql.sources.CatalystScan
 
buildScan(FileStatus[]) - Method in class org.apache.spark.sql.sources.HadoopFsRelation
For a non-partitioned relation, this method builds an RDD[Row] containing all rows within this relation.
buildScan(String[], FileStatus[]) - Method in class org.apache.spark.sql.sources.HadoopFsRelation
For a non-partitioned relation, this method builds an RDD[Row] containing all rows within this relation.
buildScan(String[], Filter[], FileStatus[]) - Method in class org.apache.spark.sql.sources.HadoopFsRelation
For a non-partitioned relation, this method builds an RDD[Row] containing all rows within this relation.
buildScan(String[], Filter[]) - Method in interface org.apache.spark.sql.sources.PrunedFilteredScan
 
buildScan(String[]) - Method in interface org.apache.spark.sql.sources.PrunedScan
 
buildScan() - Method in interface org.apache.spark.sql.sources.TableScan
 
BYTE() - Static method in class org.apache.spark.sql.Encoders
An encoder for nullable byte type.
ByteDecimal() - Static method in class org.apache.spark.sql.types.DecimalType
 
bytesRead() - Method in class org.apache.spark.status.api.v1.InputMetricDistributions
 
bytesRead() - Method in class org.apache.spark.status.api.v1.InputMetrics
 
bytesToBytesWritable(byte[]) - Static method in class org.apache.spark.SparkContext
 
bytesWritableConverter() - Static method in class org.apache.spark.SparkContext
 
bytesWritten() - Method in class org.apache.spark.status.api.v1.OutputMetricDistributions
 
bytesWritten() - Method in class org.apache.spark.status.api.v1.OutputMetrics
 
bytesWritten() - Method in class org.apache.spark.status.api.v1.ShuffleWriteMetrics
 
ByteType - Class in org.apache.spark.sql.types
:: DeveloperApi :: The data type representing Byte values.
ByteType - Static variable in class org.apache.spark.sql.types.DataTypes
Gets the ByteType object.

C

cache() - Method in class org.apache.spark.api.java.JavaDoubleRDD
Persist this RDD with the default storage level (`MEMORY_ONLY`).
cache() - Method in class org.apache.spark.api.java.JavaPairRDD
Persist this RDD with the default storage level (`MEMORY_ONLY`).
cache() - Method in class org.apache.spark.api.java.JavaRDD
Persist this RDD with the default storage level (`MEMORY_ONLY`).
cache() - Method in class org.apache.spark.graphx.Graph
Caches the vertices and edges associated with this graph at the previously-specified target storage levels, which default to MEMORY_ONLY.
cache() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
Persists the edge partitions using `targetStorageLevel`, which defaults to MEMORY_ONLY.
cache() - Method in class org.apache.spark.graphx.impl.GraphImpl
 
cache() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
Persists the vertex partitions at `targetStorageLevel`, which defaults to MEMORY_ONLY.
cache() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
Caches the underlying RDD.
cache() - Method in class org.apache.spark.rdd.RDD
Persist this RDD with the default storage level (`MEMORY_ONLY`).
cache() - Method in class org.apache.spark.sql.DataFrame
Persist this DataFrame with the default storage level (MEMORY_AND_DISK).
cache() - Method in class org.apache.spark.sql.Dataset
Persist this Dataset with the default storage level (MEMORY_AND_DISK).
cache() - Method in class org.apache.spark.streaming.api.java.JavaDStream
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER).
cache() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER).
cache() - Method in class org.apache.spark.streaming.dstream.DStream
Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER).
cachedLeafStatuses() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
 
cacheManager() - Method in class org.apache.spark.SparkEnv
 
cacheManager() - Method in class org.apache.spark.sql.SQLContext
 
cacheTable(String) - Method in class org.apache.spark.sql.SQLContext
Caches the specified table in-memory.
calculate(DenseVector<Object>) - Method in class org.apache.spark.ml.classification.LogisticCostFun
 
calculate(DenseVector<Object>) - Method in class org.apache.spark.ml.regression.AFTCostFun
 
calculate(DenseVector<Object>) - Method in class org.apache.spark.ml.regression.LeastSquaresCostFun
 
calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
:: DeveloperApi :: information calculation for multiclass classification
calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Entropy
:: DeveloperApi :: variance calculation
calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
:: DeveloperApi :: information calculation for multiclass classification
calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Gini
:: DeveloperApi :: variance calculation
calculate(double[], double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
:: DeveloperApi :: information calculation for multiclass classification
calculate(double, double, double) - Method in interface org.apache.spark.mllib.tree.impurity.Impurity
:: DeveloperApi :: information calculation for regression
calculate(double[], double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
:: DeveloperApi :: information calculation for multiclass classification
calculate(double, double, double) - Static method in class org.apache.spark.mllib.tree.impurity.Variance
:: DeveloperApi :: variance calculation
CalendarIntervalType - Class in org.apache.spark.sql.types
:: DeveloperApi :: The data type representing calendar time intervals.
CalendarIntervalType - Static variable in class org.apache.spark.sql.types.DataTypes
Gets the CalendarIntervalType object.
call(K, Iterator<V1>, Iterator<V2>) - Method in interface org.apache.spark.api.java.function.CoGroupFunction
 
call(T) - Method in interface org.apache.spark.api.java.function.DoubleFlatMapFunction
 
call(T) - Method in interface org.apache.spark.api.java.function.DoubleFunction
 
call(T) - Method in interface org.apache.spark.api.java.function.FilterFunction
 
call(T) - Method in interface org.apache.spark.api.java.function.FlatMapFunction
 
call(T1, T2) - Method in interface org.apache.spark.api.java.function.FlatMapFunction2
 
call(K, Iterator<V>) - Method in interface org.apache.spark.api.java.function.FlatMapGroupsFunction
 
call(T) - Method in interface org.apache.spark.api.java.function.ForeachFunction
 
call(Iterator<T>) - Method in interface org.apache.spark.api.java.function.ForeachPartitionFunction
 
call(T1) - Method in interface org.apache.spark.api.java.function.Function
 
call() - Method in interface org.apache.spark.api.java.function.Function0
 
call(T1, T2) - Method in interface org.apache.spark.api.java.function.Function2
 
call(T1, T2, T3) - Method in interface org.apache.spark.api.java.function.Function3
 
call(T1, T2, T3, T4) - Method in interface org.apache.spark.api.java.function.Function4
 
call(T) - Method in interface org.apache.spark.api.java.function.MapFunction
 
call(K, Iterator<V>) - Method in interface org.apache.spark.api.java.function.MapGroupsFunction
 
call(Iterator<T>) - Method in interface org.apache.spark.api.java.function.MapPartitionsFunction
 
call(T) - Method in interface org.apache.spark.api.java.function.PairFlatMapFunction
 
call(T) - Method in interface org.apache.spark.api.java.function.PairFunction
 
call(T, T) - Method in interface org.apache.spark.api.java.function.ReduceFunction
 
call(T) - Method in interface org.apache.spark.api.java.function.VoidFunction
 
call(T1, T2) - Method in interface org.apache.spark.api.java.function.VoidFunction2
 
call(T1) - Method in interface org.apache.spark.sql.api.java.UDF1
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10) - Method in interface org.apache.spark.sql.api.java.UDF10
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11) - Method in interface org.apache.spark.sql.api.java.UDF11
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12) - Method in interface org.apache.spark.sql.api.java.UDF12
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13) - Method in interface org.apache.spark.sql.api.java.UDF13
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14) - Method in interface org.apache.spark.sql.api.java.UDF14
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15) - Method in interface org.apache.spark.sql.api.java.UDF15
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16) - Method in interface org.apache.spark.sql.api.java.UDF16
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17) - Method in interface org.apache.spark.sql.api.java.UDF17
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18) - Method in interface org.apache.spark.sql.api.java.UDF18
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19) - Method in interface org.apache.spark.sql.api.java.UDF19
 
call(T1, T2) - Method in interface org.apache.spark.sql.api.java.UDF2
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20) - Method in interface org.apache.spark.sql.api.java.UDF20
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21) - Method in interface org.apache.spark.sql.api.java.UDF21
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22) - Method in interface org.apache.spark.sql.api.java.UDF22
 
call(T1, T2, T3) - Method in interface org.apache.spark.sql.api.java.UDF3
 
call(T1, T2, T3, T4) - Method in interface org.apache.spark.sql.api.java.UDF4
 
call(T1, T2, T3, T4, T5) - Method in interface org.apache.spark.sql.api.java.UDF5
 
call(T1, T2, T3, T4, T5, T6) - Method in interface org.apache.spark.sql.api.java.UDF6
 
call(T1, T2, T3, T4, T5, T6, T7) - Method in interface org.apache.spark.sql.api.java.UDF7
 
call(T1, T2, T3, T4, T5, T6, T7, T8) - Method in interface org.apache.spark.sql.api.java.UDF8
 
call(T1, T2, T3, T4, T5, T6, T7, T8, T9) - Method in interface org.apache.spark.sql.api.java.UDF9
 
callSite() - Method in class org.apache.spark.storage.RDDInfo
 
callUDF(String, Column...) - Static method in class org.apache.spark.sql.functions
Call a user-defined function.
callUDF(Function0<?>, DataType) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it's redundant with udf(). This will be removed in Spark 2.0.
callUDF(Function1<?, ?>, DataType, Column) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it's redundant with udf(). This will be removed in Spark 2.0.
callUDF(Function2<?, ?, ?>, DataType, Column, Column) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it's redundant with udf(). This will be removed in Spark 2.0.
callUDF(Function3<?, ?, ?, ?>, DataType, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it's redundant with udf(). This will be removed in Spark 2.0.
callUDF(Function4<?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it's redundant with udf(). This will be removed in Spark 2.0.
callUDF(Function5<?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it's redundant with udf(). This will be removed in Spark 2.0.
callUDF(Function6<?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it's redundant with udf(). This will be removed in Spark 2.0.
callUDF(Function7<?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it's redundant with udf(). This will be removed in Spark 2.0.
callUDF(Function8<?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it's redundant with udf(). This will be removed in Spark 2.0.
callUDF(Function9<?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it's redundant with udf(). This will be removed in Spark 2.0.
callUDF(Function10<?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?>, DataType, Column, Column, Column, Column, Column, Column, Column, Column, Column, Column) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it's redundant with udf(). This will be removed in Spark 2.0.
callUDF(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
Call a user-defined function.
callUdf(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.5.0, since it was not coherent to have two functions callUdf and callUDF. This will be removed in Spark 2.0.
cancel() - Method in class org.apache.spark.ComplexFutureAction
 
cancel() - Method in interface org.apache.spark.FutureAction
Cancels the execution of this action.
cancel() - Method in class org.apache.spark.SimpleFutureAction
 
cancelAllJobs() - Method in class org.apache.spark.api.java.JavaSparkContext
Cancel all jobs that have been scheduled or are running.
cancelAllJobs() - Method in class org.apache.spark.SparkContext
Cancel all jobs that have been scheduled or are running.
cancelJobGroup(String) - Method in class org.apache.spark.api.java.JavaSparkContext
Cancel active jobs for the specified group.
cancelJobGroup(String) - Method in class org.apache.spark.SparkContext
Cancel active jobs for the specified group.
canEqual(Object) - Method in class org.apache.spark.scheduler.cluster.ExecutorInfo
 
canEqual(Object) - Method in class org.apache.spark.util.MutablePair
 
canHandle(String) - Method in class org.apache.spark.sql.jdbc.AggregatedDialect
 
canHandle(String) - Static method in class org.apache.spark.sql.jdbc.DB2Dialect
 
canHandle(String) - Static method in class org.apache.spark.sql.jdbc.DerbyDialect
 
canHandle(String) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
Check if this dialect instance can handle a certain jdbc url.
canHandle(String) - Static method in class org.apache.spark.sql.jdbc.MsSqlServerDialect
 
canHandle(String) - Static method in class org.apache.spark.sql.jdbc.MySQLDialect
 
canHandle(String) - Static method in class org.apache.spark.sql.jdbc.NoopDialect
 
canHandle(String) - Static method in class org.apache.spark.sql.jdbc.OracleDialect
 
canHandle(String) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
 
cartesian(JavaRDDLike<U, ?>) - Method in interface org.apache.spark.api.java.JavaRDDLike
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other.
cartesian(RDD<U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other.
caseSensitive() - Method in class org.apache.spark.ml.feature.StopWordsRemover
Whether to do a case-sensitive comparison over the stop words. Default: false.
cast(DataType) - Method in class org.apache.spark.sql.Column
Casts the column to a different data type.
cast(String) - Method in class org.apache.spark.sql.Column
Casts the column to a different data type, using the canonical string representation of the type.
catalog() - Method in class org.apache.spark.sql.hive.HiveContext
 
catalog() - Method in class org.apache.spark.sql.SQLContext
 
CatalystScan - Interface in org.apache.spark.sql.sources
:: Experimental :: An interface for experimenting with a more direct connection to the query planner.
Categorical() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
 
categoricalFeaturesInfo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
CategoricalSplit - Class in org.apache.spark.ml.tree
:: DeveloperApi :: Split which tests a categorical feature.
categories() - Method in class org.apache.spark.mllib.tree.model.Split
 
categoryMaps() - Method in class org.apache.spark.ml.feature.VectorIndexerModel
 
cbrt(Column) - Static method in class org.apache.spark.sql.functions
Computes the cube-root of the given value.
cbrt(String) - Static method in class org.apache.spark.sql.functions
Computes the cube-root of the given column.
ceil(Column) - Static method in class org.apache.spark.sql.functions
Computes the ceiling of the given value.
ceil(String) - Static method in class org.apache.spark.sql.functions
Computes the ceiling of the given column.
ceil() - Method in class org.apache.spark.sql.types.Decimal
 
changePrecision(int, int) - Method in class org.apache.spark.sql.types.Decimal
Update precision and scale while keeping our value the same, and return true if successful.
checkpoint() - Method in interface org.apache.spark.api.java.JavaRDDLike
Mark this RDD for checkpointing.
checkpoint() - Method in class org.apache.spark.graphx.Graph
Mark this Graph for checkpointing.
checkpoint() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
 
checkpoint() - Method in class org.apache.spark.graphx.impl.GraphImpl
 
checkpoint() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
 
checkpoint() - Method in class org.apache.spark.rdd.HadoopRDD
 
checkpoint() - Method in class org.apache.spark.rdd.RDD
Mark this RDD for checkpointing.
checkpoint(Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Enable periodic checkpointing of RDDs of this DStream.
checkpoint(String) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Sets the context to periodically checkpoint the DStream operations for master fault-tolerance.
checkpoint(Duration) - Method in class org.apache.spark.streaming.dstream.DStream
Enable periodic checkpointing of RDDs of this DStream.
checkpoint(String) - Method in class org.apache.spark.streaming.StreamingContext
Set the context to periodically checkpoint the DStream operations for driver fault-tolerance.
checkpointData() - Method in class org.apache.spark.rdd.RDD
 
checkpointData() - Method in class org.apache.spark.streaming.dstream.DStream
 
checkpointDir() - Method in class org.apache.spark.SparkContext
 
checkpointDir() - Method in class org.apache.spark.streaming.StreamingContext
 
checkpointDuration() - Method in class org.apache.spark.streaming.dstream.DStream
 
checkpointDuration() - Method in class org.apache.spark.streaming.StreamingContext
 
checkpointFile(String) - Method in class org.apache.spark.api.java.JavaSparkContext
 
checkpointFile(String, ClassTag<T>) - Method in class org.apache.spark.SparkContext
 
checkpointInterval() - Method in class org.apache.spark.mllib.clustering.EMLDAOptimizer
 
checkpointInterval() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
child() - Method in class org.apache.spark.sql.sources.Not
 
CHILD_CONNECTION_TIMEOUT - Static variable in class org.apache.spark.launcher.SparkLauncher
Maximum time (in ms) to wait for a child process to connect back to the launcher server when using start().
CHILD_PROCESS_LOGGER_NAME - Static variable in class org.apache.spark.launcher.SparkLauncher
Logger name to use when launching a child process.
ChiSqSelector - Class in org.apache.spark.ml.feature
:: Experimental :: Chi-Squared feature selection, which selects categorical features to use for predicting a categorical label.
ChiSqSelector(String) - Constructor for class org.apache.spark.ml.feature.ChiSqSelector
 
ChiSqSelector() - Constructor for class org.apache.spark.ml.feature.ChiSqSelector
 
ChiSqSelector - Class in org.apache.spark.mllib.feature
 
ChiSqSelector(int) - Constructor for class org.apache.spark.mllib.feature.ChiSqSelector
 
ChiSqSelectorModel - Class in org.apache.spark.ml.feature
 
ChiSqSelectorModel - Class in org.apache.spark.mllib.feature
Chi Squared selector model.
ChiSqSelectorModel(int[]) - Constructor for class org.apache.spark.mllib.feature.ChiSqSelectorModel
 
chiSqTest(Vector, Vector) - Static method in class org.apache.spark.mllib.stat.Statistics
Conduct Pearson's chi-squared goodness of fit test of the observed data against the expected distribution.
chiSqTest(Vector) - Static method in class org.apache.spark.mllib.stat.Statistics
Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform distribution, with each category having an expected frequency of 1 / observed.size.
chiSqTest(Matrix) - Static method in class org.apache.spark.mllib.stat.Statistics
Conduct Pearson's independence test on the input contingency matrix, which cannot contain negative entries or columns or rows that sum up to 0.
chiSqTest(RDD<LabeledPoint>) - Static method in class org.apache.spark.mllib.stat.Statistics
Conduct Pearson's independence test for every feature against the label across the input RDD.
chiSqTest(JavaRDD<LabeledPoint>) - Static method in class org.apache.spark.mllib.stat.Statistics
Java-friendly version of chiSqTest().
ChiSqTestResult - Class in org.apache.spark.mllib.stat.test
Object containing the test results for the chi-squared hypothesis test.
Classification() - Static method in class org.apache.spark.mllib.tree.configuration.Algo
 
ClassificationModel<FeaturesType,M extends ClassificationModel<FeaturesType,M>> - Class in org.apache.spark.ml.classification
:: DeveloperApi ::
ClassificationModel() - Constructor for class org.apache.spark.ml.classification.ClassificationModel
 
ClassificationModel - Interface in org.apache.spark.mllib.classification
Represents a classification model that predicts to which of a set of categories an example belongs.
Classifier<FeaturesType,E extends Classifier<FeaturesType,E,M>,M extends ClassificationModel<FeaturesType,M>> - Class in org.apache.spark.ml.classification
:: DeveloperApi ::
Classifier() - Constructor for class org.apache.spark.ml.classification.Classifier
 
className() - Method in class org.apache.spark.ExceptionFailure
 
classpathEntries() - Method in class org.apache.spark.ui.env.EnvironmentListener
 
classTag() - Method in class org.apache.spark.api.java.JavaDoubleRDD
 
classTag() - Method in class org.apache.spark.api.java.JavaPairRDD
 
classTag() - Method in class org.apache.spark.api.java.JavaRDD
 
classTag() - Method in interface org.apache.spark.api.java.JavaRDDLike
 
classTag() - Method in class org.apache.spark.streaming.api.java.JavaDStream
 
classTag() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
 
classTag() - Method in class org.apache.spark.streaming.api.java.JavaInputDStream
 
classTag() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
 
classTag() - Method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
 
clean(long, boolean) - Method in class org.apache.spark.streaming.util.WriteAheadLog
Clean all the records that are older than the threshold time.
CleanAccum - Class in org.apache.spark
 
CleanAccum(long) - Constructor for class org.apache.spark.CleanAccum
 
CleanBroadcast - Class in org.apache.spark
 
CleanBroadcast(long) - Constructor for class org.apache.spark.CleanBroadcast
 
CleanCheckpoint - Class in org.apache.spark
 
CleanCheckpoint(int) - Constructor for class org.apache.spark.CleanCheckpoint
 
CleanRDD - Class in org.apache.spark
 
CleanRDD(int) - Constructor for class org.apache.spark.CleanRDD
 
CleanShuffle - Class in org.apache.spark
 
CleanShuffle(int) - Constructor for class org.apache.spark.CleanShuffle
 
CleanupTask - Interface in org.apache.spark
Classes that represent cleaning tasks.
CleanupTaskWeakReference - Class in org.apache.spark
A WeakReference associated with a CleanupTask.
CleanupTaskWeakReference(CleanupTask, Object, ReferenceQueue<Object>) - Constructor for class org.apache.spark.CleanupTaskWeakReference
 
clear(Param<?>) - Method in interface org.apache.spark.ml.param.Params
 
clear() - Method in class org.apache.spark.sql.util.ExecutionListenerManager
Removes all the registered QueryExecutionListener.
clearActive() - Static method in class org.apache.spark.sql.SQLContext
Clears the active SQLContext for current thread.
clearCache() - Method in class org.apache.spark.sql.SQLContext
Removes all cached tables from the in-memory cache.
clearCallSite() - Method in class org.apache.spark.api.java.JavaSparkContext
Pass-through to SparkContext.clearCallSite.
clearCallSite() - Method in class org.apache.spark.SparkContext
Clear the thread-local property for overriding the call sites of actions and RDDs.
clearDependencies() - Method in class org.apache.spark.rdd.CoGroupedRDD
 
clearDependencies() - Method in class org.apache.spark.rdd.RDD
Clears the dependencies of this RDD.
clearDependencies() - Method in class org.apache.spark.rdd.ShuffledRDD
 
clearDependencies() - Method in class org.apache.spark.rdd.UnionRDD
 
clearFiles() - Method in class org.apache.spark.api.java.JavaSparkContext
Clear the job's list of files added by addFile so that they do not get downloaded to any new nodes.
clearFiles() - Method in class org.apache.spark.SparkContext
Clear the job's list of files added by addFile so that they do not get downloaded to any new nodes.
clearJars() - Method in class org.apache.spark.api.java.JavaSparkContext
Clear the job's list of JARs added by addJar so that they do not get downloaded to any new nodes.
clearJars() - Method in class org.apache.spark.SparkContext
Clear the job's list of JARs added by addJar so that they do not get downloaded to any new nodes.
clearJobGroup() - Method in class org.apache.spark.api.java.JavaSparkContext
Clear the current thread's job group ID and its description.
clearJobGroup() - Method in class org.apache.spark.SparkContext
Clear the current thread's job group ID and its description.
clearThreshold() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
Clears the threshold so that predict will output raw prediction scores.
clearThreshold() - Method in class org.apache.spark.mllib.classification.SVMModel
Clears the threshold so that predict will output raw prediction scores.
clone() - Method in class org.apache.spark.SparkConf
Copy this object.
clone() - Method in class org.apache.spark.sql.types.Decimal
 
clone() - Method in class org.apache.spark.storage.StorageLevel
 
clone() - Method in class org.apache.spark.util.random.BernoulliCellSampler
 
clone() - Method in class org.apache.spark.util.random.BernoulliSampler
 
clone() - Method in class org.apache.spark.util.random.PoissonSampler
 
clone() - Method in interface org.apache.spark.util.random.RandomSampler
Return a copy of the RandomSampler object.
cloneComplement() - Method in class org.apache.spark.util.random.BernoulliCellSampler
Return a sampler that is the complement of the range specified by the current sampler.
close() - Method in class org.apache.spark.api.java.JavaSparkContext
 
close() - Method in class org.apache.spark.input.PortableDataStream
Closing the PortableDataStream is not needed anymore.
close() - Method in class org.apache.spark.io.SnappyOutputStreamWrapper
 
close() - Method in class org.apache.spark.serializer.DeserializationStream
 
close() - Method in class org.apache.spark.serializer.SerializationStream
 
close() - Method in class org.apache.spark.sql.sources.OutputWriter
Closes the OutputWriter.
close() - Method in class org.apache.spark.storage.BufferReleasingInputStream
 
close() - Method in class org.apache.spark.storage.TimeTrackingOutputStream
 
close() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
 
close() - Method in class org.apache.spark.streaming.util.WriteAheadLog
Close this log and release any resources.
closeLogWriter(int) - Method in class org.apache.spark.scheduler.JobLogger
Close the log file and clean the stage relationship in stageIdToJobId.
closureSerializer() - Method in class org.apache.spark.SparkEnv
 
cls() - Method in class org.apache.spark.util.MethodIdentifier
 
clsTag() - Method in interface org.apache.spark.sql.Encoder
A ClassTag that can be used to construct an Array to contain a collection of `T`.
cluster() - Method in class org.apache.spark.mllib.clustering.PowerIterationClustering.Assignment
 
clusterCenters() - Method in class org.apache.spark.ml.clustering.KMeansModel
 
clusterCenters() - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
Leaf cluster centers.
clusterCenters() - Method in class org.apache.spark.mllib.clustering.KMeansModel
 
clusterCenters() - Method in class org.apache.spark.mllib.clustering.StreamingKMeansModel
 
clusterWeights() - Method in class org.apache.spark.mllib.clustering.StreamingKMeansModel
 
cn() - Method in class org.apache.spark.mllib.feature.VocabWord
 
coalesce(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
Return a new RDD that is reduced into numPartitions partitions.
coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaDoubleRDD
Return a new RDD that is reduced into numPartitions partitions.
coalesce(int) - Method in class org.apache.spark.api.java.JavaPairRDD
Return a new RDD that is reduced into numPartitions partitions.
coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaPairRDD
Return a new RDD that is reduced into numPartitions partitions.
coalesce(int) - Method in class org.apache.spark.api.java.JavaRDD
Return a new RDD that is reduced into numPartitions partitions.
coalesce(int, boolean) - Method in class org.apache.spark.api.java.JavaRDD
Return a new RDD that is reduced into numPartitions partitions.
coalesce(int, boolean, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
Return a new RDD that is reduced into numPartitions partitions.
coalesce(int) - Method in class org.apache.spark.sql.DataFrame
Returns a new DataFrame that has exactly numPartitions partitions.
coalesce(int) - Method in class org.apache.spark.sql.Dataset
Returns a new Dataset that has exactly numPartitions partitions.
coalesce(Column...) - Static method in class org.apache.spark.sql.functions
Returns the first column that is not null, or null if all inputs are null.
coalesce(Seq<Column>) - Static method in class org.apache.spark.sql.functions
Returns the first column that is not null, or null if all inputs are null.
code() - Method in class org.apache.spark.mllib.feature.VocabWord
 
codeLen() - Method in class org.apache.spark.mllib.feature.VocabWord
 
coefficients() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
 
coefficients() - Method in class org.apache.spark.ml.regression.AFTSurvivalRegressionModel
 
coefficients() - Method in class org.apache.spark.ml.regression.LinearRegressionModel
 
coefficientStandardErrors() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
Standard error of estimated coefficients and intercept.
cogroup(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
cogroup(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
cogroup(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
cogroup(JavaPairRDD<K, W1>, JavaPairRDD<K, W2>, JavaPairRDD<K, W3>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
cogroup(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
 
cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
 
cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
 
cogroup(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
cogroup(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
cogroup(RDD<Tuple2<K, W1>>, RDD<Tuple2<K, W2>>, RDD<Tuple2<K, W3>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
cogroup(GroupedDataset<K, U>, Function3<K, Iterator<V>, Iterator<U>, TraversableOnce<R>>, Encoder<R>) - Method in class org.apache.spark.sql.GroupedDataset
Applies the given function to each cogrouped data.
cogroup(GroupedDataset<K, U>, CoGroupFunction<K, V, U, R>, Encoder<R>) - Method in class org.apache.spark.sql.GroupedDataset
Applies the given function to each cogrouped data.
cogroup(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
cogroup(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
cogroup(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
cogroup(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
cogroup(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
cogroup(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
CoGroupedRDD<K> - Class in org.apache.spark.rdd
:: DeveloperApi :: An RDD that cogroups its parents.
CoGroupedRDD(Seq<RDD<? extends Product2<K, ?>>>, Partitioner, ClassTag<K>) - Constructor for class org.apache.spark.rdd.CoGroupedRDD
 
CoGroupFunction<K,V1,V2,R> - Interface in org.apache.spark.api.java.function
A function that returns zero or more output records from each grouping key and its values from 2 Datasets.
col(String) - Method in class org.apache.spark.sql.DataFrame
Selects a column based on the column name and returns it as a Column.
col(String) - Static method in class org.apache.spark.sql.functions
Returns a Column based on the given column name.
collect() - Method in interface org.apache.spark.api.java.JavaRDDLike
Return an array that contains all of the elements in this RDD.
collect() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
 
collect() - Method in class org.apache.spark.rdd.RDD
Return an array that contains all of the elements in this RDD.
collect(PartialFunction<T, U>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
Return an RDD that contains all matching values by applying f.
collect() - Method in class org.apache.spark.sql.DataFrame
Returns an array that contains all of the Rows in this DataFrame.
collect() - Method in class org.apache.spark.sql.Dataset
Returns an array that contains all the elements in this Dataset.
collect_list(Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns a list of objects with duplicates.
collect_list(String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns a list of objects with duplicates.
collect_set(Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns a set of objects with duplicate elements eliminated.
collect_set(String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns a set of objects with duplicate elements eliminated.
collectAsList() - Method in class org.apache.spark.sql.DataFrame
Returns a Java list that contains all of the Rows in this DataFrame.
collectAsList() - Method in class org.apache.spark.sql.Dataset
Returns a Java list that contains all the elements in this Dataset.
collectAsMap() - Method in class org.apache.spark.api.java.JavaPairRDD
Return the key-value pairs in this RDD to the master as a Map.
collectAsMap() - Method in class org.apache.spark.rdd.PairRDDFunctions
Return the key-value pairs in this RDD to the master as a Map.
collectAsync() - Method in interface org.apache.spark.api.java.JavaRDDLike
The asynchronous version of collect, which returns a future for retrieving an array containing all of the elements in this RDD.
collectAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
Returns a future for retrieving all elements of this RDD.
collectEdges(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
Returns an RDD that contains for each vertex v its local edges, i.e., the edges that are incident on v, in the user-specified direction.
collectNeighborIds(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
Collect the neighbor vertex ids for each vertex.
collectNeighbors(EdgeDirection) - Method in class org.apache.spark.graphx.GraphOps
Collect the neighbor vertex attributes for each vertex.
collectPartitions(int[]) - Method in interface org.apache.spark.api.java.JavaRDDLike
Return an array that contains all of the elements in a specific partition of this RDD.
collectToPython() - Method in class org.apache.spark.sql.DataFrame
 
colPtrs() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
 
colsPerBlock() - Method in class org.apache.spark.mllib.linalg.distributed.BlockMatrix
 
colStats(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.Statistics
Computes column-wise summary statistics for the input RDD[Vector].
Column - Class in org.apache.spark.sql
:: Experimental :: A column that will be computed based on the data in a DataFrame.
Column(Expression) - Constructor for class org.apache.spark.sql.Column
 
Column(String) - Constructor for class org.apache.spark.sql.Column
 
column(String) - Static method in class org.apache.spark.sql.functions
Returns a Column based on the given column name.
ColumnName - Class in org.apache.spark.sql
:: Experimental :: A convenient class used for constructing schema.
ColumnName(String) - Constructor for class org.apache.spark.sql.ColumnName
 
ColumnPruner - Class in org.apache.spark.ml.feature
Utility transformer for removing temporary columns from a DataFrame.
ColumnPruner(Set<String>) - Constructor for class org.apache.spark.ml.feature.ColumnPruner
 
columns() - Method in class org.apache.spark.sql.DataFrame
Returns all column names as an array.
columnSimilarities() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
 
columnSimilarities() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Compute all cosine similarities between columns of this matrix using the brute-force approach of computing normalized dot products.
columnSimilarities(double) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Compute similarities between columns of this matrix using a sampling approach.
combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer) - Method in class org.apache.spark.api.java.JavaPairRDD
Generic function to combine the elements for each key using a custom set of aggregation functions.
combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
Generic function to combine the elements for each key using a custom set of aggregation functions.
combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
Simplified version of combineByKey that hash-partitions the output RDD and uses map-side aggregation.
combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.api.java.JavaPairRDD
Simplified version of combineByKey that hash-partitions the resulting RDD using the existing partitioner/parallelism level and using map-side aggregation.
combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer) - Method in class org.apache.spark.rdd.PairRDDFunctions
Generic function to combine the elements for each key using a custom set of aggregation functions.
combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
Simplified version of combineByKeyWithClassTag that hash-partitions the output RDD.
combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
 
combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Combine elements of each key in DStream's RDDs using custom functions.
combineByKey(Function<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Combine elements of each key in DStream's RDDs using custom functions.
combineByKey(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, ClassTag<C>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Combine elements of each key in DStream's RDDs using custom functions.
combineByKeyWithClassTag(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, Partitioner, boolean, Serializer, ClassTag<C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
:: Experimental :: Generic function to combine the elements for each key using a custom set of aggregation functions.
combineByKeyWithClassTag(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, int, ClassTag<C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
:: Experimental :: Simplified version of combineByKeyWithClassTag that hash-partitions the output RDD.
combineByKeyWithClassTag(Function1<V, C>, Function2<C, V, C>, Function2<C, C, C>, ClassTag<C>) - Method in class org.apache.spark.rdd.PairRDDFunctions
:: Experimental :: Simplified version of combineByKeyWithClassTag that hash-partitions the resulting RDD using the existing partitioner/parallelism level.
combineCombinersByKey(Iterator<Product2<K, C>>) - Method in class org.apache.spark.Aggregator
 
combineCombinersByKey(Iterator<Product2<K, C>>, TaskContext) - Method in class org.apache.spark.Aggregator
 
combinerClassName() - Method in class org.apache.spark.ShuffleDependency
 
combineValuesByKey(Iterator<Product2<K, V>>) - Method in class org.apache.spark.Aggregator
 
combineValuesByKey(Iterator<Product2<K, V>>, TaskContext) - Method in class org.apache.spark.Aggregator
 
compare(PartitionGroup, PartitionGroup) - Method in class org.apache.spark.rdd.PartitionCoalescer
 
compare(Option<PartitionGroup>, Option<PartitionGroup>) - Method in class org.apache.spark.rdd.PartitionCoalescer
 
compare(Decimal) - Method in class org.apache.spark.sql.types.Decimal
 
compare(RDDInfo) - Method in class org.apache.spark.storage.RDDInfo
 
compareTo(SparkShutdownHook) - Method in class org.apache.spark.util.SparkShutdownHook
 
completed() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
 
completedJobs() - Method in class org.apache.spark.ui.jobs.JobProgressListener
 
completedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
 
completedTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
 
completionTime() - Method in class org.apache.spark.scheduler.StageInfo
Time when all tasks in the stage completed or when the stage was cancelled.
completionTime() - Method in class org.apache.spark.status.api.v1.JobData
 
ComplexFutureAction<T> - Class in org.apache.spark
A FutureAction for actions that could trigger multiple Spark jobs.
ComplexFutureAction() - Constructor for class org.apache.spark.ComplexFutureAction
 
compressed() - Method in interface org.apache.spark.mllib.linalg.Vector
Returns a vector in either dense or sparse format, whichever uses less storage.
compressedInputStream(InputStream) - Method in interface org.apache.spark.io.CompressionCodec
 
compressedInputStream(InputStream) - Method in class org.apache.spark.io.LZ4CompressionCodec
 
compressedInputStream(InputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
 
compressedInputStream(InputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
 
compressedOutputStream(OutputStream) - Method in interface org.apache.spark.io.CompressionCodec
 
compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.LZ4CompressionCodec
 
compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.LZFCompressionCodec
 
compressedOutputStream(OutputStream) - Method in class org.apache.spark.io.SnappyCompressionCodec
 
CompressionCodec - Interface in org.apache.spark.io
:: DeveloperApi :: CompressionCodec allows customizing which compression implementation is used in block storage.
compute(Partition, TaskContext) - Method in class org.apache.spark.api.r.BaseRRDD
 
compute(Partition, TaskContext) - Method in class org.apache.spark.graphx.EdgeRDD
 
compute(Partition, TaskContext) - Method in class org.apache.spark.graphx.VertexRDD
Provides the RDD[(VertexId, VD)] equivalent output.
compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
Compute the gradient and loss given the features of a single data point.
compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.Gradient
Compute the gradient and loss given the features of a single data point, add the gradient to a provided vector to avoid creating new objects, and return loss.
compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
 
compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.HingeGradient
 
compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.L1Updater
 
compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
 
compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LeastSquaresGradient
 
compute(Vector, double, Vector) - Method in class org.apache.spark.mllib.optimization.LogisticGradient
 
compute(Vector, double, Vector, Vector) - Method in class org.apache.spark.mllib.optimization.LogisticGradient
 
compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SimpleUpdater
 
compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.SquaredL2Updater
 
compute(Vector, Vector, double, int, double) - Method in class org.apache.spark.mllib.optimization.Updater
Compute an updated value for weights given the gradient, stepSize, iteration number and regularization parameter.
compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.CoGroupedRDD
 
compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.HadoopRDD
 
compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.JdbcRDD
 
compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.NewHadoopRDD
 
compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.PartitionPruningRDD
 
compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.RDD
:: DeveloperApi :: Implemented by subclasses to compute a given partition.
compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.ShuffledRDD
 
compute(Partition, TaskContext) - Method in class org.apache.spark.rdd.UnionRDD
 
compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaDStream
Generate an RDD for the given duration.
compute(Time) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Method that generates an RDD for the given Duration.
compute(Time) - Method in class org.apache.spark.streaming.dstream.ConstantInputDStream
 
compute(Time) - Method in class org.apache.spark.streaming.dstream.DStream
Method that generates an RDD for the given time.
compute(Time) - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
 
computeColumnSummaryStatistics() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Computes column-wise summary statistics.
computeCost(DataFrame) - Method in class org.apache.spark.ml.clustering.KMeansModel
Return the K-means cost (sum of squared distances of points to their nearest center) for this model on the given data.
computeCost(Vector) - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
Computes the squared distance between the input point and the cluster center it belongs to.
computeCost(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
Computes the sum of squared distances between the input points and their corresponding cluster centers.
computeCost(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.clustering.BisectingKMeansModel
Java-friendly version of computeCost().
computeCost(RDD<Vector>) - Method in class org.apache.spark.mllib.clustering.KMeansModel
Return the K-means cost (sum of squared distances of points to their nearest center) for this model on the given data.
computeCovariance() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Computes the covariance matrix, treating each row as an observation.
computeError(org.apache.spark.mllib.tree.model.TreeEnsembleModel, RDD<LabeledPoint>) - Method in interface org.apache.spark.mllib.tree.loss.Loss
Method to calculate error of the base learner for the gradient boosting calculation.
computeError(double, double) - Method in interface org.apache.spark.mllib.tree.loss.Loss
Method to calculate loss when the predictions are already known.
computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
 
computeGramianMatrix() - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Computes the Gramian matrix A^T A.
computeInitialPredictionAndError(RDD<LabeledPoint>, double, DecisionTreeModel, Loss) - Static method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
:: DeveloperApi :: Compute the initial predictions and errors for a dataset for the first iteration of gradient boosting.
computePreferredLocations(Seq<InputFormatInfo>) - Static method in class org.apache.spark.scheduler.InputFormatInfo
Computes the preferred locations based on the input(s) and returns a location-to-block map.
computePrincipalComponents(int) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Computes the top k principal components.
computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
 
computeSVD(int, boolean, double) - Method in class org.apache.spark.mllib.linalg.distributed.RowMatrix
Computes singular value decomposition of this matrix.
concat(Column...) - Static method in class org.apache.spark.sql.functions
Concatenates multiple input string columns together into a single string column.
concat(Seq<Column>) - Static method in class org.apache.spark.sql.functions
Concatenates multiple input string columns together into a single string column.
concat_ws(String, Column...) - Static method in class org.apache.spark.sql.functions
Concatenates multiple input string columns together into a single string column, using the given separator.
concat_ws(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
Concatenates multiple input string columns together into a single string column, using the given separator.
conf() - Method in class org.apache.spark.SparkEnv
 
conf() - Method in class org.apache.spark.sql.hive.HiveContext
 
conf() - Method in class org.apache.spark.sql.SQLContext
 
conf() - Method in class org.apache.spark.streaming.StreamingContext
 
confidence() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
Returns the confidence of the rule.
confidence() - Method in class org.apache.spark.partial.BoundedDouble
 
configuration() - Method in class org.apache.spark.scheduler.InputFormatInfo
 
CONFIGURATION_INSTANTIATION_LOCK() - Static method in class org.apache.spark.rdd.HadoopRDD
Configuration's constructor is not threadsafe (see SPARK-1097 and HADOOP-10456).
CONFIGURATION_INSTANTIATION_LOCK() - Static method in class org.apache.spark.rdd.NewHadoopRDD
Configuration's constructor is not threadsafe (see SPARK-1097 and HADOOP-10456).
configure() - Method in class org.apache.spark.sql.hive.HiveContext
Overridden by child classes that need to set configuration before the client init.
confusionMatrix() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
Returns the confusion matrix: predicted classes are in columns, ordered by class label ascending, as in "labels".
connectedComponents() - Method in class org.apache.spark.graphx.GraphOps
Compute the connected component membership of each vertex and return a graph with the vertex value containing the lowest vertex id in the connected component containing that vertex.
ConnectedComponents - Class in org.apache.spark.graphx.lib
Connected components algorithm.
ConnectedComponents() - Constructor for class org.apache.spark.graphx.lib.ConnectedComponents
 
consequent() - Method in class org.apache.spark.mllib.fpm.AssociationRules.Rule
 
ConstantInputDStream<T> - Class in org.apache.spark.streaming.dstream
An input stream that always returns the same RDD on each timestep.
ConstantInputDStream(StreamingContext, RDD<T>, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.ConstantInputDStream
 
contains(Param<?>) - Method in class org.apache.spark.ml.param.ParamMap
Checks whether a parameter is explicitly specified.
contains(String) - Method in class org.apache.spark.SparkConf
Does the configuration contain a given parameter?
contains(Object) - Method in class org.apache.spark.sql.Column
Contains the other element.
contains(String) - Method in class org.apache.spark.sql.types.Metadata
Tests whether this Metadata contains a binding for a key.
containsBlock(BlockId) - Method in class org.apache.spark.storage.StorageStatus
Return whether the given block is stored in this block manager in O(1) time.
containsCachedMetadata(String) - Static method in class org.apache.spark.rdd.HadoopRDD
 
containsNull() - Method in class org.apache.spark.sql.types.ArrayType
 
context() - Method in interface org.apache.spark.api.java.JavaRDDLike
The SparkContext that this RDD was created on.
context() - Method in class org.apache.spark.InterruptibleIterator
 
context(SQLContext) - Method in class org.apache.spark.ml.util.MLReader
 
context(SQLContext) - Method in class org.apache.spark.ml.util.MLWriter
 
context() - Method in class org.apache.spark.rdd.RDD
The SparkContext that this RDD was created on.
context() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return the StreamingContext associated with this DStream.
context() - Method in class org.apache.spark.streaming.dstream.DStream
Return the StreamingContext associated with this DStream.
Continuous() - Static method in class org.apache.spark.mllib.tree.configuration.FeatureType
 
ContinuousSplit - Class in org.apache.spark.ml.tree
:: DeveloperApi :: Split which tests a continuous feature.
conv(Column, int, int) - Static method in class org.apache.spark.sql.functions
Convert a number in a string column from one base to another.
CONVERT_CTAS() - Static method in class org.apache.spark.sql.hive.HiveContext
 
CONVERT_METASTORE_PARQUET() - Static method in class org.apache.spark.sql.hive.HiveContext
 
CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING() - Static method in class org.apache.spark.sql.hive.HiveContext
 
convertCTAS() - Method in class org.apache.spark.sql.hive.HiveContext
When true, a table created by a Hive CTAS statement (no USING clause) will be converted to a data source table, using the data source set by spark.sql.sources.default.
convertMetastoreParquet() - Method in class org.apache.spark.sql.hive.HiveContext
When true, enables an experimental feature where metastore tables that use the parquet SerDe are automatically converted to use the Spark SQL parquet table scan, instead of the Hive SerDe.
convertMetastoreParquetWithSchemaMerging() - Method in class org.apache.spark.sql.hive.HiveContext
When true, also tries to merge possibly different but compatible Parquet schemas in different Parquet data files.
convertToCanonicalEdges(Function2<ED, ED, ED>) - Method in class org.apache.spark.graphx.GraphOps
Convert bi-directional edges into uni-directional ones.
CoordinateMatrix - Class in org.apache.spark.mllib.linalg.distributed
 
CoordinateMatrix(RDD<MatrixEntry>, long, long) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
 
CoordinateMatrix(RDD<MatrixEntry>) - Constructor for class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.DecisionTreeClassifier
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.GBTClassificationModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.GBTClassifier
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.LogisticRegression
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.MultilayerPerceptronClassifier
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.NaiveBayes
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.NaiveBayesModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.OneVsRest
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.OneVsRestModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.classification.RandomForestClassifier
 
copy(ParamMap) - Method in class org.apache.spark.ml.clustering.DistributedLDAModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.clustering.KMeans
 
copy(ParamMap) - Method in class org.apache.spark.ml.clustering.KMeansModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.clustering.LDA
 
copy(ParamMap) - Method in class org.apache.spark.ml.clustering.LocalLDAModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.Estimator
 
copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
 
copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.Evaluator
 
copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
 
copy(ParamMap) - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.Binarizer
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.Bucketizer
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.ChiSqSelector
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.ChiSqSelectorModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.ColumnPruner
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.CountVectorizer
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.CountVectorizerModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.HashingTF
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.IDF
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.IDFModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.IndexToString
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.Interaction
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.MinMaxScaler
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.MinMaxScalerModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.OneHotEncoder
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.PCA
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.PCAModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.PolynomialExpansion
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.QuantileDiscretizer
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.RegexTokenizer
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.RFormula
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.RFormulaModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.SQLTransformer
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.StandardScaler
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.StandardScalerModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.StopWordsRemover
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.StringIndexer
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.StringIndexerModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.Tokenizer
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorAssembler
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorAttributeRewriter
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorIndexer
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorIndexerModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.VectorSlicer
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.Word2Vec
 
copy(ParamMap) - Method in class org.apache.spark.ml.feature.Word2VecModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.Model
 
copy() - Method in class org.apache.spark.ml.param.ParamMap
Creates a copy of this param map.
copy(ParamMap) - Method in interface org.apache.spark.ml.param.Params
Creates a copy of this instance with the same UID and some extra params.
copy(ParamMap) - Method in class org.apache.spark.ml.Pipeline
 
copy(ParamMap) - Method in class org.apache.spark.ml.PipelineModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.PipelineStage
 
copy(ParamMap) - Method in class org.apache.spark.ml.Predictor
 
copy(ParamMap) - Method in class org.apache.spark.ml.recommendation.ALS
 
copy(ParamMap) - Method in class org.apache.spark.ml.recommendation.ALSModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.AFTSurvivalRegression
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.AFTSurvivalRegressionModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressionModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.DecisionTreeRegressor
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.GBTRegressionModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.GBTRegressor
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.IsotonicRegression
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.IsotonicRegressionModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.LinearRegression
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.LinearRegressionModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.regression.RandomForestRegressor
 
copy(ParamMap) - Method in class org.apache.spark.ml.Transformer
 
copy(ParamMap) - Method in class org.apache.spark.ml.tuning.CrossValidator
 
copy(ParamMap) - Method in class org.apache.spark.ml.tuning.CrossValidatorModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
 
copy(ParamMap) - Method in class org.apache.spark.ml.tuning.TrainValidationSplitModel
 
copy(ParamMap) - Method in class org.apache.spark.ml.UnaryTransformer
 
copy() - Method in class org.apache.spark.mllib.linalg.DenseMatrix
 
copy() - Method in class org.apache.spark.mllib.linalg.DenseVector
 
copy() - Method in interface org.apache.spark.mllib.linalg.Matrix
Get a deep copy of the matrix.
copy() - Method in class org.apache.spark.mllib.linalg.SparseMatrix
 
copy() - Method in class org.apache.spark.mllib.linalg.SparseVector
 
copy() - Method in interface org.apache.spark.mllib.linalg.Vector
Makes a deep copy of this vector.
copy() - Method in class org.apache.spark.mllib.random.ExponentialGenerator
 
copy() - Method in class org.apache.spark.mllib.random.GammaGenerator
 
copy() - Method in class org.apache.spark.mllib.random.LogNormalGenerator
 
copy() - Method in class org.apache.spark.mllib.random.PoissonGenerator
 
copy() - Method in interface org.apache.spark.mllib.random.RandomDataGenerator
Returns a copy of the RandomDataGenerator with a new instance of the rng object used in the class when applicable for non-locking concurrent usage.
copy() - Method in class org.apache.spark.mllib.random.StandardNormalGenerator
 
copy() - Method in class org.apache.spark.mllib.random.UniformGenerator
 
copy() - Method in class org.apache.spark.mllib.random.WeibullGenerator
 
copy() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
Returns a shallow copy of this instance.
copy() - Method in interface org.apache.spark.sql.Row
Make a copy of the current Row object.
copy() - Method in class org.apache.spark.util.StatCounter
Clone this StatCounter.
copyValues(T, ParamMap) - Method in interface org.apache.spark.ml.param.Params
Copies param values from this instance to another instance for params shared by them.
coresGranted() - Method in class org.apache.spark.status.api.v1.ApplicationInfo
 
coresPerExecutor() - Method in class org.apache.spark.status.api.v1.ApplicationInfo
 
corr(RDD<Vector>) - Static method in class org.apache.spark.mllib.stat.Statistics
Compute the Pearson correlation matrix for the input RDD of Vectors.
corr(RDD<Vector>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
Compute the correlation matrix for the input RDD of Vectors using the specified method.
corr(RDD<Object>, RDD<Object>) - Static method in class org.apache.spark.mllib.stat.Statistics
Compute the Pearson correlation for the input RDDs.
corr(JavaRDD<Double>, JavaRDD<Double>) - Static method in class org.apache.spark.mllib.stat.Statistics
Java-friendly version of corr().
corr(RDD<Object>, RDD<Object>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
Compute the correlation for the input RDDs using the specified method.
corr(JavaRDD<Double>, JavaRDD<Double>, String) - Static method in class org.apache.spark.mllib.stat.Statistics
Java-friendly version of corr().
corr(String, String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Calculates the correlation of two columns of a DataFrame.
corr(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Calculates the Pearson Correlation Coefficient of two columns of a DataFrame.
corr(Column, Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
corr(String, String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
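The corr entries above all refer to the Pearson correlation coefficient by default. As a rough illustration of what that statistic measures, here is a minimal plain-Java sketch that computes it for two in-memory arrays. It is not Spark's implementation and does not use the Spark API; the class name `PearsonSketch` and the method `pearson` are hypothetical names chosen for this example.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Spark.
final class PearsonSketch {

    // Pearson correlation of two equal-length samples:
    // r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
    // where dx and dy are deviations from the respective means.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        double meanX = Arrays.stream(x).average().getAsDouble();
        double meanY = Arrays.stream(y).average().getAsDouble();
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] y = {2.0, 4.0, 6.0, 8.0};
        // y is an exact positive linear function of x, so r is 1.0.
        System.out.println(pearson(x, y));
    }
}
```

The sketch assumes neither sample is constant; a constant sample makes the denominator zero and the result is NaN.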
cos(Column) - Static method in class org.apache.spark.sql.functions
Computes the cosine of the given value.
cos(String) - Static method in class org.apache.spark.sql.functions
Computes the cosine of the given column.
cosh(Column) - Static method in class org.apache.spark.sql.functions
Computes the hyperbolic cosine of the given value.
cosh(String) - Static method in class org.apache.spark.sql.functions
Computes the hyperbolic cosine of the given column.
count() - Method in interface org.apache.spark.api.java.JavaRDDLike
Return the number of elements in the RDD.
count() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
The number of edges in the RDD.
count() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
The number of vertices in the RDD.
count() - Method in class org.apache.spark.ml.regression.AFTAggregator
 
count() - Method in class org.apache.spark.ml.regression.LeastSquaresAggregator
 
count() - Method in class org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
Sample size.
count() - Method in interface org.apache.spark.mllib.stat.MultivariateStatisticalSummary
Sample size.
count() - Method in class org.apache.spark.rdd.RDD
Return the number of elements in the RDD.
count() - Method in class org.apache.spark.sql.DataFrame
Returns the number of rows in the DataFrame.
count() - Method in class org.apache.spark.sql.Dataset
Returns the number of elements in the Dataset.
count(Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of items in a group.
count(String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of items in a group.
count() - Method in class org.apache.spark.sql.GroupedData
Count the number of rows for each group.
count() - Method in class org.apache.spark.sql.GroupedDataset
Returns a Dataset that contains a tuple with each key and the number of items present for that key.
count() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD has a single element generated by counting each RDD of this DStream.
count() - Method in class org.apache.spark.streaming.dstream.DStream
Return a new DStream in which each RDD has a single element generated by counting each RDD of this DStream.
count() - Method in class org.apache.spark.streaming.kafka.OffsetRange
Number of messages this OffsetRange refers to.
count() - Method in class org.apache.spark.util.StatCounter
 
countApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
Approximate version of count() that returns a potentially incomplete result within a timeout, even if not all tasks have finished.
countApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
Approximate version of count() that returns a potentially incomplete result within a timeout, even if not all tasks have finished.
countApprox(long, double) - Method in class org.apache.spark.rdd.RDD
Approximate version of count() that returns a potentially incomplete result within a timeout, even if not all tasks have finished.
countApproxDistinct(double) - Method in interface org.apache.spark.api.java.JavaRDDLike
Return approximate number of distinct elements in the RDD.
countApproxDistinct(int, int) - Method in class org.apache.spark.rdd.RDD
Return approximate number of distinct elements in the RDD.
countApproxDistinct(double) - Method in class org.apache.spark.rdd.RDD
Return approximate number of distinct elements in the RDD.
countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
Return approximate number of distinct values for each key in this RDD.
countApproxDistinctByKey(double, int) - Method in class org.apache.spark.api.java.JavaPairRDD
Return approximate number of distinct values for each key in this RDD.
countApproxDistinctByKey(double) - Method in class org.apache.spark.api.java.JavaPairRDD
Return approximate number of distinct values for each key in this RDD.
countApproxDistinctByKey(int, int, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
Return approximate number of distinct values for each key in this RDD.
countApproxDistinctByKey(double, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
Return approximate number of distinct values for each key in this RDD.
countApproxDistinctByKey(double, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
Return approximate number of distinct values for each key in this RDD.
countApproxDistinctByKey(double) - Method in class org.apache.spark.rdd.PairRDDFunctions
Return approximate number of distinct values for each key in this RDD.
countAsync() - Method in interface org.apache.spark.api.java.JavaRDDLike
The asynchronous version of count, which returns a future for counting the number of elements in this RDD.
countAsync() - Method in class org.apache.spark.rdd.AsyncRDDActions
Returns a future for counting the number of elements in the RDD.
countByKey() - Method in class org.apache.spark.api.java.JavaPairRDD
Count the number of elements for each key, and return the result to the master as a Map.
countByKey() - Method in class org.apache.spark.rdd.PairRDDFunctions
Count the number of elements for each key, collecting the results to a local Map.
countByKeyApprox(long) - Method in class org.apache.spark.api.java.JavaPairRDD
Approximate version of countByKey that can return a partial result if it does not finish within a timeout.
countByKeyApprox(long, double) - Method in class org.apache.spark.api.java.JavaPairRDD
Approximate version of countByKey that can return a partial result if it does not finish within a timeout.
countByKeyApprox(long, double) - Method in class org.apache.spark.rdd.PairRDDFunctions
Approximate version of countByKey that can return a partial result if it does not finish within a timeout.
countByValue() - Method in interface org.apache.spark.api.java.JavaRDDLike
Return the count of each unique value in this RDD as a map of (value, count) pairs.
countByValue(Ordering<T>) - Method in class org.apache.spark.rdd.RDD
Return the count of each unique value in this RDD as a local map of (value, count) pairs.
countByValue() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD contains the counts of each distinct value in each RDD of this DStream.
countByValue(int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD contains the counts of each distinct value in each RDD of this DStream.
countByValue(int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
Return a new DStream in which each RDD contains the counts of each distinct value in each RDD of this DStream.
countByValueAndWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD contains the count of distinct elements in RDDs in a sliding window over this DStream.
countByValueAndWindow(Duration, Duration, int) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD contains the count of distinct elements in RDDs in a sliding window over this DStream.
countByValueAndWindow(Duration, Duration, int, Ordering<T>) - Method in class org.apache.spark.streaming.dstream.DStream
Return a new DStream in which each RDD contains the count of distinct elements in RDDs in a sliding window over this DStream.
countByValueApprox(long, double) - Method in interface org.apache.spark.api.java.JavaRDDLike
(Experimental) Approximate version of countByValue().
countByValueApprox(long) - Method in interface org.apache.spark.api.java.JavaRDDLike
(Experimental) Approximate version of countByValue().
countByValueApprox(long, double, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
Approximate version of countByValue().
countByWindow(Duration, Duration) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD has a single element generated by counting the number of elements in a window over this DStream.
countByWindow(Duration, Duration) - Method in class org.apache.spark.streaming.dstream.DStream
Return a new DStream in which each RDD has a single element generated by counting the number of elements in a sliding window over this DStream.
countDistinct(Column, Column...) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of distinct items in a group.
countDistinct(String, String...) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of distinct items in a group.
countDistinct(Column, Seq<Column>) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of distinct items in a group.
countDistinct(String, Seq<String>) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the number of distinct items in a group.
countTowardsTaskFailures() - Method in class org.apache.spark.ExecutorLostFailure
 
countTowardsTaskFailures() - Method in class org.apache.spark.TaskCommitDenied
If a task failed because its attempt to commit was denied, do not count this failure towards failing the stage.
countTowardsTaskFailures() - Method in interface org.apache.spark.TaskFailedReason
Whether this task failure should be counted towards the maximum number of times the task is allowed to fail before the stage is aborted.
CountVectorizer - Class in org.apache.spark.ml.feature
:: Experimental :: Extracts a vocabulary from document collections and generates a CountVectorizerModel.
CountVectorizer(String) - Constructor for class org.apache.spark.ml.feature.CountVectorizer
 
CountVectorizer() - Constructor for class org.apache.spark.ml.feature.CountVectorizer
 
CountVectorizerModel - Class in org.apache.spark.ml.feature
:: Experimental :: Converts a text document to a sparse vector of token counts.
CountVectorizerModel(String, String[]) - Constructor for class org.apache.spark.ml.feature.CountVectorizerModel
 
CountVectorizerModel(String[]) - Constructor for class org.apache.spark.ml.feature.CountVectorizerModel
 
cov(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Calculate the sample covariance of two numerical columns of a DataFrame.
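For readers unsure what "sample covariance" means here, the following plain-Java sketch shows the usual definition, which divides by n - 1 rather than n. It is an illustration under that assumption, not Spark's code; `SampleCovarianceSketch` and `sampleCovariance` are hypothetical names.

```java
// Hypothetical helper for illustration only; not part of Spark.
final class SampleCovarianceSketch {

    // Sample covariance: sum((x_i - meanX) * (y_i - meanY)) / (n - 1).
    static double sampleCovariance(double[] x, double[] y) {
        int n = x.length;
        if (n != y.length || n < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        double acc = 0.0;
        for (int i = 0; i < n; i++) {
            acc += (x[i] - meanX) * (y[i] - meanY);
        }
        // Dividing by n - 1 (not n) gives the sample covariance.
        return acc / (n - 1);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] y = {2.0, 4.0, 6.0, 8.0};
        System.out.println(sampleCovariance(x, y));
    }
}
```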
crc32(Column) - Static method in class org.apache.spark.sql.functions
Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
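The crc32 entry says the result is returned as a bigint, that is, a 64-bit integer. A plausible reason is that a CRC32 checksum is an unsigned 32-bit quantity, which does not fit in a signed 32-bit int. The sketch below illustrates this with the JDK's `java.util.zip.CRC32` class on a local byte array; it does not call Spark, and the class name `Crc32Sketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper for illustration only; not part of Spark.
final class Crc32Sketch {

    // Returns the CRC32 checksum of the bytes as a non-negative long.
    static long crc32(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes);
        // getValue() returns the unsigned 32-bit checksum widened to a long.
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
        // "123456789" is the standard CRC32 check input; prints 3421780262
        // (0xCBF43926), which exceeds Integer.MAX_VALUE.
        System.out.println(crc32(input));
    }
}
```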
CreatableRelationProvider - Interface in org.apache.spark.sql.sources
 
create(boolean, boolean, boolean, int) - Static method in class org.apache.spark.api.java.StorageLevels
Deprecated.
create(boolean, boolean, boolean, boolean, int) - Static method in class org.apache.spark.api.java.StorageLevels
Create a new StorageLevel object.
create(JavaSparkContext, JdbcRDD.ConnectionFactory, String, long, long, int, Function<ResultSet, T>) - Static method in class org.apache.spark.rdd.JdbcRDD
Create an RDD that executes an SQL query on a JDBC connection and reads results.
create(JavaSparkContext, JdbcRDD.ConnectionFactory, String, long, long, int) - Static method in class org.apache.spark.rdd.JdbcRDD
Create an RDD that executes an SQL query on a JDBC connection and reads results.
create(RDD<T>, Function1<Object, Object>) - Static method in class org.apache.spark.rdd.PartitionPruningRDD
Create a PartitionPruningRDD.
create(Object...) - Static method in class org.apache.spark.sql.RowFactory
Create a Row from the given arguments.
create() - Method in interface org.apache.spark.streaming.api.java.JavaStreamingContextFactory
 
create(String, int) - Static method in class org.apache.spark.streaming.kafka.Broker
 
create(String, int, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
 
create(TopicAndPartition, long, long) - Static method in class org.apache.spark.streaming.kafka.OffsetRange
 
createArrayType(DataType) - Static method in class org.apache.spark.sql.types.DataTypes
Creates an ArrayType by specifying the data type of elements (elementType).
createArrayType(DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
Creates an ArrayType by specifying the data type of elements (elementType) and whether the array contains null values (containsNull).
createCombiner() - Method in class org.apache.spark.Aggregator
 
createDataFrame(RDD<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
 
createDataFrame(Seq<A>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.SQLContext
 
createDataFrame(RDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
 
createDataFrame(JavaRDD<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
 
createDataFrame(List<Row>, StructType) - Method in class org.apache.spark.sql.SQLContext
 
createDataFrame(RDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
 
createDataFrame(JavaRDD<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
 
createDataFrame(List<?>, Class<?>) - Method in class org.apache.spark.sql.SQLContext
 
createDataset(Seq<T>, Encoder<T>) - Method in class org.apache.spark.sql.SQLContext
 
createDataset(RDD<T>, Encoder<T>) - Method in class org.apache.spark.sql.SQLContext
 
createDataset(List<T>, Encoder<T>) - Method in class org.apache.spark.sql.SQLContext
 
createDecimalType(int, int) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a DecimalType by specifying the precision and scale.
createDecimalType() - Static method in class org.apache.spark.sql.types.DataTypes
Creates a DecimalType with default precision and scale, which are 10 and 0.
createDirectStream(StreamingContext, Map<String, String>, Map<TopicAndPartition, Object>, Function1<MessageAndMetadata<K, V>, R>, ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>, ClassTag<R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an input stream that directly pulls messages from Kafka Brokers without using any receiver.
createDirectStream(StreamingContext, Map<String, String>, Set<String>, ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an input stream that directly pulls messages from Kafka Brokers without using any receiver.
createDirectStream(JavaStreamingContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Class<R>, Map<String, String>, Map<TopicAndPartition, Long>, Function<MessageAndMetadata<K, V>, R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an input stream that directly pulls messages from Kafka Brokers without using any receiver.
createDirectStream(JavaStreamingContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Map<String, String>, Set<String>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an input stream that directly pulls messages from Kafka Brokers without using any receiver.
createExternalTable(String, String) - Method in class org.apache.spark.sql.SQLContext
 
createExternalTable(String, String, String) - Method in class org.apache.spark.sql.SQLContext
 
createExternalTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
 
createExternalTable(String, String, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
 
createExternalTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
 
createExternalTable(String, String, StructType, Map<String, String>) - Method in class org.apache.spark.sql.SQLContext
 
createJDBCTable(String, String, boolean) - Method in class org.apache.spark.sql.DataFrame
Deprecated.
As of 1.4.0, replaced by write().jdbc(). This will be removed in Spark 2.0.
createLogDir() - Method in class org.apache.spark.scheduler.JobLogger
Create a folder for log files; the folder's name is the creation time of the jobLogger.
createLogWriter(int) - Method in class org.apache.spark.scheduler.JobLogger
Create a log file for one job.
createMapType(DataType, DataType) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a MapType by specifying the data type of keys (keyType) and values (valueType).
createMapType(DataType, DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a MapType by specifying the data type of keys (keyType), the data type of values (valueType), and whether values contain any null value (valueContainsNull).
createModel(Vector, double) - Method in class org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
 
createModel(Vector, double) - Method in class org.apache.spark.mllib.classification.LogisticRegressionWithSGD
 
createModel(Vector, double) - Method in class org.apache.spark.mllib.classification.SVMWithSGD
 
createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
Create a model given the weights and intercept.
createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.LassoWithSGD
 
createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.LinearRegressionWithSGD
 
createModel(Vector, double) - Method in class org.apache.spark.mllib.regression.RidgeRegressionWithSGD
 
createPollingStream(StreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
createPollingStream(StreamingContext, Seq<InetSocketAddress>, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
createPollingStream(StreamingContext, Seq<InetSocketAddress>, StorageLevel, int, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
createPollingStream(JavaStreamingContext, String, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
createPollingStream(JavaStreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
createPollingStream(JavaStreamingContext, InetSocketAddress[], StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
createPollingStream(JavaStreamingContext, InetSocketAddress[], StorageLevel, int, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Creates an input stream that is to be used with the Spark Sink deployed on a Flume agent.
createRDD(SparkContext, Map<String, String>, OffsetRange[], ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an RDD from Kafka using offset ranges for each topic and partition.
createRDD(SparkContext, Map<String, String>, OffsetRange[], Map<TopicAndPartition, Broker>, Function1<MessageAndMetadata<K, V>, R>, ClassTag<K>, ClassTag<V>, ClassTag<KD>, ClassTag<VD>, ClassTag<R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an RDD from Kafka using offset ranges for each topic and partition.
createRDD(JavaSparkContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Map<String, String>, OffsetRange[]) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an RDD from Kafka using offset ranges for each topic and partition.
createRDD(JavaSparkContext, Class<K>, Class<V>, Class<KD>, Class<VD>, Class<R>, Map<String, String>, OffsetRange[], Map<TopicAndPartition, Broker>, Function<MessageAndMetadata<K, V>, R>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an RDD from Kafka using offset ranges for each topic and partition.
createRDDFromArray(JavaSparkContext, byte[][]) - Static method in class org.apache.spark.api.r.RRDD
Create an RRDD given a sequence of byte arrays.
createRDDWithLocalProperties(Time, boolean, Function0<U>) - Method in class org.apache.spark.streaming.dstream.DStream
Wrap a body of code such that the call site and operation scope information are passed to the RDDs created in this body properly.
createRelation(SQLContext, Map<String, String>) - Method in class org.apache.spark.ml.source.libsvm.DefaultSource
 
createRelation(SQLContext, SaveMode, Map<String, String>, DataFrame) - Method in interface org.apache.spark.sql.sources.CreatableRelationProvider
Creates a relation with the given parameters based on the contents of the given DataFrame.
createRelation(SQLContext, String[], Option<StructType>, Option<StructType>, Map<String, String>) - Method in interface org.apache.spark.sql.sources.HadoopFsRelationProvider
Returns a new base relation with the given parameters, a user defined schema, and a list of partition columns.
createRelation(SQLContext, Map<String, String>) - Method in interface org.apache.spark.sql.sources.RelationProvider
Returns a new base relation with the given parameters.
createRelation(SQLContext, Map<String, String>, StructType) - Method in interface org.apache.spark.sql.sources.SchemaRelationProvider
Returns a new base relation with the given parameters and user defined schema.
createRWorker(int) - Static method in class org.apache.spark.api.r.RRDD
ProcessBuilder used to launch worker R processes.
createSparkContext(String, String, String, String[], Map<Object, Object>, Map<Object, Object>) - Static method in class org.apache.spark.api.r.RRDD
 
createStream(StreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Create an input stream from a Flume source.
createStream(StreamingContext, String, int, StorageLevel, boolean) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Create an input stream from a Flume source.
createStream(JavaStreamingContext, String, int) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Creates an input stream from a Flume source.
createStream(JavaStreamingContext, String, int, StorageLevel) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Creates an input stream from a Flume source.
createStream(JavaStreamingContext, String, int, StorageLevel, boolean) - Static method in class org.apache.spark.streaming.flume.FlumeUtils
Creates an input stream from a Flume source.
createStream(StreamingContext, String, String, Map<String, Object>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an input stream that pulls messages from Kafka Brokers.
createStream(StreamingContext, Map<String, String>, Map<String, Object>, StorageLevel, ClassTag<K>, ClassTag<V>, ClassTag<U>, ClassTag<T>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an input stream that pulls messages from Kafka Brokers.
createStream(JavaStreamingContext, String, String, Map<String, Integer>) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an input stream that pulls messages from Kafka Brokers.
createStream(JavaStreamingContext, String, String, Map<String, Integer>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an input stream that pulls messages from Kafka Brokers.
createStream(JavaStreamingContext, Class<K>, Class<V>, Class<U>, Class<T>, Map<String, String>, Map<String, Integer>, StorageLevel) - Static method in class org.apache.spark.streaming.kafka.KafkaUtils
Create an input stream that pulls messages from Kafka Brokers.
createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function1<Record, T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
Create an input stream that pulls messages from a Kinesis stream.
createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function1<Record, T>, String, String, ClassTag<T>) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
Create an input stream that pulls messages from a Kinesis stream.
createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
Create an input stream that pulls messages from a Kinesis stream.
createStream(StreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
Create an input stream that pulls messages from a Kinesis stream.
createStream(StreamingContext, String, String, Duration, InitialPositionInStream, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
Create an input stream that pulls messages from a Kinesis stream.
createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function<Record, T>, Class<T>) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
Create an input stream that pulls messages from a Kinesis stream.
createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, Function<Record, T>, Class<T>, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
Create an input stream that pulls messages from a Kinesis stream.
createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
Create an input stream that pulls messages from a Kinesis stream.
createStream(JavaStreamingContext, String, String, String, String, InitialPositionInStream, Duration, StorageLevel, String, String) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
Create an input stream that pulls messages from a Kinesis stream.
createStream(JavaStreamingContext, String, String, Duration, InitialPositionInStream, StorageLevel) - Static method in class org.apache.spark.streaming.kinesis.KinesisUtils
Create an input stream that pulls messages from a Kinesis stream.
createStream(JavaStreamingContext, String, String, String, String, int, Duration, StorageLevel, String, String) - Method in class org.apache.spark.streaming.kinesis.KinesisUtilsPythonHelper
 
createStream(StreamingContext, String, String, StorageLevel) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
Create an input stream that receives messages pushed by an MQTT publisher.
createStream(JavaStreamingContext, String, String) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
Create an input stream that receives messages pushed by an MQTT publisher.
createStream(JavaStreamingContext, String, String, StorageLevel) - Static method in class org.apache.spark.streaming.mqtt.MQTTUtils
Create an input stream that receives messages pushed by an MQTT publisher.
createStream(StreamingContext, Option<Authorization>, Seq<String>, StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
Create an input stream that returns tweets received from Twitter.
createStream(JavaStreamingContext) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey, twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and twitter4j.oauth.accessTokenSecret.
createStream(JavaStreamingContext, String[]) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey, twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and twitter4j.oauth.accessTokenSecret.
createStream(JavaStreamingContext, String[], StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
Create an input stream that returns tweets received from Twitter using Twitter4J's default OAuth authentication; this requires the system properties twitter4j.oauth.consumerKey, twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and twitter4j.oauth.accessTokenSecret.
createStream(JavaStreamingContext, Authorization) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
Create an input stream that returns tweets received from Twitter.
createStream(JavaStreamingContext, Authorization, String[]) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
Create an input stream that returns tweets received from Twitter.
createStream(JavaStreamingContext, Authorization, String[], StorageLevel) - Static method in class org.apache.spark.streaming.twitter.TwitterUtils
Create an input stream that returns tweets received from Twitter.
createStream(StreamingContext, String, Subscribe, Function1<Seq<ByteString>, Iterator<T>>, StorageLevel, SupervisorStrategy, ClassTag<T>) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
Create an input stream that receives messages pushed by a ZeroMQ publisher.
createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>, StorageLevel, SupervisorStrategy) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
Create an input stream that receives messages pushed by a ZeroMQ publisher.
createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>, StorageLevel) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
Create an input stream that receives messages pushed by a ZeroMQ publisher.
createStream(JavaStreamingContext, String, Subscribe, Function<byte[][], Iterable<T>>) - Static method in class org.apache.spark.streaming.zeromq.ZeroMQUtils
Create an input stream that receives messages pushed by a ZeroMQ publisher.
createStructField(String, DataType, boolean, Metadata) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a StructField by specifying the name (name), data type (dataType) and whether values of this field can be null values (nullable).
createStructField(String, DataType, boolean) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a StructField with empty metadata.
createStructType(List<StructField>) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a StructType with the given list of StructFields (fields).
createStructType(StructField[]) - Static method in class org.apache.spark.sql.types.DataTypes
Creates a StructType with the given StructField array (fields).
createTransformFunc() - Method in class org.apache.spark.ml.feature.DCT
 
createTransformFunc() - Method in class org.apache.spark.ml.feature.ElementwiseProduct
 
createTransformFunc() - Method in class org.apache.spark.ml.feature.NGram
 
createTransformFunc() - Method in class org.apache.spark.ml.feature.Normalizer
 
createTransformFunc() - Method in class org.apache.spark.ml.feature.PolynomialExpansion
 
createTransformFunc() - Method in class org.apache.spark.ml.feature.RegexTokenizer
 
createTransformFunc() - Method in class org.apache.spark.ml.feature.Tokenizer
 
createTransformFunc() - Method in class org.apache.spark.ml.UnaryTransformer
Creates the transform function using the given param map.
creationSite() - Method in class org.apache.spark.rdd.RDD
User code that created this RDD (e.g.
creationSite() - Method in class org.apache.spark.streaming.dstream.DStream
 
crosstab(String, String) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Computes a pair-wise frequency table of the given columns.
CrossValidator - Class in org.apache.spark.ml.tuning
:: Experimental :: K-fold cross validation.
CrossValidator(String) - Constructor for class org.apache.spark.ml.tuning.CrossValidator
 
CrossValidator() - Constructor for class org.apache.spark.ml.tuning.CrossValidator
 
CrossValidatorModel - Class in org.apache.spark.ml.tuning
:: Experimental :: Model from k-fold cross validation.
cube(Column...) - Method in class org.apache.spark.sql.DataFrame
Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them.
cube(String, String...) - Method in class org.apache.spark.sql.DataFrame
Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them.
cube(Seq<Column>) - Method in class org.apache.spark.sql.DataFrame
Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them.
cube(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrame
Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them.
cume_dist() - Static method in class org.apache.spark.sql.functions
Window function: returns the cumulative distribution of values within a window partition, i.e.
cumeDist() - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.6.0, replaced by cume_dist. This will be removed in Spark 2.0.
current_date() - Static method in class org.apache.spark.sql.functions
Returns the current date as a date column.
current_timestamp() - Static method in class org.apache.spark.sql.functions
Returns the current timestamp as a timestamp column.
currentAttemptId() - Method in interface org.apache.spark.SparkStageInfo
 
currentAttemptId() - Method in class org.apache.spark.SparkStageInfoImpl
 
currPrefLocs(Partition) - Method in class org.apache.spark.rdd.PartitionCoalescer
 

D

databaseTypeDefinition() - Method in class org.apache.spark.sql.jdbc.JdbcType
 
dataDistribution() - Method in class org.apache.spark.status.api.v1.RDDStorageInfo
 
DataFrame - Class in org.apache.spark.sql
:: Experimental :: A distributed collection of data organized into named columns.
DataFrame(SQLContext, LogicalPlan) - Constructor for class org.apache.spark.sql.DataFrame
A constructor that automatically analyzes the logical plan.
DataFrameHolder - Class in org.apache.spark.sql
A container for a DataFrame, used for implicit conversions.
DataFrameNaFunctions - Class in org.apache.spark.sql
:: Experimental :: Functionality for working with missing data in DataFrames.
DataFrameReader - Class in org.apache.spark.sql
:: Experimental :: Interface used to load a DataFrame from external storage systems (e.g.
DataFrameStatFunctions - Class in org.apache.spark.sql
:: Experimental :: Statistic functions for DataFrames.
DataFrameWriter - Class in org.apache.spark.sql
:: Experimental :: Interface used to write a DataFrame to external storage systems (e.g.
dataSchema() - Method in class org.apache.spark.sql.sources.HadoopFsRelation
Specifies schema of actual data files.
Dataset<T> - Class in org.apache.spark.sql
:: Experimental :: A Dataset is a strongly typed collection of objects that can be transformed in parallel using functional or relational operations.
DatasetHolder<T> - Class in org.apache.spark.sql
A container for a Dataset, used for implicit conversions.
DataSourceRegister - Interface in org.apache.spark.sql.sources
:: DeveloperApi :: Data sources should implement this trait so that they can register an alias to their data source.
dataStream() - Method in class org.apache.spark.api.r.BaseRRDD
 
dataType() - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
The DataType of the returned value of this UserDefinedAggregateFunction.
DataType - Class in org.apache.spark.sql.types
:: DeveloperApi :: The base type of all Spark SQL data types.
DataType() - Constructor for class org.apache.spark.sql.types.DataType
 
dataType() - Method in class org.apache.spark.sql.types.StructField
 
dataType() - Method in class org.apache.spark.sql.UserDefinedFunction
 
DataTypes - Class in org.apache.spark.sql.types
To get or create a specific data type, users should use the singleton objects and factory methods provided by this class.
DataTypes() - Constructor for class org.apache.spark.sql.types.DataTypes
 
DataValidators - Class in org.apache.spark.mllib.util
:: DeveloperApi :: A collection of methods used to validate data before applying ML algorithms.
DataValidators() - Constructor for class org.apache.spark.mllib.util.DataValidators
 
date() - Method in class org.apache.spark.sql.ColumnName
Creates a new StructField of type date.
DATE() - Static method in class org.apache.spark.sql.Encoders
An encoder for nullable date type.
date_add(Column, int) - Static method in class org.apache.spark.sql.functions
Returns the date that is the given number of days (days) after the start date (start).
date_format(Column, String) - Static method in class org.apache.spark.sql.functions
Converts a date, timestamp, or string to a string in the date format given by the second argument.
date_sub(Column, int) - Static method in class org.apache.spark.sql.functions
Returns the date that is the given number of days (days) before the start date (start).
datediff(Column, Column) - Static method in class org.apache.spark.sql.functions
Returns the number of days from start to end.
DateType - Static variable in class org.apache.spark.sql.types.DataTypes
Gets the DateType object.
DateType - Class in org.apache.spark.sql.types
:: DeveloperApi :: A date type, supporting "0001-01-01" through "9999-12-31".
dayofmonth(Column) - Static method in class org.apache.spark.sql.functions
Extracts the day of the month as an integer from a given date/timestamp/string.
dayofyear(Column) - Static method in class org.apache.spark.sql.functions
Extracts the day of the year as an integer from a given date/timestamp/string.
DB2Dialect - Class in org.apache.spark.sql.jdbc
 
DB2Dialect() - Constructor for class org.apache.spark.sql.jdbc.DB2Dialect
 
DCT - Class in org.apache.spark.ml.feature
:: Experimental :: A feature transformer that takes the 1D discrete cosine transform of a real vector.
DCT(String) - Constructor for class org.apache.spark.ml.feature.DCT
 
DCT() - Constructor for class org.apache.spark.ml.feature.DCT
 
ddlParser() - Method in class org.apache.spark.sql.SQLContext
 
decayFactor() - Method in class org.apache.spark.mllib.clustering.StreamingKMeans
 
decimal() - Method in class org.apache.spark.sql.ColumnName
Creates a new StructField of type decimal.
decimal(int, int) - Method in class org.apache.spark.sql.ColumnName
Creates a new StructField of type decimal.
DECIMAL() - Static method in class org.apache.spark.sql.Encoders
An encoder for nullable decimal type.
Decimal - Class in org.apache.spark.sql.types
A mutable implementation of BigDecimal that can hold a Long if values are small enough.
Decimal() - Constructor for class org.apache.spark.sql.types.Decimal
 
DecimalType - Class in org.apache.spark.sql.types
 
DecimalType(int, int) - Constructor for class org.apache.spark.sql.types.DecimalType
 
DecimalType(int) - Constructor for class org.apache.spark.sql.types.DecimalType
 
DecimalType() - Constructor for class org.apache.spark.sql.types.DecimalType
 
DecimalType(Option<PrecisionInfo>) - Constructor for class org.apache.spark.sql.types.DecimalType
 
DecisionTree - Class in org.apache.spark.mllib.tree
A class which implements a decision tree learning algorithm for classification and regression.
DecisionTree(Strategy) - Constructor for class org.apache.spark.mllib.tree.DecisionTree
 
DecisionTreeClassificationModel - Class in org.apache.spark.ml.classification
:: Experimental :: Decision tree model for classification.
DecisionTreeClassifier - Class in org.apache.spark.ml.classification
:: Experimental :: Decision tree learning algorithm for classification.
DecisionTreeClassifier(String) - Constructor for class org.apache.spark.ml.classification.DecisionTreeClassifier
 
DecisionTreeClassifier() - Constructor for class org.apache.spark.ml.classification.DecisionTreeClassifier
 
DecisionTreeModel - Class in org.apache.spark.mllib.tree.model
Decision tree model for classification or regression.
DecisionTreeModel(Node, Enumeration.Value) - Constructor for class org.apache.spark.mllib.tree.model.DecisionTreeModel
 
DecisionTreeRegressionModel - Class in org.apache.spark.ml.regression
:: Experimental :: Decision tree model for regression.
DecisionTreeRegressor - Class in org.apache.spark.ml.regression
:: Experimental :: Decision tree learning algorithm for regression.
DecisionTreeRegressor(String) - Constructor for class org.apache.spark.ml.regression.DecisionTreeRegressor
 
DecisionTreeRegressor() - Constructor for class org.apache.spark.ml.regression.DecisionTreeRegressor
 
decode(Column, String) - Static method in class org.apache.spark.sql.functions
Decodes the first argument from binary into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
decodeLabel(Vector) - Static method in class org.apache.spark.ml.classification.LabelConverter
Converts a vector to a label.
defaultAttr() - Static method in class org.apache.spark.ml.attribute.BinaryAttribute
The default binary attribute.
defaultAttr() - Static method in class org.apache.spark.ml.attribute.NominalAttribute
The default nominal attribute.
defaultAttr() - Static method in class org.apache.spark.ml.attribute.NumericAttribute
The default numeric attribute.
defaultClassLoader() - Method in class org.apache.spark.serializer.Serializer
Default ClassLoader to use in deserialization.
defaultCopy(ParamMap) - Method in interface org.apache.spark.ml.param.Params
Default implementation of copy with extra params.
defaultMinPartitions() - Method in class org.apache.spark.api.java.JavaSparkContext
Default minimum number of partitions for Hadoop RDDs when not given by the user.
defaultMinPartitions() - Method in class org.apache.spark.SparkContext
Default minimum number of partitions for Hadoop RDDs when not given by the user. Note that math.min is used, so defaultMinPartitions cannot be higher than 2.
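The cap described for SparkContext.defaultMinPartitions() amounts to taking the smaller of the default parallelism and 2. A minimal sketch in plain Java, assuming the default parallelism is supplied as an int; the class and method names here are illustrative and are not part of the Spark API:

```java
final class MinPartitionsSketch {
    // Hypothetical helper: models the math.min cap described in the entry above.
    static int defaultMinPartitions(int defaultParallelism) {
        return Math.min(defaultParallelism, 2);
    }

    public static void main(String[] args) {
        System.out.println(defaultMinPartitions(8)); // capped at 2
        System.out.println(defaultMinPartitions(1)); // stays at 1
    }
}
```

So a larger default parallelism never raises this value past 2; callers who want more partitions need to pass a partition count explicitly.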
defaultMinSplits() - Method in class org.apache.spark.api.java.JavaSparkContext
Deprecated.
As of Spark 1.0.0, defaultMinSplits is deprecated; use JavaSparkContext.defaultMinPartitions() instead.
defaultMinSplits() - Method in class org.apache.spark.SparkContext
Default minimum number of partitions for Hadoop RDDs when not given by the user.
defaultParallelism() - Method in class org.apache.spark.api.java.JavaSparkContext
Default level of parallelism to use when not given by user (e.g.
defaultParallelism() - Method in class org.apache.spark.SparkContext
Default level of parallelism to use when not given by user (e.g.
defaultParamMap() - Method in interface org.apache.spark.ml.param.Params
Internal param map for default values.
defaultParams(String) - Static method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
 
defaultParams(Enumeration.Value) - Static method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
 
defaultPartitioner(RDD<?>, Seq<RDD<?>>) - Static method in class org.apache.spark.Partitioner
Choose a partitioner to use for a cogroup-like operation between a number of RDDs.
defaultSize() - Method in class org.apache.spark.sql.types.ArrayType
The default size of a value of the ArrayType is 100 * the default size of the element type.
defaultSize() - Method in class org.apache.spark.sql.types.BinaryType
The default size of a value of the BinaryType is 4096 bytes.
defaultSize() - Method in class org.apache.spark.sql.types.BooleanType
The default size of a value of the BooleanType is 1 byte.
defaultSize() - Method in class org.apache.spark.sql.types.ByteType
The default size of a value of the ByteType is 1 byte.
defaultSize() - Method in class org.apache.spark.sql.types.CalendarIntervalType
 
defaultSize() - Method in class org.apache.spark.sql.types.DataType
The default size of a value of this data type, used internally for size estimation.
defaultSize() - Method in class org.apache.spark.sql.types.DateType
The default size of a value of the DateType is 4 bytes.
defaultSize() - Method in class org.apache.spark.sql.types.DecimalType
The default size of a value of the DecimalType is 4096 bytes.
defaultSize() - Method in class org.apache.spark.sql.types.DoubleType
The default size of a value of the DoubleType is 8 bytes.
defaultSize() - Method in class org.apache.spark.sql.types.FloatType
The default size of a value of the FloatType is 4 bytes.
defaultSize() - Method in class org.apache.spark.sql.types.IntegerType
The default size of a value of the IntegerType is 4 bytes.
defaultSize() - Method in class org.apache.spark.sql.types.LongType
The default size of a value of the LongType is 8 bytes.
defaultSize() - Method in class org.apache.spark.sql.types.MapType
The default size of a value of the MapType is 100 * (the default size of the key type + the default size of the value type).
defaultSize() - Method in class org.apache.spark.sql.types.NullType
 
defaultSize() - Method in class org.apache.spark.sql.types.ShortType
The default size of a value of the ShortType is 2 bytes.
defaultSize() - Method in class org.apache.spark.sql.types.StringType
The default size of a value of the StringType is 4096 bytes.
defaultSize() - Method in class org.apache.spark.sql.types.StructType
The default size of a value of the StructType is the total default sizes of all field types.
defaultSize() - Method in class org.apache.spark.sql.types.TimestampType
The default size of a value of the TimestampType is 8 bytes.
defaultSize() - Method in class org.apache.spark.sql.types.UserDefinedType
The default size of a value of the UserDefinedType is 4096 bytes.
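The composite rules in the defaultSize() entries above (ArrayType, MapType, StructType) are simple arithmetic over the sizes of the component types. The following is a rough sketch in plain Java that only reproduces that arithmetic with the byte sizes quoted above; DefaultSizeSketch and its methods are hypothetical names, not Spark classes:

```java
final class DefaultSizeSketch {
    // Byte sizes quoted from the IntegerType, LongType and StringType entries.
    static final int INTEGER = 4;
    static final int LONG = 8;
    static final int STRING = 4096;

    // ArrayType: 100 * the default size of the element type.
    static int arraySize(int elementSize) {
        return 100 * elementSize;
    }

    // MapType: 100 * (default size of the key type + default size of the value type).
    static int mapSize(int keySize, int valueSize) {
        return 100 * (keySize + valueSize);
    }

    // StructType: total of the default sizes of all field types.
    static int structSize(int... fieldSizes) {
        int total = 0;
        for (int size : fieldSizes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(arraySize(INTEGER));                 // array of integers
        System.out.println(mapSize(STRING, LONG));              // map from string to long
        System.out.println(structSize(INTEGER, LONG, STRING));  // struct of three fields
    }
}
```

These figures are estimates used for size estimation, not measured sizes of actual values.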
DefaultSource - Class in org.apache.spark.ml.source.libsvm
libsvm package implements Spark SQL data source API for loading LIBSVM data as DataFrame.
DefaultSource() - Constructor for class org.apache.spark.ml.source.libsvm.DefaultSource
 
defaultStategy(Enumeration.Value) - Static method in class org.apache.spark.mllib.tree.configuration.Strategy
 
defaultStrategy(String) - Static method in class org.apache.spark.mllib.tree.configuration.Strategy
Construct a default set of parameters for DecisionTree.
defaultStrategy(Enumeration.Value) - Static method in class org.apache.spark.mllib.tree.configuration.Strategy
Construct a default set of parameters for DecisionTree.
defaultStrategy() - Static method in class org.apache.spark.streaming.receiver.ActorSupervisorStrategy
 
degree() - Method in class org.apache.spark.ml.feature.PolynomialExpansion
The polynomial degree to expand, which should be >= 1.
degrees() - Method in class org.apache.spark.graphx.GraphOps
The degree of each vertex in the graph.
degreesOfFreedom() - Method in class org.apache.spark.mllib.stat.test.ChiSqTestResult
 
degreesOfFreedom() - Method in class org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult
 
degreesOfFreedom() - Method in interface org.apache.spark.mllib.stat.test.TestResult
Returns the degree(s) of freedom of the hypothesis test.
delegate() - Method in class org.apache.spark.InterruptibleIterator
 
dense(int, int, double[]) - Static method in class org.apache.spark.mllib.linalg.Matrices
Creates a column-major dense matrix.
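Column-major layout means the values array stores the first column top to bottom, then the second column, and so on, so entry (i, j) of a matrix with numRows rows sits at index i + j * numRows. A small sketch of that indexing in plain Java; ColumnMajorSketch is an illustrative name and does not call Spark:

```java
final class ColumnMajorSketch {
    // Hypothetical helper: reads entry (i, j) from a column-major values array.
    static double get(int numRows, double[] values, int i, int j) {
        return values[i + j * numRows];
    }

    public static void main(String[] args) {
        // A 3 x 2 matrix whose rows are (1, 2), (3, 4), (5, 6),
        // stored column by column.
        double[] values = {1.0, 3.0, 5.0, 2.0, 4.0, 6.0};
        int numRows = 3;
        int numCols = 2;
        for (int i = 0; i < numRows; i++) {
            StringBuilder row = new StringBuilder();
            for (int j = 0; j < numCols; j++) {
                row.append(get(numRows, values, i, j)).append(' ');
            }
            System.out.println(row.toString().trim());
        }
    }
}
```

The same array order is what a call such as dense(3, 2, values) would expect for that matrix.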
dense(double, double...) - Static method in class org.apache.spark.mllib.linalg.Vectors
Creates a dense vector from its values.
dense(double, Seq<Object>) - Static method in class org.apache.spark.mllib.linalg.Vectors
Creates a dense vector from its values.
dense(double[]) - Static method in class org.apache.spark.mllib.linalg.Vectors
Creates a dense vector from a double array.
dense_rank() - Static method in class org.apache.spark.sql.functions
Window function: returns the rank of rows within a window partition, without any gaps.
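"Without any gaps" means that tied rows share a rank and the next distinct value receives the next consecutive rank. A minimal sketch of that rule in plain Java, assuming the values of one window partition are already sorted in the window's order; DenseRankSketch is a hypothetical name and this is not how Spark implements the function:

```java
import java.util.Arrays;

final class DenseRankSketch {
    // Assigns consecutive ranks to an already sorted array; ties share a rank.
    static int[] denseRank(int[] sorted) {
        int[] ranks = new int[sorted.length];
        int rank = 0;
        for (int i = 0; i < sorted.length; i++) {
            // Advance the rank only when the value changes.
            if (i == 0 || sorted[i] != sorted[i - 1]) {
                rank++;
            }
            ranks[i] = rank;
        }
        return ranks;
    }

    public static void main(String[] args) {
        int[] values = {10, 10, 20, 30};
        System.out.println(Arrays.toString(denseRank(values)));
    }
}
```

For the values 10, 10, 20, 30 this yields ranks 1, 1, 2, 3, whereas a ranking with gaps would give 1, 1, 3, 4.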
DenseMatrix - Class in org.apache.spark.mllib.linalg
Column-major dense matrix.
DenseMatrix(int, int, double[], boolean) - Constructor for class org.apache.spark.mllib.linalg.DenseMatrix
 
DenseMatrix(int, int, double[]) - Constructor for class org.apache.spark.mllib.linalg.DenseMatrix
Column-major dense matrix.
denseRank() - Static method in class org.apache.spark.sql.functions
Deprecated.
As of 1.6.0, replaced by dense_rank. This will be removed in Spark 2.0.
DenseVector - Class in org.apache.spark.mllib.linalg
A dense vector represented by a value array.
DenseVector(double[]) - Constructor for class org.apache.spark.mllib.linalg.DenseVector
 
dependencies() - Method in class org.apache.spark.rdd.RDD
Get the list of dependencies of this RDD, taking into account whether the RDD is checkpointed or not.
dependencies() - Method in class org.apache.spark.streaming.dstream.DStream
List of parent DStreams on which this DStream depends.
dependencies() - Method in class org.apache.spark.streaming.dstream.InputDStream
 
Dependency<T> - Class in org.apache.spark
:: DeveloperApi :: Base class for dependencies.
Dependency() - Constructor for class org.apache.spark.Dependency
 
depth() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
Get depth of tree.
DerbyDialect - Class in org.apache.spark.sql.jdbc
 
DerbyDialect() - Constructor for class org.apache.spark.sql.jdbc.DerbyDialect
 
desc() - Method in class org.apache.spark.sql.Column
Returns an ordering used in sorting.
desc(String) - Static method in class org.apache.spark.sql.functions
Returns a sort expression based on the descending order of the column.
desc() - Method in class org.apache.spark.util.MethodIdentifier
 
describe(String...) - Method in class org.apache.spark.sql.DataFrame
Computes statistics for numeric columns, including count, mean, stddev, min, and max.
describe(Seq<String>) - Method in class org.apache.spark.sql.DataFrame
Computes statistics for numeric columns, including count, mean, stddev, min, and max.
describeTopics(int) - Method in class org.apache.spark.ml.clustering.LDAModel
Return the topics described by their top-weighted terms.
describeTopics() - Method in class org.apache.spark.ml.clustering.LDAModel
 
describeTopics(int) - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
 
describeTopics(int) - Method in class org.apache.spark.mllib.clustering.LDAModel
Return the topics described by weighted terms.
describeTopics() - Method in class org.apache.spark.mllib.clustering.LDAModel
Return the topics described by weighted terms.
describeTopics(int) - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
 
description() - Method in class org.apache.spark.ExceptionFailure
 
description() - Method in class org.apache.spark.status.api.v1.JobData
 
description() - Method in class org.apache.spark.storage.StorageLevel
 
description() - Method in class org.apache.spark.streaming.scheduler.OutputOperationInfo
 
DeserializationStream - Class in org.apache.spark.serializer
:: DeveloperApi :: A stream for reading serialized objects.
DeserializationStream() - Constructor for class org.apache.spark.serializer.DeserializationStream
 
deserialize(Object) - Method in class org.apache.spark.mllib.linalg.VectorUDT
 
deserialize(ByteBuffer, ClassLoader, ClassTag<T>) - Method in class org.apache.spark.serializer.DummySerializerInstance
 
deserialize(ByteBuffer, ClassTag<T>) - Method in class org.apache.spark.serializer.DummySerializerInstance
 
deserialize(ByteBuffer, ClassTag<T>) - Method in class org.apache.spark.serializer.SerializerInstance
 
deserialize(ByteBuffer, ClassLoader, ClassTag<T>) - Method in class org.apache.spark.serializer.SerializerInstance
 
deserialize(Object) - Method in class org.apache.spark.sql.types.UserDefinedType
Convert a SQL datum to the user type.
deserialized() - Method in class org.apache.spark.storage.MemoryEntry
 
deserialized() - Method in class org.apache.spark.storage.StorageLevel
 
deserializeStream(InputStream) - Method in class org.apache.spark.serializer.DummySerializerInstance
 
deserializeStream(InputStream) - Method in class org.apache.spark.serializer.SerializerInstance
 
destroy() - Method in class org.apache.spark.broadcast.Broadcast
Destroy all data and metadata related to this broadcast variable.
details() - Method in class org.apache.spark.scheduler.StageInfo
 
details() - Method in class org.apache.spark.status.api.v1.StageData
 
determineBounds(ArrayBuffer<Tuple2<K, Object>>, int, Ordering<K>, ClassTag<K>) - Static method in class org.apache.spark.RangePartitioner
Determines the bounds for range partitioning from candidates with weights indicating how many items each represents.
deterministic() - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
Returns true iff this function is deterministic, i.e.
DeveloperApi - Annotation Type in org.apache.spark.annotation
A lower-level, unstable API intended for developers.
devianceResiduals() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
The weighted residuals, the usual residuals rescaled by the square root of the instance weights.
diag(Vector) - Static method in class org.apache.spark.mllib.linalg.DenseMatrix
Generate a diagonal matrix in DenseMatrix format from the supplied values.
diag(Vector) - Static method in class org.apache.spark.mllib.linalg.Matrices
Generate a diagonal matrix in Matrix format from the supplied values.
dialectClassName() - Method in class org.apache.spark.sql.SQLContext
 
diff(RDD<Tuple2<Object, VD>>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
 
diff(VertexRDD<VD>) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
 
diff(RDD<Tuple2<Object, VD>>) - Method in class org.apache.spark.graphx.VertexRDD
For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other.
diff(VertexRDD<VD>) - Method in class org.apache.spark.graphx.VertexRDD
For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other.
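The diff semantics described above can be illustrated with ordinary maps from vertex id to value. This is only a model of the described behavior in plain Java, assuming vertex values are compared with equals; VertexDiffSketch is a hypothetical name and the code does not use GraphX:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class VertexDiffSketch {
    // Keeps only ids present in both maps whose values differ,
    // taking the value from `other`.
    static <V> Map<Long, V> diff(Map<Long, V> self, Map<Long, V> other) {
        Map<Long, V> result = new LinkedHashMap<>();
        for (Map.Entry<Long, V> entry : other.entrySet()) {
            Long id = entry.getKey();
            if (self.containsKey(id) && !Objects.equals(self.get(id), entry.getValue())) {
                result.put(id, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, String> self = new HashMap<>();
        self.put(1L, "a");
        self.put(2L, "b");
        self.put(3L, "c");

        Map<Long, String> other = new HashMap<>();
        other.put(2L, "b"); // same value: dropped
        other.put(3L, "x"); // differing value: kept, value from other
        other.put(4L, "d"); // not in self: dropped

        System.out.println(diff(self, other));
    }
}
```

In this model only vertex 3 survives, carrying the value "x" from other.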
disableOutputSpecValidation() - Static method in class org.apache.spark.rdd.PairRDDFunctions
 
disconnect() - Method in interface org.apache.spark.launcher.SparkAppHandle
Disconnects the handle from the application, without stopping it.
DISK_ONLY - Static variable in class org.apache.spark.api.java.StorageLevels
 
DISK_ONLY() - Static method in class org.apache.spark.storage.StorageLevel
 
DISK_ONLY_2 - Static variable in class org.apache.spark.api.java.StorageLevels
 
DISK_ONLY_2() - Static method in class org.apache.spark.storage.StorageLevel
 
diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.ExecutorStageSummary
 
diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.StageData
 
diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.TaskMetricDistributions
 
diskBytesSpilled() - Method in class org.apache.spark.status.api.v1.TaskMetrics
 
diskSize() - Method in class org.apache.spark.storage.BlockStatus
 
diskSize() - Method in class org.apache.spark.storage.BlockUpdatedInfo
 
diskSize() - Method in class org.apache.spark.storage.RDDInfo
 
diskUsed() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
 
diskUsed() - Method in class org.apache.spark.status.api.v1.RDDDataDistribution
 
diskUsed() - Method in class org.apache.spark.status.api.v1.RDDPartitionInfo
 
diskUsed() - Method in class org.apache.spark.status.api.v1.RDDStorageInfo
 
diskUsed() - Method in class org.apache.spark.storage.StorageStatus
Return the disk space used by this block manager.
diskUsedByRdd(int) - Method in class org.apache.spark.storage.StorageStatus
Return the disk space used by the given RDD in this block manager in O(1) time.
dist(Vector) - Method in class org.apache.spark.util.Vector
 
distinct() - Method in class org.apache.spark.api.java.JavaDoubleRDD
Return a new RDD containing the distinct elements in this RDD.
distinct(int) - Method in class org.apache.spark.api.java.JavaDoubleRDD
Return a new RDD containing the distinct elements in this RDD.
distinct() - Method in class org.apache.spark.api.java.JavaPairRDD
Return a new RDD containing the distinct elements in this RDD.
distinct(int) - Method in class org.apache.spark.api.java.JavaPairRDD
Return a new RDD containing the distinct elements in this RDD.
distinct() - Method in class org.apache.spark.api.java.JavaRDD
Return a new RDD containing the distinct elements in this RDD.
distinct(int) - Method in class org.apache.spark.api.java.JavaRDD
Return a new RDD containing the distinct elements in this RDD.
distinct(int, Ordering<T>) - Method in class org.apache.spark.rdd.RDD
Return a new RDD containing the distinct elements in this RDD.
distinct() - Method in class org.apache.spark.rdd.RDD
Return a new RDD containing the distinct elements in this RDD.
distinct() - Method in class org.apache.spark.sql.DataFrame
Returns a new DataFrame that contains only the unique rows from this DataFrame.
distinct() - Method in class org.apache.spark.sql.Dataset
Returns a new Dataset that contains only the unique elements of this Dataset.
distinct(Column...) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
Creates a Column for this UDAF using the distinct values of the given Columns as input arguments.
distinct(Seq<Column>) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
Creates a Column for this UDAF using the distinct values of the given Columns as input arguments.
DistributedLDAModel - Class in org.apache.spark.ml.clustering
:: Experimental ::
DistributedLDAModel - Class in org.apache.spark.mllib.clustering
 
DistributedMatrix - Interface in org.apache.spark.mllib.linalg.distributed
Represents a distributively stored matrix backed by one or more RDDs.
div(Duration) - Method in class org.apache.spark.streaming.Duration
 
divide(Object) - Method in class org.apache.spark.sql.Column
Divides this expression by another expression.
divide(double) - Method in class org.apache.spark.util.Vector
 
doc() - Method in class org.apache.spark.ml.param.Param
 
docConcentration() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
 
docConcentration() - Method in class org.apache.spark.mllib.clustering.EMLDAOptimizer
 
docConcentration() - Method in class org.apache.spark.mllib.clustering.LDAModel
Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta").
docConcentration() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
 
doDestroy(boolean) - Method in class org.apache.spark.broadcast.Broadcast
Actually destroy all data and metadata related to this broadcast variable.
dot(Vector) - Method in class org.apache.spark.util.Vector
 
DOUBLE() - Static method in class org.apache.spark.sql.Encoders
An encoder for nullable double type.
doubleAccumulator(double) - Method in class org.apache.spark.api.java.JavaSparkContext
Create an Accumulator double variable, which tasks can "add" values to using the add method.
doubleAccumulator(double, String) - Method in class org.apache.spark.api.java.JavaSparkContext
Create an Accumulator double variable, which tasks can "add" values to using the add method.
DoubleArrayParam - Class in org.apache.spark.ml.param
:: DeveloperApi :: Specialized version of Param[Array[Double]] for Java.
DoubleArrayParam(Params, String, String, Function1<double[], Object>) - Constructor for class org.apache.spark.ml.param.DoubleArrayParam
 
DoubleArrayParam(Params, String, String) - Constructor for class org.apache.spark.ml.param.DoubleArrayParam
 
DoubleDecimal() - Static method in class org.apache.spark.sql.types.DecimalType
 
DoubleFlatMapFunction<T> - Interface in org.apache.spark.api.java.function
A function that returns zero or more records of type Double from each input record.
DoubleFunction<T> - Interface in org.apache.spark.api.java.function
A function that returns Doubles, and can be used to construct DoubleRDDs.
DoubleParam - Class in org.apache.spark.ml.param
:: DeveloperApi :: Specialized version of Param[Double] for Java.
DoubleParam(String, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.DoubleParam
 
DoubleParam(String, String, String) - Constructor for class org.apache.spark.ml.param.DoubleParam
 
DoubleParam(Identifiable, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.DoubleParam
 
DoubleParam(Identifiable, String, String) - Constructor for class org.apache.spark.ml.param.DoubleParam
 
DoubleRDDFunctions - Class in org.apache.spark.rdd
Extra functions available on RDDs of Doubles through an implicit conversion.
DoubleRDDFunctions(RDD<Object>) - Constructor for class org.apache.spark.rdd.DoubleRDDFunctions
 
doubleRDDToDoubleRDDFunctions(RDD<Object>) - Static method in class org.apache.spark.rdd.RDD
 
doubleRDDToDoubleRDDFunctions(RDD<Object>) - Static method in class org.apache.spark.SparkContext
 
doubleToDoubleWritable(double) - Static method in class org.apache.spark.SparkContext
 
doubleToMultiplier(double) - Static method in class org.apache.spark.util.Vector
 
DoubleType - Static variable in class org.apache.spark.sql.types.DataTypes
Gets the DoubleType object.
DoubleType - Class in org.apache.spark.sql.types
:: DeveloperApi :: The data type representing Double values.
doubleWritableConverter() - Static method in class org.apache.spark.SparkContext
 
doUnpersist(boolean) - Method in class org.apache.spark.broadcast.Broadcast
Actually unpersist the broadcasted value on the executors.
DRIVER_EXTRA_CLASSPATH - Static variable in class org.apache.spark.launcher.SparkLauncher
Configuration key for the driver class path.
DRIVER_EXTRA_JAVA_OPTIONS - Static variable in class org.apache.spark.launcher.SparkLauncher
Configuration key for the driver VM options.
DRIVER_EXTRA_LIBRARY_PATH - Static variable in class org.apache.spark.launcher.SparkLauncher
Configuration key for the driver native library path.
DRIVER_IDENTIFIER() - Static method in class org.apache.spark.SparkContext
Executor id for the driver.
DRIVER_MEMORY - Static variable in class org.apache.spark.launcher.SparkLauncher
Configuration key for the driver memory.
driverActorSystemName() - Static method in class org.apache.spark.SparkEnv
 
driverLogs() - Method in class org.apache.spark.scheduler.SparkListenerApplicationStart
 
drop(String) - Method in class org.apache.spark.sql.DataFrame
Returns a new DataFrame with a column dropped.
drop(Column) - Method in class org.apache.spark.sql.DataFrame
Returns a new DataFrame with a column dropped.
drop() - Method in class org.apache.spark.sql.DataFrameNaFunctions
Returns a new DataFrame that drops rows containing any null or NaN values.
drop(String) - Method in class org.apache.spark.sql.DataFrameNaFunctions
Returns a new DataFrame that drops rows containing null or NaN values.
drop(String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
drop(Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
(Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
drop(String, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
drop(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
(Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
drop(int) - Method in class org.apache.spark.sql.DataFrameNaFunctions
Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values.
drop(int, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
drop(int, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
(Scala-specific) Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
dropDuplicates() - Method in class org.apache.spark.sql.DataFrame
Returns a new DataFrame that contains only the unique rows from this DataFrame.
dropDuplicates(Seq<String>) - Method in class org.apache.spark.sql.DataFrame
(Scala-specific) Returns a new DataFrame with duplicate rows removed, considering only the subset of columns.
dropDuplicates(String[]) - Method in class org.apache.spark.sql.DataFrame
Returns a new DataFrame with duplicate rows removed, considering only the subset of columns.
dropLast() - Method in class org.apache.spark.ml.feature.OneHotEncoder
Whether to drop the last category in the encoded vector (default: true)
dropTempTable(String) - Method in class org.apache.spark.sql.SQLContext
 
Dst - Static variable in class org.apache.spark.graphx.TripletFields
Expose the destination and edge fields but not the source field.
dstAttr() - Method in class org.apache.spark.graphx.EdgeContext
The vertex attribute of the edge's destination vertex.
dstAttr() - Method in class org.apache.spark.graphx.EdgeTriplet
The vertex attribute of the edge's destination vertex.
dstAttr() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
 
dstId() - Method in class org.apache.spark.graphx.Edge
 
dstId() - Method in class org.apache.spark.graphx.EdgeContext
The vertex id of the edge's destination vertex.
dstId() - Method in class org.apache.spark.graphx.impl.AggregatingEdgeContext
 
dstream() - Method in class org.apache.spark.streaming.api.java.JavaDStream
 
dstream() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
 
dstream() - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
 
DStream<T> - Class in org.apache.spark.streaming.dstream
A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data (see org.apache.spark.rdd.RDD in the Spark core documentation for more details on RDDs).
DStream(StreamingContext, ClassTag<T>) - Constructor for class org.apache.spark.streaming.dstream.DStream
 
dtypes() - Method in class org.apache.spark.sql.DataFrame
Returns all column names and their data types as an array.
DummySerializerInstance - Class in org.apache.spark.serializer
Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
duration() - Method in class org.apache.spark.scheduler.TaskInfo
 
Duration - Class in org.apache.spark.streaming
 
Duration(long) - Constructor for class org.apache.spark.streaming.Duration
 
duration() - Method in class org.apache.spark.streaming.scheduler.OutputOperationInfo
Return the duration of this output operation.
Durations - Class in org.apache.spark.streaming
 
Durations() - Constructor for class org.apache.spark.streaming.Durations
 

E

Edge<ED> - Class in org.apache.spark.graphx
A single directed edge consisting of a source id, target id, and the data associated with the edge.
Edge(long, long, ED) - Constructor for class org.apache.spark.graphx.Edge
 
EdgeActiveness - Enum in org.apache.spark.graphx.impl
Criteria for filtering edges based on activeness.
EdgeContext<VD,ED,A> - Class in org.apache.spark.graphx
Represents an edge along with its neighboring vertices and allows sending messages along the edge.
EdgeContext() - Constructor for class org.apache.spark.graphx.EdgeContext
 
EdgeDirection - Class in org.apache.spark.graphx
The direction of a directed edge relative to a vertex.
edgeListFile(SparkContext, String, boolean, int, StorageLevel, StorageLevel) - Static method in class org.apache.spark.graphx.GraphLoader
Loads a graph from an edge list formatted file where each line contains two integers: a source id and a target id.
EdgeOnly - Static variable in class org.apache.spark.graphx.TripletFields
Expose only the edge field and not the source or destination field.
EdgeRDD<ED> - Class in org.apache.spark.graphx
EdgeRDD[ED, VD] extends RDD[Edge[ED]] by storing the edges in columnar format on each partition for performance.
EdgeRDD(SparkContext, Seq<Dependency<?>>) - Constructor for class org.apache.spark.graphx.EdgeRDD
 
EdgeRDDImpl<ED,VD> - Class in org.apache.spark.graphx.impl
 
edges() - Method in class org.apache.spark.graphx.Graph
An RDD containing the edges and their associated attributes.
edges() - Method in class org.apache.spark.graphx.impl.GraphImpl
 
EdgeTriplet<VD,ED> - Class in org.apache.spark.graphx
An edge triplet represents an edge along with the vertex attributes of its neighboring vertices.
EdgeTriplet() - Constructor for class org.apache.spark.graphx.EdgeTriplet
 
Either() - Static method in class org.apache.spark.graphx.EdgeDirection
Edges originating from *or* arriving at a vertex of interest.
elements() - Method in class org.apache.spark.util.Vector
 
elementType() - Method in class org.apache.spark.sql.types.ArrayType
 
ElementwiseProduct - Class in org.apache.spark.ml.feature
:: Experimental :: Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a provided "weight" vector.
ElementwiseProduct(String) - Constructor for class org.apache.spark.ml.feature.ElementwiseProduct
 
ElementwiseProduct() - Constructor for class org.apache.spark.ml.feature.ElementwiseProduct
 
ElementwiseProduct - Class in org.apache.spark.mllib.feature
Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a provided "weight" vector.
ElementwiseProduct(Vector) - Constructor for class org.apache.spark.mllib.feature.ElementwiseProduct
 
EMLDAOptimizer - Class in org.apache.spark.mllib.clustering
:: DeveloperApi ::
EMLDAOptimizer() - Constructor for class org.apache.spark.mllib.clustering.EMLDAOptimizer
 
empty() - Static method in class org.apache.spark.ml.param.ParamMap
Returns an empty param map.
empty() - Static method in class org.apache.spark.sql.types.Metadata
Returns an empty Metadata.
empty() - Static method in class org.apache.spark.storage.BlockStatus
 
emptyDataFrame() - Method in class org.apache.spark.sql.SQLContext
:: Experimental :: Returns a DataFrame with no rows or columns.
emptyNode(int) - Static method in class org.apache.spark.mllib.tree.model.Node
Return a node with the given node id (but nothing else set).
emptyRDD() - Method in class org.apache.spark.api.java.JavaSparkContext
Get an RDD that has no partitions or elements.
emptyRDD(ClassTag<T>) - Method in class org.apache.spark.SparkContext
Get an RDD that has no partitions or elements.
emptyResult() - Method in class org.apache.spark.sql.SQLContext
 
encode(Column, String) - Static method in class org.apache.spark.sql.functions
Encodes the first argument from a string into a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
encodeLabeledPoint(LabeledPoint, int) - Static method in class org.apache.spark.ml.classification.LabelConverter
Encodes a label as a vector.
Encoder<T> - Interface in org.apache.spark.sql
:: Experimental :: Used to convert a JVM object of type T to and from the internal Spark SQL representation.
encoder() - Method in class org.apache.spark.sql.TypedColumn
 
Encoders - Class in org.apache.spark.sql
:: Experimental :: Methods for creating an Encoder.
Encoders() - Constructor for class org.apache.spark.sql.Encoders
 
endsWith(Column) - Method in class org.apache.spark.sql.Column
String ends with.
endsWith(String) - Method in class org.apache.spark.sql.Column
String ends with another string literal.
endTime() - Method in class org.apache.spark.status.api.v1.ApplicationAttemptInfo
 
endTime() - Method in class org.apache.spark.streaming.scheduler.OutputOperationInfo
 
endTime() - Method in class org.apache.spark.ui.jobs.JobProgressListener
 
entries() - Method in class org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
 
Entropy - Class in org.apache.spark.mllib.tree.impurity
:: Experimental :: Class for calculating entropy during binary classification.
Entropy() - Constructor for class org.apache.spark.mllib.tree.impurity.Entropy
 
EnumUtil - Class in org.apache.spark.util
 
EnumUtil() - Constructor for class org.apache.spark.util.EnumUtil
 
env() - Method in class org.apache.spark.api.java.JavaSparkContext
 
env() - Method in class org.apache.spark.streaming.StreamingContext
 
environmentDetails() - Method in class org.apache.spark.scheduler.SparkListenerEnvironmentUpdate
 
EnvironmentListener - Class in org.apache.spark.ui.env
:: DeveloperApi :: A SparkListener that prepares information to be displayed on the EnvironmentTab
EnvironmentListener() - Constructor for class org.apache.spark.ui.env.EnvironmentListener
 
EPSILON() - Static method in class org.apache.spark.mllib.util.MLUtils
 
eqNullSafe(Object) - Method in class org.apache.spark.sql.Column
Equality test that is safe for null values.
EqualNullSafe - Class in org.apache.spark.sql.sources
Performs equality comparison, similar to EqualTo.
EqualNullSafe(String, Object) - Constructor for class org.apache.spark.sql.sources.EqualNullSafe
 
equals(Object) - Method in class org.apache.spark.graphx.EdgeDirection
 
equals(Object) - Method in class org.apache.spark.HashPartitioner
 
equals(Object) - Method in class org.apache.spark.ml.attribute.AttributeGroup
 
equals(Object) - Method in class org.apache.spark.ml.attribute.BinaryAttribute
 
equals(Object) - Method in class org.apache.spark.ml.attribute.NominalAttribute
 
equals(Object) - Method in class org.apache.spark.ml.attribute.NumericAttribute
 
equals(Object) - Method in class org.apache.spark.ml.param.Param
 
equals(Object) - Method in class org.apache.spark.ml.tree.CategoricalSplit
 
equals(Object) - Method in class org.apache.spark.ml.tree.ContinuousSplit
 
equals(Object) - Method in class org.apache.spark.mllib.linalg.DenseMatrix
 
equals(Object) - Method in class org.apache.spark.mllib.linalg.SparseMatrix
 
equals(Object) - Method in interface org.apache.spark.mllib.linalg.Vector
 
equals(Object) - Method in class org.apache.spark.mllib.linalg.VectorUDT
 
equals(Object) - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
 
equals(Object) - Method in class org.apache.spark.mllib.tree.model.Predict
 
equals(Object) - Method in class org.apache.spark.RangePartitioner
 
equals(Object) - Method in class org.apache.spark.scheduler.AccumulableInfo
 
equals(Object) - Method in class org.apache.spark.scheduler.cluster.ExecutorInfo
 
equals(Object) - Method in class org.apache.spark.scheduler.InputFormatInfo
 
equals(Object) - Method in class org.apache.spark.scheduler.SplitInfo
 
equals(Object) - Method in class org.apache.spark.sql.Column
 
equals(Object) - Method in interface org.apache.spark.sql.Row
 
equals(Object) - Method in class org.apache.spark.sql.types.Decimal
 
equals(Object) - Method in class org.apache.spark.sql.types.Metadata
 
equals(Object) - Method in class org.apache.spark.sql.types.UserDefinedType
 
equals(Object) - Method in class org.apache.spark.storage.BlockId
 
equals(Object) - Method in class org.apache.spark.storage.BlockManagerId
 
equals(Object) - Method in class org.apache.spark.storage.StorageLevel
 
equals(Object) - Method in class org.apache.spark.streaming.kafka.Broker
Broker's port
equals(Object) - Method in class org.apache.spark.streaming.kafka.OffsetRange
 
equalTo(Object) - Method in class org.apache.spark.sql.Column
Equality test.
EqualTo - Class in org.apache.spark.sql.sources
A filter that evaluates to true iff the attribute evaluates to a value equal to value.
EqualTo(String, Object) - Constructor for class org.apache.spark.sql.sources.EqualTo
 
errorMessage() - Method in class org.apache.spark.status.api.v1.TaskData
 
estimate(double[]) - Method in class org.apache.spark.mllib.stat.KernelDensity
Estimates probability density function at the given array of points.
estimate(Object) - Static method in class org.apache.spark.util.SizeEstimator
Estimate the number of bytes that the given object takes up on the JVM heap.
estimatedDocConcentration() - Method in class org.apache.spark.ml.clustering.LDAModel
Value for docConcentration estimated from data.
Estimator<M extends Model<M>> - Class in org.apache.spark.ml
:: DeveloperApi :: Abstract class for estimators that fit models to data.
Estimator() - Constructor for class org.apache.spark.ml.Estimator
 
evaluate(DataFrame) - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
 
evaluate(DataFrame, ParamMap) - Method in class org.apache.spark.ml.evaluation.Evaluator
Evaluates model output and returns a scalar metric (larger is better).
evaluate(DataFrame) - Method in class org.apache.spark.ml.evaluation.Evaluator
Evaluates the output.
evaluate(DataFrame) - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
 
evaluate(DataFrame) - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
 
evaluate(Row) - Method in class org.apache.spark.sql.expressions.UserDefinedAggregateFunction
Calculates the final result of this UserDefinedAggregateFunction based on the given aggregation buffer.
evaluateEachIteration(RDD<LabeledPoint>, Loss) - Method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
Method to compute error or loss for every iteration of gradient boosting.
Evaluator - Class in org.apache.spark.ml.evaluation
:: DeveloperApi :: Abstract class for evaluators that compute metrics from predictions.
Evaluator() - Constructor for class org.apache.spark.ml.evaluation.Evaluator
 
event() - Method in class org.apache.spark.streaming.flume.SparkFlumeEvent
 
except(DataFrame) - Method in class org.apache.spark.sql.DataFrame
Returns a new DataFrame containing rows in this frame but not in another frame.
exception() - Method in class org.apache.spark.ExceptionFailure
 
exception() - Method in class org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread
Contains the exception thrown while writing the parent iterator to the external process.
ExceptionFailure - Class in org.apache.spark
:: DeveloperApi :: Task failed due to a runtime exception.
ExceptionFailure(String, String, StackTraceElement[], String, Option<TaskMetrics>, Option<ThrowableSerializationWrapper>) - Constructor for class org.apache.spark.ExceptionFailure
 
execId() - Method in class org.apache.spark.ExecutorLostFailure
 
execId() - Method in class org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate
 
executePlan(LogicalPlan) - Method in class org.apache.spark.sql.hive.HiveContext
 
executePlan(LogicalPlan) - Method in class org.apache.spark.sql.SQLContext
 
executeSql(String) - Method in class org.apache.spark.sql.SQLContext
 
executionHive() - Method in class org.apache.spark.sql.hive.HiveContext
The copy of the hive client that is used for execution.
ExecutionListenerManager - Class in org.apache.spark.sql.util
:: Experimental ::
EXECUTOR_CORES - Static variable in class org.apache.spark.launcher.SparkLauncher
Configuration key for the number of executor CPU cores.
EXECUTOR_EXTRA_CLASSPATH - Static variable in class org.apache.spark.launcher.SparkLauncher
Configuration key for the executor class path.
EXECUTOR_EXTRA_JAVA_OPTIONS - Static variable in class org.apache.spark.launcher.SparkLauncher
Configuration key for the executor VM options.
EXECUTOR_EXTRA_LIBRARY_PATH - Static variable in class org.apache.spark.launcher.SparkLauncher
Configuration key for the executor native library path.
EXECUTOR_MEMORY - Static variable in class org.apache.spark.launcher.SparkLauncher
Configuration key for the executor memory.
executorActorSystemName() - Static method in class org.apache.spark.SparkEnv
 
executorDeserializeTime() - Method in class org.apache.spark.status.api.v1.TaskMetricDistributions
 
executorDeserializeTime() - Method in class org.apache.spark.status.api.v1.TaskMetrics
 
executorEnvs() - Method in class org.apache.spark.SparkContext
 
executorHost() - Method in class org.apache.spark.scheduler.cluster.ExecutorInfo
 
executorId() - Method in class org.apache.spark.ExecutorRegistered
 
executorId() - Method in class org.apache.spark.ExecutorRemoved
 
executorId() - Method in class org.apache.spark.scheduler.SparkListenerExecutorAdded
 
executorId() - Method in class org.apache.spark.scheduler.SparkListenerExecutorRemoved
 
executorId() - Method in class org.apache.spark.scheduler.TaskInfo
 
executorId() - Method in class org.apache.spark.SparkEnv
 
executorId() - Method in class org.apache.spark.status.api.v1.TaskData
 
executorId() - Method in class org.apache.spark.storage.BlockManagerId
 
executorId() - Method in class org.apache.spark.streaming.scheduler.ReceiverInfo
 
executorIdToBlockManagerId() - Method in class org.apache.spark.ui.jobs.JobProgressListener
 
executorIdToData() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
executorIdToStorageStatus() - Method in class org.apache.spark.storage.StorageStatusListener
 
ExecutorInfo - Class in org.apache.spark.scheduler.cluster
:: DeveloperApi :: Stores information about an executor to pass from the scheduler to SparkListeners.
ExecutorInfo(String, int, Map<String, String>) - Constructor for class org.apache.spark.scheduler.cluster.ExecutorInfo
 
executorInfo() - Method in class org.apache.spark.scheduler.SparkListenerExecutorAdded
 
executorLogs() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
 
ExecutorLostFailure - Class in org.apache.spark
:: DeveloperApi :: The task failed because the executor that it was running on was lost.
ExecutorLostFailure(String, boolean, Option<String>) - Constructor for class org.apache.spark.ExecutorLostFailure
 
executorPct() - Method in class org.apache.spark.scheduler.RuntimePercentage
 
ExecutorRegistered - Class in org.apache.spark
 
ExecutorRegistered(String) - Constructor for class org.apache.spark.ExecutorRegistered
 
ExecutorRemoved - Class in org.apache.spark
 
ExecutorRemoved(String) - Constructor for class org.apache.spark.ExecutorRemoved
 
executorRunTime() - Method in class org.apache.spark.status.api.v1.StageData
 
executorRunTime() - Method in class org.apache.spark.status.api.v1.TaskMetricDistributions
 
executorRunTime() - Method in class org.apache.spark.status.api.v1.TaskMetrics
 
executors() - Method in class org.apache.spark.status.api.v1.RDDPartitionInfo
 
ExecutorsListener - Class in org.apache.spark.ui.exec
:: DeveloperApi :: A SparkListener that prepares information to be displayed on the ExecutorsTab
ExecutorsListener(StorageStatusListener) - Constructor for class org.apache.spark.ui.exec.ExecutorsListener
 
ExecutorStageSummary - Class in org.apache.spark.status.api.v1
 
ExecutorSummary - Class in org.apache.spark.status.api.v1
 
executorSummary() - Method in class org.apache.spark.status.api.v1.StageData
 
executorToDuration() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
executorToInputBytes() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
executorToInputRecords() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
executorToLogUrls() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
executorToOutputBytes() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
executorToOutputRecords() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
executorToShuffleRead() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
executorToShuffleWrite() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
executorToTasksActive() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
executorToTasksComplete() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
executorToTasksFailed() - Method in class org.apache.spark.ui.exec.ExecutorsListener
 
exists() - Method in class org.apache.spark.streaming.State
Whether the state already exists.
exitCausedByApp() - Method in class org.apache.spark.ExecutorLostFailure
 
exp(Column) - Static method in class org.apache.spark.sql.functions
Computes the exponential of the given value.
exp(String) - Static method in class org.apache.spark.sql.functions
Computes the exponential of the given column.
ExpectationSum - Class in org.apache.spark.mllib.clustering
 
ExpectationSum(double, double[], DenseVector<Object>[], DenseMatrix<Object>[]) - Constructor for class org.apache.spark.mllib.clustering.ExpectationSum
 
Experimental - Annotation Type in org.apache.spark.annotation
An experimental user-facing API.
experimental() - Method in class org.apache.spark.sql.SQLContext
:: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality.
ExperimentalMethods - Class in org.apache.spark.sql
:: Experimental :: Holder for experimental methods for the bravest.
ExperimentalMethods(SQLContext) - Constructor for class org.apache.spark.sql.ExperimentalMethods
 
explain(boolean) - Method in class org.apache.spark.sql.Column
Prints the expression to the console for debugging purposes.
explain(boolean) - Method in class org.apache.spark.sql.DataFrame
Prints the plans (logical and physical) to the console for debugging purposes.
explain() - Method in class org.apache.spark.sql.DataFrame
Prints the physical plan to the console for debugging purposes.
explain(boolean) - Method in class org.apache.spark.sql.Dataset
Prints the plans (logical and physical) to the console for debugging purposes.
explain() - Method in class org.apache.spark.sql.Dataset
Prints the physical plan to the console for debugging purposes.
explainedVariance() - Method in class org.apache.spark.ml.regression.LinearRegressionSummary
 
explainedVariance() - Method in class org.apache.spark.mllib.evaluation.RegressionMetrics
Returns the variance explained by regression.
explainParam(Param<?>) - Method in interface org.apache.spark.ml.param.Params
 
explainParams() - Method in interface org.apache.spark.ml.param.Params
 
explode(Seq<Column>, Function1<Row, TraversableOnce<A>>, TypeTags.TypeTag<A>) - Method in class org.apache.spark.sql.DataFrame
(Scala-specific) Returns a new DataFrame where each row has been expanded to zero or more rows by the provided function.
explode(String, String, Function1<A, TraversableOnce<B>>, TypeTags.TypeTag<B>) - Method in class org.apache.spark.sql.DataFrame
(Scala-specific) Returns a new DataFrame where a single column has been expanded to zero or more rows by the provided function.
explode(Column) - Static method in class org.apache.spark.sql.functions
Creates a new row for each element in the given array or map column.
expm1(Column) - Static method in class org.apache.spark.sql.functions
Computes the exponential of the given value minus one.
expm1(String) - Static method in class org.apache.spark.sql.functions
Computes the exponential of the given column minus one.
ExponentialGenerator - Class in org.apache.spark.mllib.random
:: DeveloperApi :: Generates i.i.d. samples from the exponential distribution with the given mean.
ExponentialGenerator(double) - Constructor for class org.apache.spark.mllib.random.ExponentialGenerator
 
exponentialJavaRDD(JavaSparkContext, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
exponentialJavaRDD(JavaSparkContext, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
exponentialJavaRDD(JavaSparkContext, double, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
exponentialJavaVectorRDD(JavaSparkContext, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
exponentialJavaVectorRDD(JavaSparkContext, double, long, int, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
exponentialJavaVectorRDD(JavaSparkContext, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
exponentialRDD(SparkContext, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
Generates an RDD comprised of i.i.d. samples from the exponential distribution with the input mean.
exponentialVectorRDD(SparkContext, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
Generates an RDD[Vector] with vectors containing i.i.d. samples drawn from the exponential distribution with the input mean.
expr() - Method in class org.apache.spark.sql.Column
 
expr(String) - Static method in class org.apache.spark.sql.functions
Parses the expression string into the column that it represents, similar to DataFrame.selectExpr
externalBlockStoreFolderName() - Method in class org.apache.spark.SparkContext
 
externalBlockStoreSize() - Method in class org.apache.spark.storage.BlockStatus
 
externalBlockStoreSize() - Method in class org.apache.spark.storage.BlockUpdatedInfo
 
externalBlockStoreSize() - Method in class org.apache.spark.storage.RDDInfo
 
extractAFTPoints(DataFrame) - Method in class org.apache.spark.ml.regression.AFTSurvivalRegression
Extract featuresCol, labelCol and censorCol from the input dataset, and put them in an RDD with strong types.
extractDistribution(Function1<BatchInfo, Option<Object>>) - Method in class org.apache.spark.streaming.scheduler.StatsReportListener
 
extractDoubleDistribution(Seq<Tuple2<TaskInfo, TaskMetrics>>, Function2<TaskInfo, TaskMetrics, Option<Object>>) - Static method in class org.apache.spark.scheduler.StatsReportListener
 
extractLabeledPoints(DataFrame) - Method in class org.apache.spark.ml.Predictor
Extract labelCol and featuresCol from the given dataset, and put them in an RDD with strong types.
extractLongDistribution(Seq<Tuple2<TaskInfo, TaskMetrics>>, Function2<TaskInfo, TaskMetrics, Option<Object>>) - Static method in class org.apache.spark.scheduler.StatsReportListener
 
extractParamMap(ParamMap) - Method in interface org.apache.spark.ml.param.Params
Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
extractParamMap() - Method in interface org.apache.spark.ml.param.Params
extractParamMap with no extra values.
extraStrategies() - Method in class org.apache.spark.sql.ExperimentalMethods
Allows extra strategies to be injected into the query planner at runtime.
eye(int) - Static method in class org.apache.spark.mllib.linalg.DenseMatrix
Generate an Identity Matrix in DenseMatrix format.
eye(int) - Static method in class org.apache.spark.mllib.linalg.Matrices
Generate a dense Identity Matrix in Matrix format.
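
The precedence described for extractParamMap(ParamMap) above (default param values < user-supplied values < extra) can be illustrated with plain java.util maps. This is only a sketch under stated assumptions: it does not use Spark's ParamMap, and the class name ParamMergeSketch, the merge helper, and the parameter names maxIter and regParam are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Spark's ParamMap implementation.
public class ParamMergeSketch {

    // Merge three layers so that later layers override earlier ones:
    // defaults < user-supplied < extra.
    static Map<String, Object> merge(Map<String, Object> defaults,
                                     Map<String, Object> userSupplied,
                                     Map<String, Object> extra) {
        Map<String, Object> merged = new LinkedHashMap<>(defaults);
        merged.putAll(userSupplied); // user-supplied values replace defaults
        merged.putAll(extra);        // extra values replace everything else
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("maxIter", 100);
        defaults.put("regParam", 0.0);

        Map<String, Object> userSupplied = new LinkedHashMap<>();
        userSupplied.put("maxIter", 50);
        userSupplied.put("regParam", 0.1);

        Map<String, Object> extra = new LinkedHashMap<>();
        extra.put("regParam", 0.3);

        System.out.println(merge(defaults, userSupplied, extra));
    }
}
```

In this sketch maxIter resolves to 50 (the user-supplied value beats the default) and regParam resolves to 0.3 (the extra value beats both).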

F

f() - Method in class org.apache.spark.sql.UserDefinedFunction
 
f1Measure() - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
Returns document-based f1-measure averaged by the number of documents
f1Measure(double) - Method in class org.apache.spark.mllib.evaluation.MultilabelMetrics
Returns f1-measure for a given label (category)
factorial(Column) - Static method in class org.apache.spark.sql.functions
Computes the factorial of the given value.
failed() - Method in class org.apache.spark.scheduler.TaskInfo
 
failedJobs() - Method in class org.apache.spark.ui.jobs.JobProgressListener
 
failedStages() - Method in class org.apache.spark.ui.jobs.JobProgressListener
 
failedTasks() - Method in class org.apache.spark.status.api.v1.ExecutorStageSummary
 
failedTasks() - Method in class org.apache.spark.status.api.v1.ExecutorSummary
 
failureReason() - Method in class org.apache.spark.scheduler.StageInfo
If the stage failed, the reason why.
failureReason() - Method in class org.apache.spark.streaming.scheduler.OutputOperationInfo
 
FAIR() - Static method in class org.apache.spark.scheduler.SchedulingMode
 
falsePositiveRate(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
Returns false positive rate for a given label (category)
feature() - Method in class org.apache.spark.mllib.tree.model.Split
 
featureImportances() - Method in class org.apache.spark.ml.classification.RandomForestClassificationModel
Estimate of the importance of each feature.
featureImportances() - Method in class org.apache.spark.ml.regression.RandomForestRegressionModel
Estimate of the importance of each feature.
featureIndex() - Method in class org.apache.spark.ml.tree.CategoricalSplit
 
featureIndex() - Method in class org.apache.spark.ml.tree.ContinuousSplit
 
featureIndex() - Method in interface org.apache.spark.ml.tree.Split
Index of feature which this split tests
features() - Method in class org.apache.spark.mllib.regression.LabeledPoint
 
featuresCol() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
 
featuresCol() - Method in interface org.apache.spark.ml.classification.LogisticRegressionSummary
Field in "predictions" which gives the features of each instance as a vector.
featuresCol() - Method in class org.apache.spark.ml.regression.LinearRegressionTrainingSummary
 
featuresDataType() - Method in class org.apache.spark.ml.PredictionModel
Returns the SQL DataType corresponding to the FeaturesType type parameter.
FeatureType - Class in org.apache.spark.mllib.tree.configuration
Enum to describe whether a feature is "continuous" or "categorical"
FeatureType() - Constructor for class org.apache.spark.mllib.tree.configuration.FeatureType
 
featureType() - Method in class org.apache.spark.mllib.tree.model.Split
 
FetchFailed - Class in org.apache.spark
:: DeveloperApi :: Task failed to fetch shuffle data from a remote node.
FetchFailed(BlockManagerId, int, int, int, String) - Constructor for class org.apache.spark.FetchFailed
 
fetchPct() - Method in class org.apache.spark.scheduler.RuntimePercentage
 
fetchWaitTime() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetricDistributions
 
fetchWaitTime() - Method in class org.apache.spark.status.api.v1.ShuffleReadMetrics
 
field() - Method in class org.apache.spark.storage.BroadcastBlockId
 
fieldIndex(String) - Method in interface org.apache.spark.sql.Row
Returns the index of a given field name.
fieldIndex(String) - Method in class org.apache.spark.sql.types.StructType
Returns index of a given field
fieldNames() - Method in class org.apache.spark.sql.types.StructType
Returns all field names in an array.
fields() - Method in class org.apache.spark.sql.types.StructType
 
FIFO() - Static method in class org.apache.spark.scheduler.SchedulingMode
 
files() - Method in class org.apache.spark.SparkContext
 
fileStream(String, Class<K>, Class<V>, Class<F>) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
fileStream(String, Class<K>, Class<V>, Class<F>, Function<Path, Boolean>, boolean) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
fileStream(String, Class<K>, Class<V>, Class<F>, Function<Path, Boolean>, boolean, Configuration) - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
fileStream(String, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
fileStream(String, Function1<Path, Object>, boolean, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
fileStream(String, Function1<Path, Object>, boolean, Configuration, ClassTag<K>, ClassTag<V>, ClassTag<F>) - Method in class org.apache.spark.streaming.StreamingContext
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
fill(double) - Method in class org.apache.spark.sql.DataFrameNaFunctions
Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
fill(String) - Method in class org.apache.spark.sql.DataFrameNaFunctions
Returns a new DataFrame that replaces null values in string columns with value.
fill(double, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
fill(double, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
(Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
fill(String, String[]) - Method in class org.apache.spark.sql.DataFrameNaFunctions
Returns a new DataFrame that replaces null values in specified string columns.
fill(String, Seq<String>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
(Scala-specific) Returns a new DataFrame that replaces null values in specified string columns.
fill(Map<String, Object>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
Returns a new DataFrame that replaces null values.
fill(Map<String, Object>) - Method in class org.apache.spark.sql.DataFrameNaFunctions
(Scala-specific) Returns a new DataFrame that replaces null values.
filter(Function<Double, Boolean>) - Method in class org.apache.spark.api.java.JavaDoubleRDD
Return a new RDD containing only the elements that satisfy a predicate.
filter(Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.api.java.JavaPairRDD
Return a new RDD containing only the elements that satisfy a predicate.
filter(Function<T, Boolean>) - Method in class org.apache.spark.api.java.JavaRDD
Return a new RDD containing only the elements that satisfy a predicate.
filter(Function1<Graph<VD, ED>, Graph<VD2, ED2>>, Function1<EdgeTriplet<VD2, ED2>, Object>, Function2<Object, VD2, Object>, ClassTag<VD2>, ClassTag<ED2>) - Method in class org.apache.spark.graphx.GraphOps
Filter the graph by computing some values to filter on, and applying the predicates.
filter(Function1<EdgeTriplet<VD, ED>, Object>, Function2<Object, VD, Object>) - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
 
filter(Function1<Tuple2<Object, VD>, Object>) - Method in class org.apache.spark.graphx.VertexRDD
Restricts the vertex set to the set of vertices satisfying the given predicate.
filter(Params) - Method in class org.apache.spark.ml.param.ParamMap
Filters this param map for the given parent.
filter(Function1<T, Object>) - Method in class org.apache.spark.rdd.RDD
Return a new RDD containing only the elements that satisfy a predicate.
filter(Column) - Method in class org.apache.spark.sql.DataFrame
Filters rows using the given condition.
filter(String) - Method in class org.apache.spark.sql.DataFrame
Filters rows using the given SQL expression.
filter(Function1<T, Object>) - Method in class org.apache.spark.sql.Dataset
(Scala-specific) Returns a new Dataset that only contains elements where func returns true.
filter(FilterFunction<T>) - Method in class org.apache.spark.sql.Dataset
(Java-specific) Returns a new Dataset that only contains elements where func returns true.
Filter - Class in org.apache.spark.sql.sources
A filter predicate for data sources.
Filter() - Constructor for class org.apache.spark.sql.sources.Filter
 
filter(Function<T, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaDStream
Return a new DStream containing only the elements that satisfy a predicate.
filter(Function<Tuple2<K, V>, Boolean>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Return a new DStream containing only the elements that satisfy a predicate.
filter(Function1<T, Object>) - Method in class org.apache.spark.streaming.dstream.DStream
Return a new DStream containing only the elements that satisfy a predicate.
filterByRange(K, K) - Method in class org.apache.spark.rdd.OrderedRDDFunctions
Returns an RDD containing only the elements in the inclusive range lower to upper.
FilterFunction<T> - Interface in org.apache.spark.api.java.function
Base interface for a function used in Dataset's filter function.
filterWith(Function1<Object, A>, Function2<T, A, Object>) - Method in class org.apache.spark.rdd.RDD
Filters this RDD with p, where p takes an additional parameter of type A.
findSplitsBins(RDD<LabeledPoint>, org.apache.spark.mllib.tree.impl.DecisionTreeMetadata) - Static method in class org.apache.spark.mllib.tree.DecisionTree
Returns splits and bins for decision tree calculation.
findSynonyms(String, int) - Method in class org.apache.spark.ml.feature.Word2VecModel
Find "num" number of words closest in similarity to the given word.
findSynonyms(Vector, int) - Method in class org.apache.spark.ml.feature.Word2VecModel
Find "num" number of words closest in similarity to the given vector representation of the word.
findSynonyms(String, int) - Method in class org.apache.spark.mllib.feature.Word2VecModel
 
findSynonyms(Vector, int) - Method in class org.apache.spark.mllib.feature.Word2VecModel
 
finish(B) - Method in class org.apache.spark.sql.expressions.Aggregator
Transform the output of the reduction.
finished() - Method in class org.apache.spark.scheduler.TaskInfo
 
finishTime() - Method in class org.apache.spark.scheduler.TaskInfo
The time when the task has completed successfully (including the time to remotely fetch results, if necessary).
first() - Method in class org.apache.spark.api.java.JavaDoubleRDD
 
first() - Method in class org.apache.spark.api.java.JavaPairRDD
 
first() - Method in interface org.apache.spark.api.java.JavaRDDLike
Return the first element in this RDD.
first() - Method in class org.apache.spark.rdd.RDD
Return the first element in this RDD.
first() - Method in class org.apache.spark.sql.DataFrame
Returns the first row.
first() - Method in class org.apache.spark.sql.Dataset
Returns the first element in this Dataset.
first(Column) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the first value in a group.
first(String) - Static method in class org.apache.spark.sql.functions
Aggregate function: returns the first value of a column in a group.
firstParent(ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
Returns the first parent RDD
fit(DataFrame) - Method in class org.apache.spark.ml.classification.OneVsRest
 
fit(DataFrame) - Method in class org.apache.spark.ml.clustering.KMeans
 
fit(DataFrame) - Method in class org.apache.spark.ml.clustering.LDA
 
fit(DataFrame, ParamPair<?>, ParamPair<?>...) - Method in class org.apache.spark.ml.Estimator
Fits a single model to the input data with optional parameters.
fit(DataFrame, ParamPair<?>, Seq<ParamPair<?>>) - Method in class org.apache.spark.ml.Estimator
Fits a single model to the input data with optional parameters.
fit(DataFrame, ParamMap) - Method in class org.apache.spark.ml.Estimator
Fits a single model to the input data with provided parameter map.
fit(DataFrame) - Method in class org.apache.spark.ml.Estimator
Fits a model to the input data.
fit(DataFrame, ParamMap[]) - Method in class org.apache.spark.ml.Estimator
Fits multiple models to the input data with multiple sets of parameters.
fit(DataFrame) - Method in class org.apache.spark.ml.feature.ChiSqSelector
 
fit(DataFrame) - Method in class org.apache.spark.ml.feature.CountVectorizer
 
fit(DataFrame) - Method in class org.apache.spark.ml.feature.IDF
 
fit(DataFrame) - Method in class org.apache.spark.ml.feature.MinMaxScaler
 
fit(DataFrame) - Method in class org.apache.spark.ml.feature.PCA
Computes a PCAModel that contains the principal components of the input vectors.
fit(DataFrame) - Method in class org.apache.spark.ml.feature.QuantileDiscretizer
 
fit(DataFrame) - Method in class org.apache.spark.ml.feature.RFormula
 
fit(DataFrame) - Method in class org.apache.spark.ml.feature.StandardScaler
 
fit(DataFrame) - Method in class org.apache.spark.ml.feature.StringIndexer
 
fit(DataFrame) - Method in class org.apache.spark.ml.feature.VectorIndexer
 
fit(DataFrame) - Method in class org.apache.spark.ml.feature.Word2Vec
 
fit(DataFrame) - Method in class org.apache.spark.ml.Pipeline
Fits the pipeline to the input dataset with additional parameters.
fit(DataFrame) - Method in class org.apache.spark.ml.Predictor
 
fit(DataFrame) - Method in class org.apache.spark.ml.recommendation.ALS
 
fit(DataFrame) - Method in class org.apache.spark.ml.regression.AFTSurvivalRegression
 
fit(DataFrame) - Method in class org.apache.spark.ml.regression.IsotonicRegression
 
fit(DataFrame) - Method in class org.apache.spark.ml.tuning.CrossValidator
 
fit(DataFrame) - Method in class org.apache.spark.ml.tuning.TrainValidationSplit
 
fit(RDD<LabeledPoint>) - Method in class org.apache.spark.mllib.feature.ChiSqSelector
 
fit(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.IDF
Computes the inverse document frequency.
fit(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.feature.IDF
Computes the inverse document frequency.
fit(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.PCA
Computes a PCAModel that contains the principal components of the input vectors.
fit(JavaRDD<Vector>) - Method in class org.apache.spark.mllib.feature.PCA
Java-friendly version of fit()
fit(RDD<Vector>) - Method in class org.apache.spark.mllib.feature.StandardScaler
Computes the mean and variance and stores as a model to be used for later scaling.
fit(RDD<S>) - Method in class org.apache.spark.mllib.feature.Word2Vec
 
fit(JavaRDD<S>) - Method in class org.apache.spark.mllib.feature.Word2Vec
 
flatMap(FlatMapFunction<T, U>) - Method in interface org.apache.spark.api.java.JavaRDDLike
Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.
flatMap(Function1<T, TraversableOnce<U>>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.
flatMap(Function1<Row, TraversableOnce<R>>, ClassTag<R>) - Method in class org.apache.spark.sql.DataFrame
Returns a new RDD by first applying a function to all rows of this DataFrame, and then flattening the results.
flatMap(Function1<T, TraversableOnce<U>>, Encoder<U>) - Method in class org.apache.spark.sql.Dataset
(Scala-specific) Returns a new Dataset by first applying a function to all elements of this Dataset, and then flattening the results.
flatMap(FlatMapFunction<T, U>, Encoder<U>) - Method in class org.apache.spark.sql.Dataset
(Java-specific) Returns a new Dataset by first applying a function to all elements of this Dataset, and then flattening the results.
flatMap(FlatMapFunction<T, U>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream by applying a function to all elements of this DStream, and then flattening the results
flatMap(Function1<T, Traversable<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.DStream
Return a new DStream by applying a function to all elements of this DStream, and then flattening the results
FlatMapFunction<T,R> - Interface in org.apache.spark.api.java.function
A function that returns zero or more output records from each input record.
FlatMapFunction2<T1,T2,R> - Interface in org.apache.spark.api.java.function
A function that takes two inputs and returns zero or more output records.
flatMapGroups(Function2<K, Iterator<V>, TraversableOnce<U>>, Encoder<U>) - Method in class org.apache.spark.sql.GroupedDataset
Applies the given function to each group of data.
flatMapGroups(FlatMapGroupsFunction<K, V, U>, Encoder<U>) - Method in class org.apache.spark.sql.GroupedDataset
Applies the given function to each group of data.
FlatMapGroupsFunction<K,V,R> - Interface in org.apache.spark.api.java.function
A function that returns zero or more output records from each grouping key and its values.
flatMapToDouble(DoubleFlatMapFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.
flatMapToPair(PairFlatMapFunction<T, K2, V2>) - Method in interface org.apache.spark.api.java.JavaRDDLike
Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.
flatMapToPair(PairFlatMapFunction<T, K2, V2>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream by applying a function to all elements of this DStream, and then flattening the results
flatMapValues(Function<V, Iterable<U>>) - Method in class org.apache.spark.api.java.JavaPairRDD
Pass each value in the key-value pair RDD through a flatMap function without changing the keys; this also retains the original RDD's partitioning.
flatMapValues(Function1<V, TraversableOnce<U>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Pass each value in the key-value pair RDD through a flatMap function without changing the keys; this also retains the original RDD's partitioning.
flatMapValues(Function<V, Iterable<U>>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Return a new DStream by applying a flatmap function to the value of each key-value pair in 'this' DStream without changing the key.
flatMapValues(Function1<V, TraversableOnce<U>>, ClassTag<U>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Return a new DStream by applying a flatmap function to the value of each key-value pair in 'this' DStream without changing the key.
flatMapWith(Function1<Object, A>, boolean, Function2<T, A, Seq<U>>, ClassTag<U>) - Method in class org.apache.spark.rdd.RDD
FlatMaps f over this RDD, where f takes an additional parameter of type A.
FLOAT() - Static method in class org.apache.spark.sql.Encoders
An encoder for nullable float type.
FloatDecimal() - Static method in class org.apache.spark.sql.types.DecimalType
 
FloatParam - Class in org.apache.spark.ml.param
:: DeveloperApi :: Specialized version of Param[Float] for Java.
FloatParam(String, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.FloatParam
 
FloatParam(String, String, String) - Constructor for class org.apache.spark.ml.param.FloatParam
 
FloatParam(Identifiable, String, String, Function1<Object, Object>) - Constructor for class org.apache.spark.ml.param.FloatParam
 
FloatParam(Identifiable, String, String) - Constructor for class org.apache.spark.ml.param.FloatParam
 
floatToFloatWritable(float) - Static method in class org.apache.spark.SparkContext
 
FloatType - Static variable in class org.apache.spark.sql.types.DataTypes
Gets the FloatType object.
FloatType - Class in org.apache.spark.sql.types
:: DeveloperApi :: The data type representing Float values.
floatWritableConverter() - Static method in class org.apache.spark.SparkContext
 
floor(Column) - Static method in class org.apache.spark.sql.functions
Computes the floor of the given value.
floor(String) - Static method in class org.apache.spark.sql.functions
Computes the floor of the given column.
floor() - Method in class org.apache.spark.sql.types.Decimal
 
floor(Duration) - Method in class org.apache.spark.streaming.Time
 
floor(Duration, Time) - Method in class org.apache.spark.streaming.Time
 
FlumeUtils - Class in org.apache.spark.streaming.flume
 
FlumeUtils() - Constructor for class org.apache.spark.streaming.flume.FlumeUtils
 
flush() - Method in class org.apache.spark.io.SnappyOutputStreamWrapper
 
flush() - Method in class org.apache.spark.serializer.SerializationStream
 
flush() - Method in class org.apache.spark.storage.TimeTrackingOutputStream
 
fMeasure(double, double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
Returns f-measure for a given label (category)
fMeasure(double) - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
Returns f1-measure for a given label (category)
fMeasure() - Method in class org.apache.spark.mllib.evaluation.MulticlassMetrics
Returns f-measure (equal to precision and recall because precision equals recall)
fMeasureByThreshold() - Method in class org.apache.spark.ml.classification.BinaryLogisticRegressionSummary
Returns a DataFrame with two fields (threshold, F-Measure) representing the F-Measure curve with beta = 1.0.
fMeasureByThreshold(double) - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
Returns the (threshold, F-Measure) curve.
fMeasureByThreshold() - Method in class org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
Returns the (threshold, F-Measure) curve with beta = 1.0.
fold(T, Function2<T, T, T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
Aggregate the elements of each partition, and then the results for all the partitions, using a given associative and commutative function and a neutral "zero value".
fold(T, Function2<T, T, T>) - Method in class org.apache.spark.rdd.RDD
Aggregate the elements of each partition, and then the results for all the partitions, using a given associative and commutative function and a neutral "zero value".
foldByKey(V, Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
Merge the values for each key using an associative function and a neutral "zero value" which may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
foldByKey(V, int, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
Merge the values for each key using an associative function and a neutral "zero value" which may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
foldByKey(V, Function2<V, V, V>) - Method in class org.apache.spark.api.java.JavaPairRDD
Merge the values for each key using an associative function and a neutral "zero value" which may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
foldByKey(V, Partitioner, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Merge the values for each key using an associative function and a neutral "zero value" which may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
foldByKey(V, int, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Merge the values for each key using an associative function and a neutral "zero value" which may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
foldByKey(V, Function2<V, V, V>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Merge the values for each key using an associative function and a neutral "zero value" which may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).
foreach(VoidFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
Applies a function f to all elements of this RDD.
foreach(Function1<T, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
Applies a function f to all elements of this RDD.
foreach(Function1<Row, BoxedUnit>) - Method in class org.apache.spark.sql.DataFrame
Applies a function f to all rows.
foreach(Function1<T, BoxedUnit>) - Method in class org.apache.spark.sql.Dataset
(Scala-specific) Runs func on each element of this Dataset.
foreach(ForeachFunction<T>) - Method in class org.apache.spark.sql.Dataset
(Java-specific) Runs func on each element of this Dataset.
foreach(Function<R, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Deprecated.
As of release 0.9.0, replaced by foreachRDD
foreach(Function2<R, Time, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Deprecated.
As of release 0.9.0, replaced by foreachRDD
foreach(Function1<RDD<T>, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
Deprecated.
As of 0.9.0, replaced by foreachRDD.
foreach(Function2<RDD<T>, Time, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
Deprecated.
As of 0.9.0, replaced by foreachRDD.
foreachActive(Function2<Object, Object, BoxedUnit>) - Method in class org.apache.spark.mllib.linalg.DenseVector
 
foreachActive(Function3<Object, Object, Object, BoxedUnit>) - Method in interface org.apache.spark.mllib.linalg.Matrix
Applies a function f to all the active elements of dense and sparse matrix.
foreachActive(Function2<Object, Object, BoxedUnit>) - Method in class org.apache.spark.mllib.linalg.SparseVector
 
foreachActive(Function2<Object, Object, BoxedUnit>) - Method in interface org.apache.spark.mllib.linalg.Vector
Applies a function f to all the active elements of dense and sparse vector.
foreachAsync(VoidFunction<T>) - Method in interface org.apache.spark.api.java.JavaRDDLike
The asynchronous version of the foreach action, which applies a function f to all the elements of this RDD.
foreachAsync(Function1<T, BoxedUnit>) - Method in class org.apache.spark.rdd.AsyncRDDActions
Applies a function f to all elements of this RDD.
ForeachFunction<T> - Interface in org.apache.spark.api.java.function
Base interface for a function used in Dataset's foreach function.
foreachPartition(VoidFunction<Iterator<T>>) - Method in interface org.apache.spark.api.java.JavaRDDLike
Applies a function f to each partition of this RDD.
foreachPartition(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
Applies a function f to each partition of this RDD.
foreachPartition(Function1<Iterator<Row>, BoxedUnit>) - Method in class org.apache.spark.sql.DataFrame
Applies a function f to each partition of this DataFrame.
foreachPartition(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.sql.Dataset
(Scala-specific) Runs func on each partition of this Dataset.
foreachPartition(ForeachPartitionFunction<T>) - Method in class org.apache.spark.sql.Dataset
(Java-specific) Runs func on each partition of this Dataset.
foreachPartitionAsync(VoidFunction<Iterator<T>>) - Method in interface org.apache.spark.api.java.JavaRDDLike
The asynchronous version of the foreachPartition action, which applies a function f to each partition of this RDD.
foreachPartitionAsync(Function1<Iterator<T>, BoxedUnit>) - Method in class org.apache.spark.rdd.AsyncRDDActions
Applies a function f to each partition of this RDD.
ForeachPartitionFunction<T> - Interface in org.apache.spark.api.java.function
Base interface for a function used in Dataset's foreachPartition function.
foreachRDD(Function<R, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Deprecated.
As of release 1.6.0, replaced by foreachRDD(JVoidFunction)
foreachRDD(Function2<R, Time, Void>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Deprecated.
As of release 1.6.0, replaced by foreachRDD(JVoidFunction2)
foreachRDD(VoidFunction<R>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Apply a function to each RDD in this DStream.
foreachRDD(VoidFunction2<R, Time>) - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Apply a function to each RDD in this DStream.
foreachRDD(Function1<RDD<T>, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
Apply a function to each RDD in this DStream.
foreachRDD(Function2<RDD<T>, Time, BoxedUnit>) - Method in class org.apache.spark.streaming.dstream.DStream
Apply a function to each RDD in this DStream.
foreachWith(Function1<Object, A>, Function2<T, A, BoxedUnit>) - Method in class org.apache.spark.rdd.RDD
Applies f to each element of this RDD, where f takes an additional parameter of type A.
format(String) - Method in class org.apache.spark.sql.DataFrameReader
Specifies the input data source format.
format(String) - Method in class org.apache.spark.sql.DataFrameWriter
Specifies the underlying output data source.
format_number(Column, int) - Static method in class org.apache.spark.sql.functions
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.
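As a rough illustration of the format_number behaviour described above, the following plain-Java sketch uses java.text.DecimalFormat to group digits and round to d decimal places. It is not Spark's implementation: the class FormatNumberSketch, the helper formatNumber, the fixed US locale, and the choice to always print exactly d decimal digits are assumptions made for this example.

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class FormatNumberSketch {
    // Hypothetical helper: formats x with grouping separators and d decimal places.
    static String formatNumber(double x, int d) {
        StringBuilder pattern = new StringBuilder("#,##0");
        if (d > 0) {
            pattern.append('.');
            for (int i = 0; i < d; i++) {
                pattern.append('0'); // one mandatory digit per requested decimal place
            }
        }
        // Assumption: a fixed US locale, so ',' groups digits and '.' is the decimal point.
        DecimalFormat fmt = new DecimalFormat(
                pattern.toString(), DecimalFormatSymbols.getInstance(Locale.US));
        fmt.setRoundingMode(RoundingMode.HALF_EVEN); // DecimalFormat's default, made explicit
        return fmt.format(x);
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(1234567.891, 2)); // prints 1,234,567.89
    }
}
```

In Spark itself the same operation is applied to a Column and yields a string column; the sketch only mirrors the per-value formatting.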
format_string(String, Column...) - Static method in class org.apache.spark.sql.functions
Formats the arguments in printf-style and returns the result as a string column.
format_string(String, Seq<Column>) - Static method in class org.apache.spark.sql.functions
Formats the arguments in printf-style and returns the result as a string column.
formatVersion() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
 
formatVersion() - Method in class org.apache.spark.mllib.classification.NaiveBayesModel
 
formatVersion() - Method in class org.apache.spark.mllib.classification.SVMModel
 
formatVersion() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
 
formatVersion() - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
 
formatVersion() - Method in class org.apache.spark.mllib.clustering.KMeansModel
 
formatVersion() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
 
formatVersion() - Method in class org.apache.spark.mllib.clustering.PowerIterationClusteringModel
 
formatVersion() - Method in class org.apache.spark.mllib.feature.ChiSqSelectorModel
 
formatVersion() - Method in class org.apache.spark.mllib.feature.Word2VecModel
 
formatVersion() - Method in class org.apache.spark.mllib.recommendation.MatrixFactorizationModel
 
formatVersion() - Method in class org.apache.spark.mllib.regression.IsotonicRegressionModel
 
formatVersion() - Method in class org.apache.spark.mllib.regression.LassoModel
 
formatVersion() - Method in class org.apache.spark.mllib.regression.LinearRegressionModel
 
formatVersion() - Method in class org.apache.spark.mllib.regression.RidgeRegressionModel
 
formatVersion() - Method in class org.apache.spark.mllib.tree.model.DecisionTreeModel
 
formatVersion() - Method in class org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
 
formatVersion() - Method in class org.apache.spark.mllib.tree.model.RandomForestModel
 
formatVersion() - Method in interface org.apache.spark.mllib.util.Saveable
Current version of model save/load format.
formula() - Method in class org.apache.spark.ml.feature.RFormula
R formula parameter.
FPGrowth - Class in org.apache.spark.mllib.fpm
A parallel FP-growth algorithm to mine frequent itemsets.
FPGrowth() - Constructor for class org.apache.spark.mllib.fpm.FPGrowth
Constructs a default instance with default parameters {minSupport: 0.3, numPartitions: same as the input data}.
FPGrowth.FreqItemset<Item> - Class in org.apache.spark.mllib.fpm
Frequent itemset.
FPGrowth.FreqItemset(Object, long) - Constructor for class org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
 
FPGrowthModel<Item> - Class in org.apache.spark.mllib.fpm
Model trained by FPGrowth, which holds frequent itemsets.
FPGrowthModel(RDD<FPGrowth.FreqItemset<Item>>, ClassTag<Item>) - Constructor for class org.apache.spark.mllib.fpm.FPGrowthModel
 
fractional() - Method in class org.apache.spark.sql.types.DecimalType
 
fractional() - Method in class org.apache.spark.sql.types.DoubleType
 
fractional() - Method in class org.apache.spark.sql.types.FloatType
 
freq() - Method in class org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
 
freq() - Method in class org.apache.spark.mllib.fpm.PrefixSpan.FreqSequence
 
freqItems(String[], double) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Finding frequent items for columns, possibly with false positives.
freqItems(String[]) - Method in class org.apache.spark.sql.DataFrameStatFunctions
Finding frequent items for columns, possibly with false positives.
freqItems(Seq<String>, double) - Method in class org.apache.spark.sql.DataFrameStatFunctions
(Scala-specific) Finding frequent items for columns, possibly with false positives.
freqItems(Seq<String>) - Method in class org.apache.spark.sql.DataFrameStatFunctions
(Scala-specific) Finding frequent items for columns, possibly with false positives.
freqItemsets() - Method in class org.apache.spark.mllib.fpm.FPGrowthModel
 
freqSequences() - Method in class org.apache.spark.mllib.fpm.PrefixSpanModel
 
from_unixtime(Column) - Static method in class org.apache.spark.sql.functions
Converts the number of seconds from the Unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone, using the default format yyyy-MM-dd HH:mm:ss.
from_unixtime(Column, String) - Static method in class org.apache.spark.sql.functions
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
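The conversion that from_unixtime describes can be approximated per value in plain Java with java.time. This is only a sketch under stated assumptions: the class FromUnixTimeSketch and the helper fromUnixTime are hypothetical, the time zone is passed explicitly instead of being read from the system, and java.time pattern letters are assumed to match the pattern in use (they do for yyyy-MM-dd HH:mm:ss).

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class FromUnixTimeSketch {
    // Hypothetical helper: renders epoch seconds as text in the given zone and pattern.
    static String fromUnixTime(long epochSeconds, String pattern, ZoneId zone) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern).withZone(zone);
        return formatter.format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        String pattern = "yyyy-MM-dd HH:mm:ss";
        // Zero seconds after the epoch, viewed in UTC.
        System.out.println(fromUnixTime(0L, pattern, ZoneOffset.UTC)); // prints 1970-01-01 00:00:00
        // The same instant viewed nine hours east of UTC.
        System.out.println(fromUnixTime(0L, pattern, ZoneOffset.ofHours(9))); // prints 1970-01-01 09:00:00
    }
}
```

Passing the zone as a parameter keeps the output deterministic; substituting ZoneId.systemDefault() would be closer to the "current system time zone" wording above.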
from_utc_timestamp(Column, String) - Static method in class org.apache.spark.sql.functions
Assumes given timestamp is UTC and converts to given timezone.
fromAttributes(Seq<Attribute>) - Static method in class org.apache.spark.sql.types.StructType
 
fromAvroFlumeEvent(AvroFlumeEvent) - Static method in class org.apache.spark.streaming.flume.SparkFlumeEvent
 
fromCaseClassString(String) - Static method in class org.apache.spark.sql.types.DataType
Deprecated.
As of 1.2.0, replaced by DataType.fromJson()
fromCOO(int, int, Iterable<Tuple3<Object, Object, Object>>) - Static method in class org.apache.spark.mllib.linalg.SparseMatrix
Generate a SparseMatrix from Coordinate List (COO) format.
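To make the Coordinate List (COO) idea concrete, the sketch below converts (row, column, value) triples into compressed sparse column (CSC) arrays using only the Java standard library. It is not the Spark method: the class CooToCscSketch, the holder Csc, and the helper fromCoo are hypothetical, and the sketch assumes there are no duplicate (row, column) entries. Entries within a column keep their input order.

```java
import java.util.Arrays;

public class CooToCscSketch {
    // Hypothetical holder for the three CSC arrays.
    static final class Csc {
        final int[] colPtrs;    // colPtrs[j]..colPtrs[j + 1] delimits column j
        final int[] rowIndices; // row index of each stored value
        final double[] values;  // stored non-zero values, column by column

        Csc(int[] colPtrs, int[] rowIndices, double[] values) {
            this.colPtrs = colPtrs;
            this.rowIndices = rowIndices;
            this.values = values;
        }
    }

    static Csc fromCoo(int numRows, int numCols, int[] rows, int[] cols, double[] vals) {
        int nnz = vals.length;
        if (rows.length != nnz || cols.length != nnz) {
            throw new IllegalArgumentException("rows, cols and vals must have equal length");
        }
        int[] colPtrs = new int[numCols + 1];
        for (int k = 0; k < nnz; k++) {
            if (rows[k] < 0 || rows[k] >= numRows || cols[k] < 0 || cols[k] >= numCols) {
                throw new IllegalArgumentException("entry " + k + " is out of bounds");
            }
            colPtrs[cols[k] + 1]++; // count entries per column
        }
        for (int j = 0; j < numCols; j++) {
            colPtrs[j + 1] += colPtrs[j]; // prefix sum turns counts into offsets
        }
        int[] next = Arrays.copyOf(colPtrs, numCols); // next free slot in each column
        int[] rowIndices = new int[nnz];
        double[] values = new double[nnz];
        for (int k = 0; k < nnz; k++) {
            int dest = next[cols[k]]++;
            rowIndices[dest] = rows[k];
            values[dest] = vals[k];
        }
        return new Csc(colPtrs, rowIndices, values);
    }

    public static void main(String[] args) {
        // 3 x 3 matrix with entries (0,0)=1, (2,0)=2, (1,1)=3, (0,2)=4.
        Csc m = fromCoo(3, 3,
                new int[] {0, 2, 1, 0}, new int[] {0, 0, 1, 2},
                new double[] {1.0, 2.0, 3.0, 4.0});
        System.out.println(Arrays.toString(m.colPtrs)); // prints [0, 2, 3, 4]
    }
}
```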
fromDStream(DStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaDStream
Convert a Scala DStream to a Java-friendly JavaDStream.
fromEdgePartitions(RDD<Tuple2<Object, EdgePartition<ED, VD>>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
Create a graph from EdgePartitions, setting referenced vertices to `defaultVertexAttr`.
fromEdges(RDD<Edge<ED>>, ClassTag<ED>, ClassTag<VD>) - Static method in class org.apache.spark.graphx.EdgeRDD
Creates an EdgeRDD from a set of edges.
fromEdges(RDD<Edge<ED>>, VD, StorageLevel, StorageLevel, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.Graph
Construct a graph from a collection of edges.
fromEdges(EdgeRDD<?>, int, VD, ClassTag<VD>) - Static method in class org.apache.spark.graphx.VertexRDD
Constructs a VertexRDD containing all vertices referred to in edges.
fromEdgeTuples(RDD<Tuple2<Object, Object>>, VD, Option<PartitionStrategy>, StorageLevel, StorageLevel, ClassTag<VD>) - Static method in class org.apache.spark.graphx.Graph
Construct a graph from a collection of edges encoded as vertex id pairs.
fromExistingRDDs(VertexRDD<VD>, EdgeRDD<ED>, ClassTag<VD>, ClassTag<ED>) - Static method in class org.apache.spark.graphx.impl.GraphImpl
Create a graph from a VertexRDD and an EdgeRDD with the same replicated vertex type as the vertices.
fromInputDStream(InputDStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaInputDStream
Convert a Scala InputDStream to a Java-friendly JavaInputDStream.
fromInputDStream(InputDStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairInputDStream
Convert a Scala InputDStream of pairs to a Java-friendly JavaPairInputDStream.
fromJavaDStream(JavaDStream<Tuple2<K, V>>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
 
fromJavaRDD(JavaRDD<Tuple2<K, V>>) - Static method in class org.apache.spark.api.java.JavaPairRDD
Convert a JavaRDD of key-value pairs to JavaPairRDD.
fromJson(String) - Static method in class org.apache.spark.mllib.linalg.Vectors
Parses the JSON representation of a vector into a Vector.
fromJson(String) - Static method in class org.apache.spark.sql.types.DataType
 
fromJson(String) - Static method in class org.apache.spark.sql.types.Metadata
Creates a Metadata instance from JSON.
fromName(String) - Static method in class org.apache.spark.ml.attribute.AttributeType
Gets the AttributeType object from its name.
fromOffset() - Method in class org.apache.spark.streaming.kafka.OffsetRange
 
fromOld(DecisionTreeModel, DecisionTreeClassifier, Map<Object, Object>, int) - Static method in class org.apache.spark.ml.classification.DecisionTreeClassificationModel
(private[ml]) Convert a model from the old API
fromOld(GradientBoostedTreesModel, GBTClassifier, Map<Object, Object>, int) - Static method in class org.apache.spark.ml.classification.GBTClassificationModel
(private[ml]) Convert a model from the old API
fromOld(RandomForestModel, RandomForestClassifier, Map<Object, Object>, int, int) - Static method in class org.apache.spark.ml.classification.RandomForestClassificationModel
(private[ml]) Convert a model from the old API
fromOld(DecisionTreeModel, DecisionTreeRegressor, Map<Object, Object>, int) - Static method in class org.apache.spark.ml.regression.DecisionTreeRegressionModel
(private[ml]) Convert a model from the old API
fromOld(GradientBoostedTreesModel, GBTRegressor, Map<Object, Object>, int) - Static method in class org.apache.spark.ml.regression.GBTRegressionModel
(private[ml]) Convert a model from the old API
fromOld(RandomForestModel, RandomForestRegressor, Map<Object, Object>, int) - Static method in class org.apache.spark.ml.regression.RandomForestRegressionModel
(private[ml]) Convert a model from the old API
fromOld(Node, Map<Object, Object>) - Static method in class org.apache.spark.ml.tree.Node
Create a new Node from the old Node format, recursively creating child nodes as needed.
fromPairDStream(DStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairDStream
 
fromPairRDD(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.mllib.rdd.MLPairRDDFunctions
Implicit conversion from a pair RDD to MLPairRDDFunctions.
fromRDD(RDD<Object>) - Static method in class org.apache.spark.api.java.JavaDoubleRDD
 
fromRDD(RDD<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.api.java.JavaPairRDD
 
fromRDD(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.api.java.JavaRDD
 
fromRDD(RDD<T>, ClassTag<T>) - Static method in class org.apache.spark.mllib.rdd.RDDFunctions
Implicit conversion from an RDD to RDDFunctions.
fromRdd(RDD<?>) - Static method in class org.apache.spark.storage.RDDInfo
 
fromReceiverInputDStream(ReceiverInputDStream<Tuple2<K, V>>, ClassTag<K>, ClassTag<V>) - Static method in class org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream
Convert a Scala ReceiverInputDStream of pairs to a Java-friendly JavaPairReceiverInputDStream.
fromReceiverInputDStream(ReceiverInputDStream<T>, ClassTag<T>) - Static method in class org.apache.spark.streaming.api.java.JavaReceiverInputDStream
Convert a Scala ReceiverInputDStream to a Java-friendly JavaReceiverInputDStream.
fromSparkContext(SparkContext) - Static method in class org.apache.spark.api.java.JavaSparkContext
 
fromStage(Stage, int, Option<Object>, Seq<Seq<TaskLocation>>) - Static method in class org.apache.spark.scheduler.StageInfo
Construct a StageInfo from a Stage.
fromString(String) - Static method in enum org.apache.spark.JobExecutionStatus
 
fromString(String) - Static method in class org.apache.spark.mllib.tree.loss.Losses
 
fromString(String) - Static method in enum org.apache.spark.status.api.v1.ApplicationStatus
 
fromString(String) - Static method in enum org.apache.spark.status.api.v1.StageStatus
 
fromString(String) - Static method in enum org.apache.spark.status.api.v1.TaskSorting
 
fromString(String) - Static method in class org.apache.spark.storage.StorageLevel
:: DeveloperApi :: Return the StorageLevel object with the specified name.
fromStructField(StructField) - Static method in class org.apache.spark.ml.attribute.AttributeGroup
Creates an attribute group from a StructField instance.
fullOuterJoin(JavaPairRDD<K, W>, Partitioner) - Method in class org.apache.spark.api.java.JavaPairRDD
Perform a full outer join of this and other.
fullOuterJoin(JavaPairRDD<K, W>) - Method in class org.apache.spark.api.java.JavaPairRDD
Perform a full outer join of this and other.
fullOuterJoin(JavaPairRDD<K, W>, int) - Method in class org.apache.spark.api.java.JavaPairRDD
Perform a full outer join of this and other.
fullOuterJoin(RDD<Tuple2<K, W>>, Partitioner) - Method in class org.apache.spark.rdd.PairRDDFunctions
Perform a full outer join of this and other.
fullOuterJoin(RDD<Tuple2<K, W>>) - Method in class org.apache.spark.rdd.PairRDDFunctions
Perform a full outer join of this and other.
fullOuterJoin(RDD<Tuple2<K, W>>, int) - Method in class org.apache.spark.rdd.PairRDDFunctions
Perform a full outer join of this and other.
fullOuterJoin(JavaPairDStream<K, W>) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
fullOuterJoin(JavaPairDStream<K, W>, int) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
fullOuterJoin(JavaPairDStream<K, W>, Partitioner) - Method in class org.apache.spark.streaming.api.java.JavaPairDStream
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
fullOuterJoin(DStream<Tuple2<K, W>>, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
fullOuterJoin(DStream<Tuple2<K, W>>, int, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
fullOuterJoin(DStream<Tuple2<K, W>>, Partitioner, ClassTag<W>) - Method in class org.apache.spark.streaming.dstream.PairDStreamFunctions
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
fullStackTrace() - Method in class org.apache.spark.ExceptionFailure
 
Function<T1,R> - Interface in org.apache.spark.api.java.function
Base interface for functions whose return types do not create special RDDs.
function(Function4<Time, KeyType, Option<ValueType>, State<StateType>, Option<MappedType>>) - Static method in class org.apache.spark.streaming.StateSpec
Create a StateSpec for setting all the specifications of the mapWithState operation on a pair DStream.
function(Function3<KeyType, Option<ValueType>, State<StateType>, MappedType>) - Static method in class org.apache.spark.streaming.StateSpec
Create a StateSpec for setting all the specifications of the mapWithState operation on a pair DStream.
function(Function4<Time, KeyType, Optional<ValueType>, State<StateType>, Optional<MappedType>>) - Static method in class org.apache.spark.streaming.StateSpec
Create a StateSpec for setting all the specifications of the mapWithState operation on a JavaPairDStream.
function(Function3<KeyType, Optional<ValueType>, State<StateType>, MappedType>) - Static method in class org.apache.spark.streaming.StateSpec
Create a StateSpec for setting all the specifications of the mapWithState operation on a JavaPairDStream.
Function0<R> - Interface in org.apache.spark.api.java.function
A zero-argument function that returns an R.
Function2<T1,T2,R> - Interface in org.apache.spark.api.java.function
A two-argument function that takes arguments of type T1 and T2 and returns an R.
Function3<T1,T2,T3,R> - Interface in org.apache.spark.api.java.function
A three-argument function that takes arguments of type T1, T2 and T3 and returns an R.
Function4<T1,T2,T3,T4,R> - Interface in org.apache.spark.api.java.function
A four-argument function that takes arguments of type T1, T2, T3 and T4 and returns an R.
functionRegistry() - Method in class org.apache.spark.sql.hive.HiveContext
 
functionRegistry() - Method in class org.apache.spark.sql.SQLContext
 
functions - Class in org.apache.spark.sql
 
functions() - Constructor for class org.apache.spark.sql.functions
 
FutureAction<T> - Interface in org.apache.spark
A future for the result of an action to support cancellation.
futureExecutionContext() - Static method in class org.apache.spark.rdd.AsyncRDDActions
 

G

gain() - Method in class org.apache.spark.ml.tree.InternalNode
 
gain() - Method in class org.apache.spark.mllib.tree.model.InformationGainStats
 
gamma1() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
 
gamma2() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
 
gamma6() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
 
gamma7() - Method in class org.apache.spark.graphx.lib.SVDPlusPlus.Conf
 
GammaGenerator - Class in org.apache.spark.mllib.random
:: DeveloperApi :: Generates i.i.d. samples from the gamma distribution with the given shape and scale.
GammaGenerator(double, double) - Constructor for class org.apache.spark.mllib.random.GammaGenerator
 
gammaJavaRDD(JavaSparkContext, double, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
gammaJavaRDD(JavaSparkContext, double, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
gammaJavaRDD(JavaSparkContext, double, double, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
gammaJavaVectorRDD(JavaSparkContext, double, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
gammaJavaVectorRDD(JavaSparkContext, double, double, long, int, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
gammaJavaVectorRDD(JavaSparkContext, double, double, long, int) - Static method in class org.apache.spark.mllib.random.RandomRDDs
gammaRDD(SparkContext, double, double, long, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
Generates an RDD comprised of i.i.d. samples from the gamma distribution with the input shape and scale.
gammaShape() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
 
gammaShape() - Method in class org.apache.spark.mllib.clustering.LDAModel
Shape parameter for random initialization of variational parameter gamma.
gammaShape() - Method in class org.apache.spark.mllib.clustering.LocalLDAModel
 
gammaVectorRDD(SparkContext, double, double, long, int, int, long) - Static method in class org.apache.spark.mllib.random.RandomRDDs
Generates an RDD[Vector] with vectors containing i.i.d. samples drawn from the gamma distribution with the input shape and scale.
gaps() - Method in class org.apache.spark.ml.feature.RegexTokenizer
Indicates whether regex splits on gaps (true) or matches tokens (false).
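The difference between the two modes of the gaps flag can be shown with java.util.regex alone. The sketch below is not RegexTokenizer itself: the class RegexGapsSketch and the helper tokenize are hypothetical, and dropping empty tokens in split mode is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGapsSketch {
    // Hypothetical helper: gaps = true treats the regex as a separator,
    // gaps = false treats each regex match as a token.
    static List<String> tokenize(String text, String regex, boolean gaps) {
        Pattern pattern = Pattern.compile(regex);
        List<String> tokens = new ArrayList<>();
        if (gaps) {
            for (String piece : pattern.split(text)) {
                if (!piece.isEmpty()) { // assumption: empty pieces are discarded
                    tokens.add(piece);
                }
            }
        } else {
            Matcher matcher = pattern.matcher(text);
            while (matcher.find()) {
                tokens.add(matcher.group());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Splitting on the gaps between tokens.
        System.out.println(tokenize("a, b,c", "\\s*,\\s*", true)); // prints [a, b, c]
        // Matching the tokens themselves.
        System.out.println(tokenize("a, b,c", "\\w+", false)); // prints [a, b, c]
    }
}
```

Both calls yield the same tokens here, but the regex plays opposite roles: in the first it describes what separates tokens, in the second what a token looks like.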
GaussianMixture - Class in org.apache.spark.mllib.clustering
This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs).
GaussianMixture() - Constructor for class org.apache.spark.mllib.clustering.GaussianMixture
Constructs a default instance.
GaussianMixtureModel - Class in org.apache.spark.mllib.clustering
Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i=1..k with probability w(i); mu(i) and sigma(i) are the respective mean and covariance for each Gaussian distribution i=1..k.
GaussianMixtureModel(double[], MultivariateGaussian[]) - Constructor for class org.apache.spark.mllib.clustering.GaussianMixtureModel
 
gaussians() - Method in class org.apache.spark.mllib.clustering.GaussianMixtureModel
 
GBTClassificationModel - Class in org.apache.spark.ml.classification
:: Experimental :: Gradient-Boosted Trees (GBTs) model for classification.
GBTClassificationModel(String, DecisionTreeRegressionModel[], double[]) - Constructor for class org.apache.spark.ml.classification.GBTClassificationModel
Construct a GBTClassificationModel
GBTClassifier - Class in org.apache.spark.ml.classification
:: Experimental :: Gradient-Boosted Trees (GBTs) learning algorithm for classification.
GBTClassifier(String) - Constructor for class org.apache.spark.ml.classification.GBTClassifier
 
GBTClassifier() - Constructor for class org.apache.spark.ml.classification.GBTClassifier
 
GBTRegressionModel - Class in org.apache.spark.ml.regression
:: Experimental :: Gradient-Boosted Trees (GBTs) model for regression.
GBTRegressionModel(String, DecisionTreeRegressionModel[], double[]) - Constructor for class org.apache.spark.ml.regression.GBTRegressionModel
Construct a GBTRegressionModel
GBTRegressor - Class in org.apache.spark.ml.regression
:: Experimental :: Gradient-Boosted Trees (GBTs) learning algorithm for regression.
GBTRegressor(String) - Constructor for class org.apache.spark.ml.regression.GBTRegressor
 
GBTRegressor() - Constructor for class org.apache.spark.ml.regression.GBTRegressor
 
GeneralizedLinearAlgorithm<M extends GeneralizedLinearModel> - Class in org.apache.spark.mllib.regression
:: DeveloperApi :: GeneralizedLinearAlgorithm implements methods to train a Generalized Linear Model (GLM).
GeneralizedLinearAlgorithm() - Constructor for class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
 
GeneralizedLinearModel - Class in org.apache.spark.mllib.regression
:: DeveloperApi :: GeneralizedLinearModel (GLM) represents a model trained using GeneralizedLinearAlgorithm.
GeneralizedLinearModel(Vector, double) - Constructor for class org.apache.spark.mllib.regression.GeneralizedLinearModel
 
generateAssociationRules(double) - Method in class org.apache.spark.mllib.fpm.FPGrowthModel
Generates association rules for the Items in freqItemsets.
generatedRDDs() - Method in class org.apache.spark.streaming.dstream.DStream
 
generateKMeansRDD(SparkContext, int, int, int, double, int) - Static method in class org.apache.spark.mllib.util.KMeansDataGenerator
Generate an RDD containing test data for KMeans.
generateLinearInput(double, double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
For compatibility, data generated without specifying the mean and variance has zero mean and a variance of 1.0/3.0, since the original output range is [-1, 1] with a uniform distribution, and the variance of a uniform distribution on [a, b] is (b - a)^2 / 12, which is 1.0/3.0 here.
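The 1.0/3.0 figure above follows from the uniform-variance formula with a = -1 and b = 1: (1 - (-1))^2 / 12 = 4 / 12 = 1/3. The plain-Java sketch below checks this both analytically and with a seeded simulation. The class UniformVarianceSketch and its helpers are hypothetical and are not part of LinearDataGenerator.

```java
import java.util.Random;

public class UniformVarianceSketch {
    // Closed-form variance of a uniform distribution on [a, b].
    static double uniformVariance(double a, double b) {
        double width = b - a;
        return width * width / 12.0;
    }

    // Empirical variance of n uniform draws on [a, b], using a fixed seed for repeatability.
    static double sampleVariance(double a, double b, int n, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double x = a + (b - a) * rng.nextDouble();
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / n;
        return sumSq / n - mean * mean; // E[x^2] - (E[x])^2
    }

    public static void main(String[] args) {
        System.out.println(uniformVariance(-1.0, 1.0));            // analytic value, 1/3
        System.out.println(sampleVariance(-1.0, 1.0, 200_000, 42L)); // should be close to 1/3
    }
}
```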
generateLinearInput(double, double[], double[], double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
 
generateLinearInput(double, double[], double[], double[], int, int, double, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
 
generateLinearInputAsList(double, double[], int, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
Return a Java List of synthetic data randomly generated according to a multicollinear model.
generateLinearRDD(SparkContext, int, int, double, int, double) - Static method in class org.apache.spark.mllib.util.LinearDataGenerator
Generate an RDD containing sample data for Linear Regression models - including Ridge, Lasso, and unregularized variants.
generateLogisticRDD(SparkContext, int, int, double, int, double) - Static method in class org.apache.spark.mllib.util.LogisticRegressionDataGenerator
Generate an RDD containing test data for LogisticRegression.
generateRandomEdges(int, int, int, long) - Static method in class org.apache.spark.graphx.util.GraphGenerators
 
geq(Object) - Method in class org.apache.spark.sql.Column
Greater than or equal to an expression.
get() - Method in interface org.apache.spark.FutureAction
Blocks and returns the result of this job.
get(Param<T>) - Method in class org.apache.spark.ml.param.ParamMap
Optionally returns the value associated with a param.
get(Param<T>) - Method in interface org.apache.spark.ml.param.Params
 
get(String) - Method in class org.apache.spark.SparkConf
Get a parameter; throws a NoSuchElementException if it's not set
get(String, String) - Method in class org.apache.spark.SparkConf
Get a parameter, falling back to a default if not set
get() - Static method in class org.apache.spark.SparkEnv
Returns the SparkEnv.
get(String) - Static method in class org.apache.spark.SparkFiles
Get the absolute path of a file added through SparkContext.addFile().
get(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i.
get() - Method in class org.apache.spark.streaming.State
Get the state if it exists, otherwise it will throw java.util.NoSuchElementException.
get() - Static method in class org.apache.spark.TaskContext
Return the currently active TaskContext.
get_json_object(Column, String) - Static method in class org.apache.spark.sql.functions
Extracts a JSON object from a JSON string based on the specified JSON path, and returns the extracted JSON object as a JSON string.
getActive() - Static method in class org.apache.spark.streaming.StreamingContext
:: Experimental ::
getActiveJobIds() - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
Returns an array containing the ids of all active jobs.
getActiveJobIds() - Method in class org.apache.spark.SparkStatusTracker
Returns an array containing the ids of all active jobs.
getActiveOrCreate(Function0<StreamingContext>) - Static method in class org.apache.spark.streaming.StreamingContext
:: Experimental ::
getActiveOrCreate(String, Function0<StreamingContext>, Configuration, boolean) - Static method in class org.apache.spark.streaming.StreamingContext
:: Experimental ::
getActiveStageIds() - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
Returns an array containing the ids of all active stages.
getActiveStageIds() - Method in class org.apache.spark.SparkStatusTracker
Returns an array containing the ids of all active stages.
getAkkaConf() - Method in class org.apache.spark.SparkConf
Get all akka conf variables set on this SparkConf
getAlgo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getAll() - Method in class org.apache.spark.SparkConf
Get all parameters as a list of pairs
getAllConfs() - Method in class org.apache.spark.sql.SQLContext
Return all the configuration properties that have been set (i.e. not the default).
getAllPools() - Method in class org.apache.spark.SparkContext
:: DeveloperApi :: Return pools for fair scheduler
getAlpha() - Method in class org.apache.spark.mllib.clustering.LDA
Alias for getDocConcentration
getAnyValAs(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i.
getAppId() - Method in interface org.apache.spark.launcher.SparkAppHandle
Returns the application ID, or null if not yet known.
getAppId() - Method in class org.apache.spark.SparkConf
Returns the Spark application id, valid in the Driver after TaskScheduler registration and from the start in the Executor.
getAs(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i.
getAs(String) - Method in interface org.apache.spark.sql.Row
Returns the value of a given fieldName.
getAsymmetricAlpha() - Method in class org.apache.spark.mllib.clustering.LDA
Alias for getAsymmetricDocConcentration
getAsymmetricDocConcentration() - Method in class org.apache.spark.mllib.clustering.LDA
Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta").
getAttr(String) - Method in class org.apache.spark.ml.attribute.AttributeGroup
Gets an attribute by its name.
getAttr(int) - Method in class org.apache.spark.ml.attribute.AttributeGroup
Gets an attribute by its index.
getAvroSchema() - Method in class org.apache.spark.SparkConf
Gets all the Avro schemas in the configuration used in the generic Avro record serializer
getBeta() - Method in class org.apache.spark.mllib.clustering.LDA
Alias for getTopicConcentration
getBlock(BlockId) - Method in class org.apache.spark.storage.StorageStatus
Return the given block stored in this block manager in O(1) time.
getBoolean(String, boolean) - Method in class org.apache.spark.SparkConf
Get a parameter as a boolean, falling back to a default if not set
getBoolean(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i as a primitive boolean.
getBoolean(String) - Method in class org.apache.spark.sql.types.Metadata
Gets a Boolean.
getBooleanArray(String) - Method in class org.apache.spark.sql.types.Metadata
Gets a Boolean array.
getByte(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i as a primitive byte.
getCachedBlockManagerId(BlockManagerId) - Static method in class org.apache.spark.storage.BlockManagerId
 
getCachedMetadata(String) - Static method in class org.apache.spark.rdd.HadoopRDD
The three methods below are helpers for accessing the local map, a property of the SparkEnv of the local process.
getCaseSensitive() - Method in class org.apache.spark.ml.feature.StopWordsRemover
 
getCatalystType(int, String, int, MetadataBuilder) - Method in class org.apache.spark.sql.jdbc.AggregatedDialect
 
getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.DerbyDialect
 
getCatalystType(int, String, int, MetadataBuilder) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
Get the custom datatype mapping for the given jdbc meta information.
getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.MsSqlServerDialect
 
getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.MySQLDialect
 
getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.OracleDialect
 
getCatalystType(int, String, int, MetadataBuilder) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
 
getCategoricalFeaturesInfo() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getCheckpointDir() - Method in class org.apache.spark.api.java.JavaSparkContext
 
getCheckpointDir() - Method in class org.apache.spark.SparkContext
 
getCheckpointFile() - Method in interface org.apache.spark.api.java.JavaRDDLike
Gets the name of the file to which this RDD was checkpointed
getCheckpointFile() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
 
getCheckpointFile() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
 
getCheckpointFile() - Method in class org.apache.spark.rdd.RDD
Gets the name of the directory to which this RDD was checkpointed.
getCheckpointFiles() - Method in class org.apache.spark.graphx.Graph
Gets the names of the files to which this Graph was checkpointed.
getCheckpointFiles() - Method in class org.apache.spark.graphx.impl.GraphImpl
 
getCheckpointInterval() - Method in class org.apache.spark.mllib.clustering.LDA
Period (in iterations) between checkpoints.
getCheckpointInterval() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getConf() - Method in class org.apache.spark.api.java.JavaSparkContext
Return a copy of this JavaSparkContext's configuration.
getConf() - Method in class org.apache.spark.rdd.HadoopRDD
 
getConf() - Method in class org.apache.spark.rdd.NewHadoopRDD
 
getConf() - Method in class org.apache.spark.SparkContext
Return a copy of this SparkContext's configuration.
getConf(String) - Method in class org.apache.spark.sql.SQLContext
Return the value of Spark SQL configuration property for the given key.
getConf(String, String) - Method in class org.apache.spark.sql.SQLContext
Return the value of Spark SQL configuration property for the given key.
getConnection() - Method in interface org.apache.spark.rdd.JdbcRDD.ConnectionFactory
 
getConvergenceTol() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
Return the largest change in log-likelihood at which convergence is considered to have occurred.
getDate(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i of date type as java.sql.Date.
getDecimal(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i of decimal type as java.math.BigDecimal.
getDefault(Param<T>) - Method in interface org.apache.spark.ml.param.Params
Gets the default value of a parameter.
getDegree() - Method in class org.apache.spark.ml.feature.PolynomialExpansion
 
getDependencies() - Method in class org.apache.spark.rdd.CoGroupedRDD
 
getDependencies() - Method in class org.apache.spark.rdd.RDD
Implemented by subclasses to return how this RDD depends on parent RDDs.
getDependencies() - Method in class org.apache.spark.rdd.ShuffledRDD
 
getDependencies() - Method in class org.apache.spark.rdd.UnionRDD
 
getDeprecatedConfig(String, SparkConf) - Static method in class org.apache.spark.SparkConf
Looks for available deprecated keys for the given config option, and returns the first value available.
getDocConcentration() - Method in class org.apache.spark.mllib.clustering.LDA
Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta").
getDouble(String, double) - Method in class org.apache.spark.SparkConf
Get a parameter as a double, falling back to a default if not set
getDouble(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i as a primitive double.
getDouble(String) - Method in class org.apache.spark.sql.types.Metadata
Gets a Double.
getDoubleArray(String) - Method in class org.apache.spark.sql.types.Metadata
Gets a Double array.
getEpsilon() - Method in class org.apache.spark.mllib.clustering.KMeans
The distance threshold within which we consider centers to have converged.
getExecutorEnv() - Method in class org.apache.spark.SparkConf
Get all executor environment variables set on this SparkConf
getExecutorMemoryStatus() - Method in class org.apache.spark.SparkContext
Return a map from the slave to the max memory available for caching and the remaining memory available for caching.
getExecutorStorageStatus() - Method in class org.apache.spark.SparkContext
:: DeveloperApi :: Return information about blocks stored in all of the slaves
getField(String) - Method in class org.apache.spark.sql.Column
An expression that gets a field by name in a StructType.
getFinalValue() - Method in class org.apache.spark.partial.PartialResult
Blocking method to wait for and return the final value.
getFloat(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i as a primitive float.
getFormula() - Method in class org.apache.spark.ml.feature.RFormula
 
getGaps() - Method in class org.apache.spark.ml.feature.RegexTokenizer
 
getImpurity() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getIndices() - Method in class org.apache.spark.ml.feature.VectorSlicer
 
getInitializationMode() - Method in class org.apache.spark.mllib.clustering.KMeans
The initialization algorithm.
getInitializationSteps() - Method in class org.apache.spark.mllib.clustering.KMeans
Number of steps for the k-means|| initialization mode
getInitialModel() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
Return the user supplied initial GMM, if supplied
getInitialPositionInStream(int) - Method in class org.apache.spark.streaming.kinesis.KinesisUtilsPythonHelper
 
getInputFormat(JobConf) - Method in class org.apache.spark.rdd.HadoopRDD
 
getInt(String, int) - Method in class org.apache.spark.SparkConf
Get a parameter as an integer, falling back to a default if not set
getInt(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i as a primitive int.
getInverse() - Method in class org.apache.spark.ml.feature.DCT
 
getItem(Object) - Method in class org.apache.spark.sql.Column
An expression that gets an item at a given position (ordinal) out of an array, or gets a value by a given key in a MapType.
getJavaMap(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i of map type as a java.util.Map.
getJDBCType(DataType) - Method in class org.apache.spark.sql.jdbc.AggregatedDialect
 
getJDBCType(DataType) - Static method in class org.apache.spark.sql.jdbc.DB2Dialect
 
getJDBCType(DataType) - Static method in class org.apache.spark.sql.jdbc.DerbyDialect
 
getJDBCType(DataType) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
Retrieve the JDBC / SQL type for a given datatype.
getJDBCType(DataType) - Static method in class org.apache.spark.sql.jdbc.MsSqlServerDialect
 
getJDBCType(DataType) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
 
getJobConf() - Method in class org.apache.spark.rdd.HadoopRDD
 
getJobIdsForGroup(String) - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
Return a list of all known jobs in a particular job group.
getJobIdsForGroup(String) - Method in class org.apache.spark.SparkStatusTracker
Return a list of all known jobs in a particular job group.
getJobInfo(int) - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
Returns job information, or null if the job info could not be found or was garbage collected.
getJobInfo(int) - Method in class org.apache.spark.SparkStatusTracker
Returns job information, or None if the job info could not be found or was garbage collected.
getK() - Method in class org.apache.spark.mllib.clustering.BisectingKMeans
Gets the desired number of leaf clusters.
getK() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
Return the number of Gaussians in the mixture model
getK() - Method in class org.apache.spark.mllib.clustering.KMeans
Number of clusters to create (k).
getK() - Method in class org.apache.spark.mllib.clustering.LDA
Number of topics to infer.
getKappa() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
Learning rate: exponential decay rate
getLabels() - Method in class org.apache.spark.ml.feature.IndexToString
 
getLambda() - Method in class org.apache.spark.mllib.classification.NaiveBayes
 
getLDAModel(double[]) - Method in interface org.apache.spark.mllib.clustering.LDAOptimizer
 
getLearningRate() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
 
getLeastGroupHash(String) - Method in class org.apache.spark.rdd.PartitionCoalescer
Sorts and gets the least element of the list associated with key in groupHash. The returned PartitionGroup is the least loaded of all groups that represent the machine "key".
getList(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i of array type as a java.util.List.
getLocalProperty(String) - Method in class org.apache.spark.api.java.JavaSparkContext
Get a local property set in this thread, or null if it is missing.
getLocalProperty(String) - Method in class org.apache.spark.SparkContext
Get a local property set in this thread, or null if it is missing.
getLong(String, long) - Method in class org.apache.spark.SparkConf
Get a parameter as a long, falling back to a default if not set
getLong(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i as a primitive long.
getLong(String) - Method in class org.apache.spark.sql.types.Metadata
Gets a Long.
getLongArray(String) - Method in class org.apache.spark.sql.types.Metadata
Gets a Long array.
getLoss() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
 
getLossType() - Method in class org.apache.spark.ml.classification.GBTClassifier
 
getLossType() - Method in class org.apache.spark.ml.regression.GBTRegressor
 
getMap(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i of map type as a Scala Map.
getMap() - Method in class org.apache.spark.sql.types.MetadataBuilder
Returns the immutable version of this map.
getMaxBins() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getMaxDepth() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getMaxIterations() - Method in class org.apache.spark.mllib.clustering.BisectingKMeans
Gets the max number of k-means iterations to split clusters.
getMaxIterations() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
Return the maximum number of iterations to run
getMaxIterations() - Method in class org.apache.spark.mllib.clustering.KMeans
Maximum number of iterations to run.
getMaxIterations() - Method in class org.apache.spark.mllib.clustering.LDA
Maximum number of iterations for learning.
getMaxLocalProjDBSize() - Method in class org.apache.spark.mllib.fpm.PrefixSpan
Gets the maximum number of items allowed in a projected database before local processing.
getMaxMemoryInMB() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getMaxPatternLength() - Method in class org.apache.spark.mllib.fpm.PrefixSpan
Gets the maximal pattern length.
getMessage() - Method in exception org.apache.spark.sql.AnalysisException
 
getMetadata(String) - Method in class org.apache.spark.sql.types.Metadata
Gets a Metadata.
getMetadataArray(String) - Method in class org.apache.spark.sql.types.Metadata
Gets a Metadata array.
getMetricName() - Method in class org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
 
getMetricName() - Method in class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
 
getMetricName() - Method in class org.apache.spark.ml.evaluation.RegressionEvaluator
 
getMetricsSources(String) - Method in class org.apache.spark.TaskContext
:: DeveloperApi :: Returns all metrics sources with the given name which are associated with the instance which runs the task.
getMinDivisibleClusterSize() - Method in class org.apache.spark.mllib.clustering.BisectingKMeans
Gets the minimum number of points (if >= 1.0) or the minimum proportion of points (if < 1.0) of a divisible cluster.
getMiniBatchFraction() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
Mini-batch fraction, which sets the fraction of documents sampled and used in each iteration.
getMinInfoGain() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getMinInstancesPerNode() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getMinSupport() - Method in class org.apache.spark.mllib.fpm.PrefixSpan
Gets the minimal support.
getMinTokenLength() - Method in class org.apache.spark.ml.feature.RegexTokenizer
 
getModel() - Method in class org.apache.spark.ml.clustering.DistributedLDAModel
 
getModel() - Method in class org.apache.spark.ml.clustering.LDAModel
Returns underlying spark.mllib model, which may be local or distributed
getModel() - Method in class org.apache.spark.ml.clustering.LocalLDAModel
 
getModelType() - Method in class org.apache.spark.mllib.classification.NaiveBayes
 
getN() - Method in class org.apache.spark.ml.feature.NGram
 
getNames() - Method in class org.apache.spark.ml.feature.VectorSlicer
 
getNode(int, Node) - Static method in class org.apache.spark.mllib.tree.model.Node
Traces down from a root node to get the node with the given node index.
getNumClasses() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getNumFeatures() - Method in class org.apache.spark.ml.feature.HashingTF
 
getNumFeatures() - Method in class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
The dimension of training features.
getNumIterations() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
 
getNumPartitions() - Method in interface org.apache.spark.api.java.JavaRDDLike
Return the number of partitions in this RDD.
getNumPartitions() - Method in class org.apache.spark.rdd.RDD
Returns the number of partitions of this RDD.
getNumValues() - Method in class org.apache.spark.ml.attribute.NominalAttribute
Get the number of values, either from numValues or from values.
getOldDataset(DataFrame, String) - Static method in class org.apache.spark.ml.clustering.LDA
Get dataset for spark.mllib LDA
getOptimizeDocConcentration() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
Indicates whether docConcentration (Dirichlet parameter for document-topic distribution) will be optimized during training.
getOptimizer() - Method in class org.apache.spark.mllib.clustering.LDA
:: DeveloperApi ::
getOption(String) - Method in class org.apache.spark.SparkConf
Get a parameter as an Option
getOption() - Method in class org.apache.spark.streaming.State
Get the state as an Option.
getOrCreate(SparkConf) - Static method in class org.apache.spark.SparkContext
This function may be used to get or instantiate a SparkContext and register it as a singleton object.
getOrCreate() - Static method in class org.apache.spark.SparkContext
This function may be used to get or instantiate a SparkContext and register it as a singleton object.
getOrCreate(SparkContext) - Static method in class org.apache.spark.sql.SQLContext
Get the singleton SQLContext if it exists or create a new one using the given SparkContext.
getOrCreate(String, JavaStreamingContextFactory) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Deprecated.
As of 1.4.0, replaced by getOrCreate without JavaStreamingContextFactory.
getOrCreate(String, Configuration, JavaStreamingContextFactory) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Deprecated.
As of 1.4.0, replaced by getOrCreate without JavaStreamingContextFactory.
getOrCreate(String, Configuration, JavaStreamingContextFactory, boolean) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Deprecated.
As of 1.4.0, replaced by getOrCreate without JavaStreamingContextFactory.
getOrCreate(String, Function0<JavaStreamingContext>) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
getOrCreate(String, Function0<JavaStreamingContext>, Configuration) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
getOrCreate(String, Function0<JavaStreamingContext>, Configuration, boolean) - Static method in class org.apache.spark.streaming.api.java.JavaStreamingContext
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
getOrCreate(String, Function0<StreamingContext>, Configuration, boolean) - Static method in class org.apache.spark.streaming.StreamingContext
Either recreate a StreamingContext from checkpoint data or create a new StreamingContext.
getOrDefault(Param<T>) - Method in interface org.apache.spark.ml.param.Params
Gets the value of a param in the embedded param map or its default value.
getOrElse(Param<T>, T) - Method in class org.apache.spark.ml.param.ParamMap
Returns the value associated with a param or a default value.
getP() - Method in class org.apache.spark.ml.feature.Normalizer
 
getParam(String) - Method in interface org.apache.spark.ml.param.Params
 
getParents(int) - Method in class org.apache.spark.NarrowDependency
Get the parent partitions for a child partition.
getParents(int) - Method in class org.apache.spark.OneToOneDependency
 
getParents(int) - Method in class org.apache.spark.RangeDependency
 
getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.CanonicalRandomVertexCut$
 
getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.EdgePartition1D$
 
getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.EdgePartition2D$
 
getPartition(long, long, int) - Method in interface org.apache.spark.graphx.PartitionStrategy
Returns the partition number for a given edge.
getPartition(long, long, int) - Method in class org.apache.spark.graphx.PartitionStrategy.RandomVertexCut$
 
getPartition(Object) - Method in class org.apache.spark.HashPartitioner
 
getPartition(Object) - Method in class org.apache.spark.Partitioner
 
getPartition(Object) - Method in class org.apache.spark.RangePartitioner
 
getPartitionId() - Static method in class org.apache.spark.TaskContext
Returns the partition id of the currently active TaskContext.
getPartitions() - Method in class org.apache.spark.api.r.BaseRRDD
 
getPartitions() - Method in class org.apache.spark.graphx.EdgeRDD
 
getPartitions() - Method in class org.apache.spark.graphx.VertexRDD
 
getPartitions() - Method in class org.apache.spark.rdd.CoGroupedRDD
 
getPartitions() - Method in class org.apache.spark.rdd.HadoopRDD
 
getPartitions() - Method in class org.apache.spark.rdd.JdbcRDD
 
getPartitions() - Method in class org.apache.spark.rdd.NewHadoopRDD
 
getPartitions() - Method in class org.apache.spark.rdd.PartitionCoalescer
 
getPartitions() - Method in class org.apache.spark.rdd.PartitionPruningRDD
 
getPartitions() - Method in class org.apache.spark.rdd.RDD
Implemented by subclasses to return the set of partitions in this RDD.
getPartitions() - Method in class org.apache.spark.rdd.ShuffledRDD
 
getPartitions() - Method in class org.apache.spark.rdd.UnionRDD
 
getPath() - Method in class org.apache.spark.input.PortableDataStream
 
getPattern() - Method in class org.apache.spark.ml.feature.RegexTokenizer
 
getPersistentRDDs() - Method in class org.apache.spark.SparkContext
Returns an immutable map of RDDs that have marked themselves as persistent via a cache() call.
getPoolForName(String) - Method in class org.apache.spark.SparkContext
:: DeveloperApi :: Return the pool associated with the given name, if one exists
getPreferredLocations(Partition) - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
 
getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.HadoopRDD
 
getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.NewHadoopRDD
 
getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.RDD
Optionally overridden by subclasses to specify placement preferences.
getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.ShuffledRDD
 
getPreferredLocations(Partition) - Method in class org.apache.spark.rdd.UnionRDD
 
getQuantileCalculationStrategy() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getRDDStorageInfo() - Method in class org.apache.spark.SparkContext
:: DeveloperApi :: Return information about which RDDs are cached, whether they are in memory or on disk, how much space they take, etc.
getReceiver() - Method in class org.apache.spark.streaming.dstream.ReceiverInputDStream
Gets the receiver object that will be sent to the worker nodes to receive data.
getRootDirectory() - Static method in class org.apache.spark.SparkFiles
Get the root directory that contains files added through SparkContext.addFile().
getRuns() - Method in class org.apache.spark.mllib.clustering.KMeans
:: Experimental :: Number of runs of the algorithm to execute in parallel.
getScalingVec() - Method in class org.apache.spark.ml.feature.ElementwiseProduct
 
getSchedulingMode() - Method in class org.apache.spark.SparkContext
Return current scheduling mode
getSchema(Class<?>) - Method in class org.apache.spark.sql.SQLContext
 
getSeed() - Method in class org.apache.spark.mllib.clustering.BisectingKMeans
Gets the random seed.
getSeed() - Method in class org.apache.spark.mllib.clustering.GaussianMixture
Return the random seed
getSeed() - Method in class org.apache.spark.mllib.clustering.KMeans
The random seed for cluster initialization.
getSeed() - Method in class org.apache.spark.mllib.clustering.LDA
Random seed
getSeq(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i of array type as a Scala Seq.
getSerializer(Serializer) - Static method in class org.apache.spark.serializer.Serializer
 
getSerializer(Option<Serializer>) - Static method in class org.apache.spark.serializer.Serializer
 
getShort(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i as a primitive short.
getSizeAsBytes(String) - Method in class org.apache.spark.SparkConf
Get a size parameter as bytes; throws a NoSuchElementException if it's not set.
getSizeAsBytes(String, String) - Method in class org.apache.spark.SparkConf
Get a size parameter as bytes, falling back to a default if not set.
getSizeAsBytes(String, long) - Method in class org.apache.spark.SparkConf
Get a size parameter as bytes, falling back to a default if not set.
getSizeAsGb(String) - Method in class org.apache.spark.SparkConf
Get a size parameter as Gibibytes; throws a NoSuchElementException if it's not set.
getSizeAsGb(String, String) - Method in class org.apache.spark.SparkConf
Get a size parameter as Gibibytes, falling back to a default if not set.
getSizeAsKb(String) - Method in class org.apache.spark.SparkConf
Get a size parameter as Kibibytes; throws a NoSuchElementException if it's not set.
getSizeAsKb(String, String) - Method in class org.apache.spark.SparkConf
Get a size parameter as Kibibytes, falling back to a default if not set.
getSizeAsMb(String) - Method in class org.apache.spark.SparkConf
Get a size parameter as Mebibytes; throws a NoSuchElementException if it's not set.
getSizeAsMb(String, String) - Method in class org.apache.spark.SparkConf
Get a size parameter as Mebibytes, falling back to a default if not set.
getSparkHome() - Method in class org.apache.spark.api.java.JavaSparkContext
Get Spark's home location from either a value set through the constructor, or the spark.home Java property, or the SPARK_HOME environment variable (in that order of preference).
getSplits() - Method in class org.apache.spark.ml.feature.Bucketizer
 
getSQLDialect() - Method in class org.apache.spark.sql.hive.HiveContext
 
getSQLDialect() - Method in class org.apache.spark.sql.SQLContext
 
getStageInfo(int) - Method in class org.apache.spark.api.java.JavaSparkStatusTracker
Returns stage information, or null if the stage info could not be found or was garbage collected.
getStageInfo(int) - Method in class org.apache.spark.SparkStatusTracker
Returns stage information, or None if the stage info could not be found or was garbage collected.
getStages() - Method in class org.apache.spark.ml.Pipeline
 
getState() - Method in interface org.apache.spark.launcher.SparkAppHandle
Returns the current application state.
getState() - Method in class org.apache.spark.streaming.api.java.JavaStreamingContext
:: DeveloperApi ::
getState() - Method in class org.apache.spark.streaming.StreamingContext
:: DeveloperApi ::
getStatement() - Method in class org.apache.spark.ml.feature.SQLTransformer
 
getStopWords() - Method in class org.apache.spark.ml.feature.StopWordsRemover
 
getStorageLevel() - Method in interface org.apache.spark.api.java.JavaRDDLike
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
getStorageLevel() - Method in class org.apache.spark.graphx.impl.EdgeRDDImpl
 
getStorageLevel() - Method in class org.apache.spark.graphx.impl.VertexRDDImpl
 
getStorageLevel() - Method in class org.apache.spark.rdd.RDD
Get the RDD's current storage level, or StorageLevel.NONE if none is set.
getString(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i as a String object.
getString(String) - Method in class org.apache.spark.sql.types.Metadata
Gets a String.
getStringArray(String) - Method in class org.apache.spark.sql.types.Metadata
Gets a String array.
getStruct(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i of struct type as a Row object.
getSubsamplingRate() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getTableExistsQuery(String) - Method in class org.apache.spark.sql.jdbc.JdbcDialect
Get the SQL query that should be used to find if the given table exists.
getTableExistsQuery(String) - Static method in class org.apache.spark.sql.jdbc.MySQLDialect
 
getTableExistsQuery(String) - Static method in class org.apache.spark.sql.jdbc.PostgresDialect
 
getTau0() - Method in class org.apache.spark.mllib.clustering.OnlineLDAOptimizer
A (positive) learning parameter that downweights early iterations.
getThreadLocal() - Static method in class org.apache.spark.SparkEnv
Returns the ThreadLocal SparkEnv.
getThreshold() - Method in class org.apache.spark.ml.classification.LogisticRegression
 
getThreshold() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
 
getThreshold() - Method in class org.apache.spark.ml.feature.Binarizer
 
getThreshold() - Method in class org.apache.spark.mllib.classification.LogisticRegressionModel
Returns the threshold (if any) used for converting raw prediction scores into 0/1 predictions.
getThreshold() - Method in class org.apache.spark.mllib.classification.SVMModel
Returns the threshold (if any) used for converting raw prediction scores into 0/1 predictions.
getThresholds() - Method in class org.apache.spark.ml.classification.LogisticRegression
 
getThresholds() - Method in class org.apache.spark.ml.classification.LogisticRegressionModel
 
getTimeAsMs(String) - Method in class org.apache.spark.SparkConf
Get a time parameter as milliseconds; throws a NoSuchElementException if it's not set.
getTimeAsMs(String, String) - Method in class org.apache.spark.SparkConf
Get a time parameter as milliseconds, falling back to a default if not set.
getTimeAsSeconds(String) - Method in class org.apache.spark.SparkConf
Get a time parameter as seconds; throws a NoSuchElementException if it's not set.
getTimeAsSeconds(String, String) - Method in class org.apache.spark.SparkConf
Get a time parameter as seconds, falling back to a default if not set.
getTimestamp(int) - Method in interface org.apache.spark.sql.Row
Returns the value at position i of timestamp type as java.sql.Timestamp.
gettingResult() - Method in class org.apache.spark.scheduler.TaskInfo
 
gettingResultTime() - Method in class org.apache.spark.scheduler.TaskInfo
The time when the task started remotely getting the result.
getToLowercase() - Method in class org.apache.spark.ml.feature.RegexTokenizer
 
getTopicConcentration() - Method in class org.apache.spark.mllib.clustering.LDA
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms.
getTreeStrategy() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
 
getUseNodeIdCache() - Method in class org.apache.spark.mllib.tree.configuration.Strategy
 
getValidationTol() - Method in class org.apache.spark.mllib.tree.configuration.BoostingStrategy
 
getValue() - Method in class org.apache.spark.broadcast.Broadcast
Actually get the broadcasted value.
getValue(int) - Method in class org.apache.spark.ml.attribute.NominalAttribute
Gets a value given its index.
getValuesMap(Seq<String>) - Method in interface org.apache.spark.sql.Row
Returns a Map(name -> value) for the requested fieldNames. For primitive types, if the value is null, it returns the 'zero value' specific to that primitive type.
getVectors() - Method in class org.apache.spark.ml.feature.Word2VecModel
Returns a dataframe with two fields, "word" and "vector", with "word" being a String and "vector" the DenseVector that it is mapped to.
getVectors() - Method in class org.apache.spark.mllib.feature.Word2VecModel
 
Gini - Class in org.apache.spark.mllib.tree.impurity
:: Experimental :: Class for calculating the Gini impurity during binary classification.
Gini() - Constructor for class org.apache.spark.mllib.tree.impurity.Gini
 
globalTopicTotals() - Method in class org.apache.spark.mllib.clustering.DistributedLDAModel
 
globalTopicTotals() - Method in class org.apache.spark.mllib.clustering.EMLDAOptimizer
Aggregate distributions over topics from all term vertices.
glom() - Method in interface org.apache.spark.api.java.JavaRDDLike
Return an RDD created by coalescing all elements within each partition into an array.
glom() - Method in class org.apache.spark.rdd.RDD
Return an RDD created by coalescing all elements within each partition into an array.
glom() - Method in interface org.apache.spark.streaming.api.java.JavaDStreamLike
Return a new DStream in which each RDD is generated by applying glom() to each RDD of this DStream.
glom() - Method in class org.apache.spark.streaming.dstream.DStream
Return a new DStream in which each RDD is generated by applying glom() to each RDD of this DStream.
gradient() - Method in class org.apache.spark.ml.classification.LogisticAggregator
 
gradient() - Method in class org.apache.spark.ml.regression.AFTAggregator
 
gradient() - Method in class org.apache.spark.ml.regression.LeastSquaresAggregator
 
Gradient - Class in org.apache.spark.mllib.optimization
:: DeveloperApi :: Class used to compute the gradient for a loss function, given a single data point.
Gradient() - Constructor for class org.apache.spark.mllib.optimization.Gradient
 
gradient(double, double) - Static method in class org.apache.spark.mllib.tree.loss.AbsoluteError
Method to calculate the gradients for the gradient boosting calculation for least absolute error calculation.
gradient(double, double) - Static method in class org.apache.spark.mllib.tree.loss.LogLoss
Method to calculate the loss gradients for the gradient boosting calculation for binary classification. The gradient with respect to F(x) is: -4 y / (1 + exp(2 y F(x))).
gradient(double, double) - Method in interface org.apache.spark.mllib.tree.loss.Loss
Method to calculate the gradients for the gradient boosting calculation.
gradient(double, double) - Static method in class org.apache.spark.mllib.tree.loss.SquaredError
Method to calculate the gradients for the gradient boosting calculation for least squares error calculation.
GradientBoostedTrees - Class in org.apache.spark.mllib.tree
A class that implements Stochastic Gradient Boosting for regression and binary classification.
GradientBoostedTrees(BoostingStrategy) - Constructor for class org.apache.spark.mllib.tree.GradientBoostedTrees
 
GradientBoostedTreesModel - Class in org.apache.spark.mllib.tree.model
Represents a gradient boosted trees model.