public static scala.Tuple2<Vector,double[]> runMiniBatchSGD(RDD<scala.Tuple2<Object,Vector>> data,
Run stochastic gradient descent (SGD) in parallel using mini batches.
In each iteration, we sample a subset (fraction miniBatchFraction) of the total data
in order to compute a gradient estimate.
Sampling and averaging the subgradients over this subset are performed using one standard
Spark map-reduce in each iteration.
data - Input data for SGD. RDD of the set of data examples, each of
the form (label, [feature values]).
gradient - Gradient object (used to compute the gradient of the loss function for
a single data example).
updater - Updater function that performs the actual gradient step in a given direction.
stepSize - Initial step size, used for the first step.
numIterations - Number of iterations for which SGD should be run.
regParam - Regularization parameter.
miniBatchFraction - Fraction of the input data set that should be used for
one iteration of SGD. Default value 1.0.
Returns a tuple containing two elements. The first element is a column matrix containing
the weights for every feature, and the second element is an array containing the
stochastic loss computed for every iteration.
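As an illustration of the procedure described above, the following is a minimal single-machine sketch in plain Java. It is not Spark's implementation: the class MiniBatchSgdSketch, its Result holder, the squared-error loss, the seed parameter, and the stepSize / sqrt(iter) decay are all assumptions made for the example. The sampling loop stands in for the map step and the averaging for the reduce step.

```java
import java.util.Random;

/** Hypothetical single-machine sketch of mini-batch SGD; not Spark's code. */
public class MiniBatchSgdSketch {

    /** Final weights plus the mini-batch loss recorded at each iteration. */
    public static final class Result {
        public final double[] weights;
        public final double[] lossHistory;

        Result(double[] weights, double[] lossHistory) {
            this.weights = weights;
            this.lossHistory = lossHistory;
        }
    }

    /**
     * Least-squares SGD with mini batches (squared-error loss is an assumption).
     * labels[i] is the label of example i and features[i] its feature values.
     */
    public static Result run(double[] labels, double[][] features,
                             double stepSize, int numIterations,
                             double miniBatchFraction, long seed) {
        int dim = features[0].length;
        double[] weights = new double[dim];
        double[] lossHistory = new double[numIterations];
        Random rng = new Random(seed);

        for (int iter = 1; iter <= numIterations; iter++) {
            double[] gradSum = new double[dim];
            double lossSum = 0.0;
            int batchSize = 0;

            // "Map" step: keep each example with probability miniBatchFraction
            // and accumulate its gradient and loss.
            for (int i = 0; i < labels.length; i++) {
                if (rng.nextDouble() >= miniBatchFraction) {
                    continue;
                }
                double prediction = 0.0;
                for (int j = 0; j < dim; j++) {
                    prediction += weights[j] * features[i][j];
                }
                double diff = prediction - labels[i];
                for (int j = 0; j < dim; j++) {
                    gradSum[j] += diff * features[i][j];
                }
                lossSum += 0.5 * diff * diff;
                batchSize++;
            }

            // An empty sample gives no gradient estimate: record NaN and skip the step.
            if (batchSize == 0) {
                lossHistory[iter - 1] = Double.NaN;
                continue;
            }

            // "Reduce" step: average over the sampled examples, then take one step.
            // Assumption: the step size decays as stepSize / sqrt(iter).
            double thisStep = stepSize / Math.sqrt(iter);
            for (int j = 0; j < dim; j++) {
                weights[j] -= thisStep * gradSum[j] / batchSize;
            }
            lossHistory[iter - 1] = lossSum / batchSize;
        }
        return new Result(weights, lossHistory);
    }
}
```

With miniBatchFraction set to 1.0 every example is used in every iteration, so the sketch reduces to plain batch gradient descent and the recorded loss is deterministic.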
Set the updater function that performs the actual gradient step in a given direction.
The updater is also responsible for applying the update from the regularization term,
and therefore determines what kind of regularization is used, if any.
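To make the updater's role concrete, here is a small sketch in plain Java of an update step that folds an L2 penalty into the gradient step. The class L2UpdaterSketch, its method names, and the stepSize / sqrt(iter) decay are assumptions chosen for illustration, not the library's Updater interface.

```java
/** Hypothetical sketch of an updater that applies L2 regularization; not Spark's code. */
public class L2UpdaterSketch {

    /**
     * Returns the weights after one step.
     * Assumption: the step size decays as stepSize / sqrt(iter).
     */
    public static double[] update(double[] weightsOld, double[] gradient,
                                  double stepSize, int iter, double regParam) {
        double thisStep = stepSize / Math.sqrt(iter);
        double[] weightsNew = new double[weightsOld.length];
        for (int j = 0; j < weightsOld.length; j++) {
            // The gradient of (regParam / 2) * ||w||^2 is regParam * w, so the
            // penalty shrinks each weight; the loss gradient is then subtracted.
            weightsNew[j] = weightsOld[j] * (1.0 - thisStep * regParam)
                    - thisStep * gradient[j];
        }
        return weightsNew;
    }

    /** Value of the penalty (regParam / 2) * ||w||^2 for the given weights. */
    public static double regValue(double[] weights, double regParam) {
        double normSquared = 0.0;
        for (double w : weights) {
            normSquared += w * w;
        }
        return 0.5 * regParam * normSquared;
    }
}
```

Passing regParam = 0 turns the update into a plain gradient step, which corresponds to the "if any" case: the choice of updater, not the gradient, decides whether and how the weights are regularized.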