Object org.apache.spark.mllib.util.LinearDataGenerator
public class LinearDataGenerator
:: DeveloperApi ::
Generate sample data used for Linear Data. This class generates uniformly random values for every feature and adds Gaussian noise, scaled by eps, to the response variable Y.
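The description above can be read as the model Y = intercept + weights · x + eps · N(0, 1). The following is a minimal plain-Java sketch under that reading, not the Spark implementation. The class name `LinearModelSketch`, the `generate` helper, and the row layout are hypothetical. It assumes features are drawn uniformly from [-1, 1) and that eps scales standard Gaussian noise, as the parameter documentation ("Epsilon scaling factor") suggests.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-alone sketch; it does not call or reproduce Spark code.
class LinearModelSketch {

    /** Returns nPoints rows laid out as {label, x_0, ..., x_(d-1)}. */
    static double[][] generate(double intercept, double[] weights,
                               int nPoints, int seed, double eps) {
        Random rnd = new Random(seed);
        int d = weights.length;
        double[][] rows = new double[nPoints][d + 1];
        for (int i = 0; i < nPoints; i++) {
            double y = intercept;
            for (int j = 0; j < d; j++) {
                // Assumption: each feature is uniform in [-1, 1).
                double x = 2.0 * rnd.nextDouble() - 1.0;
                rows[i][j + 1] = x;
                y += weights[j] * x;
            }
            // Assumption: zero-mean Gaussian noise scaled by eps.
            rows[i][0] = y + eps * rnd.nextGaussian();
        }
        return rows;
    }

    public static void main(String[] args) {
        double[][] data = generate(0.5, new double[] {1.0, -2.0}, 5, 42, 0.1);
        for (double[] row : data) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Because the generator is seeded, calling `generate` twice with the same arguments yields identical rows, which is the usual reason a data generator exposes a seed parameter.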
Constructor Summary |
---|
LinearDataGenerator() |
Method Summary | |
---|---|
static scala.collection.Seq<LabeledPoint> | generateLinearInput(double intercept, double[] weights, double[] xMean, double[] xVariance, int nPoints, int seed, double eps) |
static scala.collection.Seq<LabeledPoint> | generateLinearInput(double intercept, double[] weights, int nPoints, int seed, double eps) - For compatibility, data generated without specifying the mean and variance has zero mean and a variance of 1.0/3.0: the original output range is [-1, 1] with a uniform distribution, and the variance of a uniform distribution on [a, b] is (b - a)^2 / 12, which gives 1.0/3.0. |
static java.util.List<LabeledPoint> | generateLinearInputAsList(double intercept, double[] weights, int nPoints, int seed, double eps) - Return a Java List of synthetic data randomly generated according to a multicollinear model. |
static RDD<LabeledPoint> | generateLinearRDD(SparkContext sc, int nexamples, int nfeatures, double eps, int nparts, double intercept) - Generate an RDD containing sample data for Linear Regression models, including Ridge, Lasso, and unregularized variants. |
static void | main(String[] args) |
Methods inherited from class Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public LinearDataGenerator()
Method Detail |
---|
public static java.util.List<LabeledPoint> generateLinearInputAsList(double intercept, double[] weights, int nPoints, int seed, double eps)
intercept - Data intercept
weights - Weights to be applied.
nPoints - Number of points in sample.
seed - Random seed
eps - (undocumented)
public static scala.collection.Seq<LabeledPoint> generateLinearInput(double intercept, double[] weights, int nPoints, int seed, double eps)
intercept - Data intercept
weights - Weights to be applied.
nPoints - Number of points in sample.
seed - Random seed
eps - Epsilon scaling factor.
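The summary for this overload states that features generated without an explicit mean and variance have zero mean and variance 1.0/3.0, because a uniform distribution on [a, b] has variance (b - a)^2 / 12 and the range here is [-1, 1]. The arithmetic can be checked with a small plain-Java sketch. The class and method names are hypothetical, and the sampling step only illustrates the formula; it is not taken from Spark.

```java
import java.util.Random;

// Hypothetical helper for checking the variance claim; not part of Spark.
class UniformVarianceSketch {

    /** Closed-form variance of a uniform distribution on [a, b]. */
    static double uniformVariance(double a, double b) {
        return (b - a) * (b - a) / 12.0;
    }

    /** Sample variance of n draws from a uniform distribution on [-1, 1). */
    static double sampleVariance(int n, long seed) {
        Random rnd = new Random(seed);
        double sum = 0.0;
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double x = 2.0 * rnd.nextDouble() - 1.0;
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / n;
        return sumSq / n - mean * mean;
    }

    public static void main(String[] args) {
        // (1 - (-1))^2 / 12 = 4 / 12 = 1/3
        System.out.println("closed form: " + uniformVariance(-1.0, 1.0));
        System.out.println("empirical:   " + sampleVariance(200_000, 42L));
    }
}
```

The closed-form value is exactly 4/12, and the empirical estimate should land close to it for a large sample.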
public static scala.collection.Seq<LabeledPoint> generateLinearInput(double intercept, double[] weights, double[] xMean, double[] xVariance, int nPoints, int seed, double eps)
intercept - Data intercept
weights - Weights to be applied.
xMean - the mean of the generated features. Often, if the features are not properly standardized, a poorly implemented algorithm will have difficulty converging.
xVariance - the variance of the generated features.
nPoints - Number of points in sample.
seed - Random seed
eps - Epsilon scaling factor.
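One way to obtain uniformly distributed features with a requested mean and variance is to shift and scale a draw from [0, 1): such a draw has mean 0.5 and variance 1/12, so (u - 0.5) * sqrt(12 * variance) + mean has the target moments. The sketch below illustrates that transformation in plain Java. It is an assumption about how xMean and xVariance could be honored, not a statement of what Spark does internally, and the names `FeatureScalingSketch` and `scaledFeatures` are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of per-feature shifting and scaling; not Spark code.
class FeatureScalingSketch {

    /** Draws one feature vector whose j-th entry has mean xMean[j] and variance xVariance[j]. */
    static double[] scaledFeatures(double[] xMean, double[] xVariance, Random rnd) {
        double[] x = new double[xMean.length];
        for (int j = 0; j < x.length; j++) {
            double u = rnd.nextDouble(); // uniform on [0, 1): mean 0.5, variance 1/12
            x[j] = (u - 0.5) * Math.sqrt(12.0 * xVariance[j]) + xMean[j];
        }
        return x;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L);
        double[] xMean = {3.0, -1.0};
        double[] xVariance = {4.0, 0.25};
        for (int i = 0; i < 3; i++) {
            System.out.println(Arrays.toString(scaledFeatures(xMean, xVariance, rnd)));
        }
    }
}
```

With xMean of 0 and xVariance of 1.0/3.0 for every feature, this transformation reduces to a uniform draw on [-1, 1), which matches the default described for the shorter overload.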
public static RDD<LabeledPoint> generateLinearRDD(SparkContext sc, int nexamples, int nfeatures, double eps, int nparts, double intercept)
sc - SparkContext to be used for generating the RDD.
nexamples - Number of examples that will be contained in the RDD.
nfeatures - Number of features to generate for each example.
eps - Epsilon factor by which examples are scaled.
nparts - Number of partitions in the RDD. Default value is 2.
intercept - (undocumented)
public static void main(String[] args)