object LinearDataGenerator
Generates sample data for linear models. This object generates uniformly random values for every feature
and adds Gaussian noise, scaled by eps, to the response variable Y.
- Annotations
- @Since( "0.8.0" )
- Source
- LinearDataGenerator.scala
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
generateLinearInput(intercept: Double, weights: Array[Double], xMean: Array[Double], xVariance: Array[Double], nPoints: Int, seed: Int, eps: Double, sparsity: Double): Seq[LabeledPoint]
- intercept
Data intercept
- weights
Weights to be applied.
- xMean
The mean of the generated features. Poorly implemented algorithms often have difficulty converging when the features are not properly standardized.
- xVariance
The variance of the generated features.
- nPoints
Number of points in sample.
- seed
Random seed
- eps
Epsilon scaling factor.
- sparsity
The ratio of zero elements. If it is 0.0, LabeledPoints with DenseVector features are returned.
- returns
Seq of input.
- Annotations
- @Since( "1.6.0" )
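A minimal usage sketch for the overload with sparsity, assuming spark-mllib is on the classpath. The object name `SparseInputExample` and all argument values are illustrative and do not come from this page. The only behaviour relied on is what the parameter list above states: nPoints points are returned, and a sparsity of 0.0 yields DenseVector features.

```scala
import org.apache.spark.mllib.linalg.DenseVector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.LinearDataGenerator

object SparseInputExample {
  // Illustrative values only; none of these are taken from the documentation.
  val weights: Array[Double] = Array(1.5, -2.0, 0.5)
  val xMean: Array[Double] = Array(0.0, 0.0, 0.0)
  val xVariance: Array[Double] = Array(1.0, 1.0, 1.0)

  // Arguments are passed positionally in the documented order:
  // intercept, weights, xMean, xVariance, nPoints, seed, eps, sparsity.
  def generate(sparsity: Double): Seq[LabeledPoint] =
    LinearDataGenerator.generateLinearInput(
      0.5, weights, xMean, xVariance, 100, 42, 0.1, sparsity)

  def main(args: Array[String]): Unit = {
    val dense = generate(0.0)
    // With sparsity 0.0 the features are documented to be dense.
    println(dense.head.features.isInstanceOf[DenseVector])
    // A larger ratio requests more zero elements per feature vector.
    val sparse = generate(0.7)
    println(sparse.head.features)
  }
}
```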
-
def
generateLinearInput(intercept: Double, weights: Array[Double], xMean: Array[Double], xVariance: Array[Double], nPoints: Int, seed: Int, eps: Double): Seq[LabeledPoint]
- intercept
Data intercept
- weights
Weights to be applied.
- xMean
The mean of the generated features. Poorly implemented algorithms often have difficulty converging when the features are not properly standardized.
- xVariance
The variance of the generated features.
- nPoints
Number of points in sample.
- seed
Random seed
- eps
Epsilon scaling factor.
- returns
Seq of input.
- Annotations
- @Since( "0.8.0" )
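The model behind these parameters can be sketched in plain Scala without Spark. This is not Spark's implementation. It assumes each feature is drawn uniformly and then shifted and scaled to the requested mean and variance, and that the label is the weighted sum of the features plus the intercept plus Gaussian noise scaled by eps. The object name `LinearModelSketch` and the tuple return type are hypothetical.

```scala
import scala.util.Random

object LinearModelSketch {
  /** Returns nPoints pairs of (label, features) under the assumptions above. */
  def generate(
      intercept: Double,
      weights: Array[Double],
      xMean: Array[Double],
      xVariance: Array[Double],
      nPoints: Int,
      seed: Int,
      eps: Double): Seq[(Double, Array[Double])] = {
    val rnd = new Random(seed)
    Seq.fill(nPoints) {
      val x = Array.tabulate(weights.length) { i =>
        // Uniform on [-0.5, 0.5) has variance 1/12, so scaling by
        // sqrt(12 * variance) gives the requested variance; then shift.
        (rnd.nextDouble() - 0.5) * math.sqrt(12.0 * xVariance(i)) + xMean(i)
      }
      // Label: intercept + dot(weights, x) + eps-scaled Gaussian noise.
      val dot = weights.zip(x).map { case (w, xi) => w * xi }.sum
      val y = intercept + dot + eps * rnd.nextGaussian()
      (y, x)
    }
  }
}
```

With eps set to 0.0 the labels in this sketch lie exactly on the hyperplane defined by the intercept and weights, which is a convenient sanity check when testing a solver.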
-
def
generateLinearInput(intercept: Double, weights: Array[Double], nPoints: Int, seed: Int, eps: Double = 0.1): Seq[LabeledPoint]
For compatibility, data generated without specifying the mean and variance has zero mean and a variance of 1.0/3.0: the original output range is [-1, 1] with a uniform distribution, and the variance of a uniform distribution on [a, b] is (b - a)^2 / 12, which gives 1.0/3.0.
- intercept
Data intercept
- weights
Weights to be applied.
- nPoints
Number of points in sample.
- seed
Random seed
- eps
Epsilon scaling factor.
- returns
Seq of input.
- Annotations
- @Since( "0.8.0" )
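The 1.0/3.0 figure can be checked numerically. The following plain-Scala sketch evaluates (b - a)^2 / 12 for [-1, 1] and compares it with the sample variance of uniform draws. It does not call Spark, and the object and method names are hypothetical.

```scala
import scala.util.Random

object UniformVarianceCheck {
  /** Closed-form variance of a uniform distribution on [a, b]. */
  def uniformVariance(a: Double, b: Double): Double =
    (b - a) * (b - a) / 12.0

  /** Sample mean and (population) variance of n uniform draws on [-1, 1). */
  def sampleMeanAndVariance(n: Int, seed: Long): (Double, Double) = {
    val rnd = new Random(seed)
    val xs = Array.fill(n)(2.0 * rnd.nextDouble() - 1.0)
    val mean = xs.sum / n
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / n
    (mean, variance)
  }

  def main(args: Array[String]): Unit = {
    println(uniformVariance(-1.0, 1.0)) // (1 - (-1))^2 / 12 = 4 / 12
    println(sampleMeanAndVariance(200000, 7L))
  }
}
```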
-
def
generateLinearInputAsList(intercept: Double, weights: Array[Double], nPoints: Int, seed: Int, eps: Double): List[LabeledPoint]
Return a Java List of synthetic data randomly generated according to a multi collinear model.
- intercept
Data intercept
- weights
Weights to be applied.
- nPoints
Number of points in sample.
- seed
Random seed
- eps
Epsilon scaling factor.
- returns
Java List of input.
- Annotations
- @Since( "0.8.0" )
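A short usage sketch for the Java-friendly variant, assuming spark-mllib is on the classpath. The object name `AsListExample` and the argument values are illustrative. The result is accessed through the `java.util.List` interface, as it would be from Java code.

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.LinearDataGenerator

object AsListExample {
  // Positional arguments in the documented order:
  // intercept, weights, nPoints, seed, eps. Values are illustrative.
  def generate(): java.util.List[LabeledPoint] =
    LinearDataGenerator.generateLinearInputAsList(0.0, Array(2.0, -1.0), 50, 7, 0.1)

  def main(args: Array[String]): Unit = {
    val points = generate()
    println(points.size())
    val first = points.get(0)
    println(s"label=${first.label}, features=${first.features}")
  }
}
```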
-
def
generateLinearRDD(sc: SparkContext, nexamples: Int, nfeatures: Int, eps: Double, nparts: Int = 2, intercept: Double = 0.0): RDD[LabeledPoint]
Generate an RDD containing sample data for Linear Regression models - including Ridge, Lasso, and unregularized variants.
- sc
SparkContext to be used for generating the RDD.
- nexamples
Number of examples that will be contained in the RDD.
- nfeatures
Number of features to generate for each example.
- eps
Epsilon factor by which examples are scaled.
- nparts
Number of partitions in the RDD. Default value is 2.
- intercept
Data intercept. Default value is 0.0.
- returns
RDD of LabeledPoint containing sample data.
- Annotations
- @Since( "0.8.0" )
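A minimal sketch of generating an RDD with a local SparkContext, assuming spark-core and spark-mllib are on the classpath. The master URL, application name, object name, and argument values are illustrative choices and not part of this page.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.util.LinearDataGenerator

object LinearRDDExample {
  /** Returns (number of examples, number of partitions) of a generated RDD. */
  def run(): (Long, Int) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("LinearRDDExample")
    val sc = new SparkContext(conf)
    try {
      // Positional arguments in the documented order:
      // sc, nexamples, nfeatures, eps, nparts, intercept.
      val data = LinearDataGenerator.generateLinearRDD(sc, 1000, 10, 0.1, 4, 1.0)
      (data.count(), data.partitions.length)
    } finally {
      // Always release the local context, even if generation fails.
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

Here nexamples is chosen as a multiple of nparts so that the examples divide evenly across partitions.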
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
main(args: Array[String]): Unit
- Annotations
- @Since( "0.8.0" )
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()