org.apache.spark.mllib.util

MLUtils

object MLUtils

Helper methods to load, save, and pre-process data used in MLlib.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def computeStats(data: RDD[LabeledPoint], nfeatures: Int, nexamples: Long): (Double, DoubleMatrix, DoubleMatrix)

    Utility function to compute mean and standard deviation on a given dataset.

    data

    - input data set whose statistics are computed

    nfeatures

    - number of features

    nexamples

    - number of examples in input dataset

    returns

    (yMean, xColMean, xColSd) - a tuple consisting of:

    yMean - mean of the labels
    xColMean - row vector with the mean of every column (feature) of the input data
    xColSd - row vector with the standard deviation of every column (feature) of the input data
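
    To make the three returned statistics concrete, here is a minimal plain-Scala sketch that computes them on a small in-memory collection. It is not the library's implementation: `columnStats` is a hypothetical helper, it takes a `Seq` of (label, features) pairs instead of an `RDD[LabeledPoint]`, it returns plain arrays instead of `DoubleMatrix`, and it assumes the population standard deviation (dividing by the number of examples), which may differ from the convention `computeStats` uses.

    ```scala
    // Hypothetical helper, not part of MLUtils: label mean, per-column mean,
    // and per-column population standard deviation on in-memory data.
    def columnStats(
        data: Seq[(Double, Array[Double])],
        nfeatures: Int): (Double, Array[Double], Array[Double]) = {
      require(data.nonEmpty, "data must contain at least one example")
      val n = data.length.toDouble

      // Mean of the labels.
      val yMean = data.map(_._1).sum / n

      // Mean of every feature column.
      val xColMean = Array.tabulate(nfeatures)(j => data.map(_._2(j)).sum / n)

      // Standard deviation of every feature column (assumption: divide by n).
      val xColSd = Array.tabulate(nfeatures) { j =>
        val squaredDeviations = data.map { case (_, x) =>
          val d = x(j) - xColMean(j)
          d * d
        }.sum
        math.sqrt(squaredDeviations / n)
      }

      (yMean, xColMean, xColSd)
    }
    ```

    For the two examples (1.0, [1.0, 2.0]) and (3.0, [3.0, 6.0]), this sketch gives a label mean of 2.0, column means of 2.0 and 4.0, and column standard deviations of 1.0 and 2.0.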

  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. def loadLabeledData(sc: SparkContext, dir: String): RDD[LabeledPoint]

    Load labeled data from a file. The data format used here is <L>, <f1> <f2> ..., where <f1>, <f2>, ... are the feature values as Doubles and <L> is the corresponding label as a Double.

    sc

    SparkContext

    dir

    Directory containing the input data files.

    returns

    An RDD of LabeledPoint. Each labeled point has two elements: the first element is the label, and the second element represents the feature values (an array of Double).
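
    The line format described above can be illustrated with a small plain-Scala parser. This is only a sketch of the format, not the code `loadLabeledData` runs: `parseLabeledLine` is a hypothetical helper, and it returns a (label, features) pair where the real method builds a `LabeledPoint`. It assumes exactly one comma per line and whitespace-separated features.

    ```scala
    // Hypothetical helper, not part of MLUtils: parse one line of the form
    // "<L>, <f1> <f2> ..." into a (label, features) pair.
    def parseLabeledLine(line: String): (Double, Array[Double]) = {
      val parts = line.split(',')
      require(parts.length == 2, s"expected '<L>, <f1> <f2> ...' but got: $line")

      // Text before the comma is the label.
      val label = parts(0).trim.toDouble

      // Text after the comma is the whitespace-separated feature values.
      val features = parts(1).trim.split("\\s+").map(_.toDouble)

      (label, features)
    }
    ```

    For example, the line `1.0, 2.0 3.5` parses to the label 1.0 and the features 2.0 and 3.5.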

  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. def saveLabeledData(data: RDD[LabeledPoint], dir: String): Unit

    Save labeled data to a file. The data format used here is <L>, <f1> <f2> ..., where <f1>, <f2>, ... are the feature values as Doubles and <L> is the corresponding label as a Double.

    data

    An RDD of LabeledPoints containing data to be saved.

    dir

    Directory in which to save the data.
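
    As a rough illustration of the output format, the sketch below renders one labeled example as a text line. `formatLabeledLine` is a hypothetical helper, not part of MLUtils, and the exact spacing around the comma in files written by `saveLabeledData` is an assumption here; check the saved files if the spacing matters.

    ```scala
    // Hypothetical helper, not part of MLUtils: render a label and its features
    // as "<L>, <f1> <f2> ...".
    def formatLabeledLine(label: Double, features: Array[Double]): String = {
      // Features are joined with single spaces; the label comes first,
      // separated from the features by a comma and a space (assumed spacing).
      s"$label, ${features.mkString(" ")}"
    }
    ```

    With a label of 1.0 and the features 2.0 and 3.5, the sketch produces the line `1.0, 2.0 3.5`.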

  20. def squaredDistance(v1: Array[Double], v2: Array[Double]): Double

    Return the squared Euclidean distance between two vectors.
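
    The squared Euclidean distance is the sum of the squared element-wise differences between the two vectors. The plain-Scala sketch below shows that computation. `squaredDistanceSketch` is a hypothetical name, and the equal-length check is an assumption for illustration, not a documented behavior of the library method.

    ```scala
    // Hypothetical helper illustrating the computation; not the library's code.
    def squaredDistanceSketch(v1: Array[Double], v2: Array[Double]): Double = {
      // Assumption: both vectors must have the same number of elements.
      require(v1.length == v2.length, "vectors must have the same length")

      var sum = 0.0
      var i = 0
      while (i < v1.length) {
        val d = v1(i) - v2(i)
        sum += d * d // accumulate the squared difference for this element
        i += 1
      }
      sum
    }
    ```

    For the vectors [1.0, 2.0] and [4.0, 6.0], the differences are -3.0 and -4.0, so the result is 9.0 + 16.0 = 25.0.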

  21. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  22. def toString(): String

    Definition Classes
    AnyRef → Any
  23. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
