org.apache.spark.mllib.tree

DecisionTree

object DecisionTree extends Serializable with Logging

Linear Supertypes
Logging, Serializable, Serializable, AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. def findBestSplits(input: RDD[LabeledPoint], parentImpurities: Array[Double], strategy: Strategy, level: Int, filters: Array[List[Filter]], splits: Array[Array[Split]], bins: Array[Array[Bin]], maxLevelForSingleGroup: Int): Array[(Split, InformationGainStats)]

    Returns an array of optimal splits for all nodes at a given level. Splits the task into multiple groups if the level-wise training task could lead to memory overflow.

    input

    RDD of org.apache.spark.mllib.regression.LabeledPoint used as training data for DecisionTree

    parentImpurities

    Impurities for all parent nodes for the current level

    strategy

    org.apache.spark.mllib.tree.configuration.Strategy instance containing parameters for constructing the DecisionTree

    level

    Level of the tree

    filters

    Filters for all nodes at a given level

    splits

    possible splits for all features

    bins

    possible bins for all features

    maxLevelForSingleGroup

    the deepest level for single-group level-wise computation.

    returns

    array of best splits, each paired with its information gain statistics, for all nodes at a given level.

    Attributes
    protected[org.apache.spark.mllib.tree]
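
The grouping behaviour described for findBestSplits can be illustrated with a little arithmetic. This is a minimal sketch, not the library's implementation: it assumes a binary tree whose root is level 0 and assumes the number of groups doubles for every level beyond maxLevelForSingleGroup. The object LevelGroups and its methods are hypothetical names introduced only for this illustration.

```scala
// Hypothetical helper; not part of the DecisionTree API documented on this page.
object LevelGroups {
  /** Number of nodes at a level of a binary tree (assumption: the root is level 0). */
  def nodesAtLevel(level: Int): Int = 1 << level

  /**
   * Assumed grouping rule: a single group up to maxLevelForSingleGroup,
   * then twice as many groups for each additional level.
   */
  def numGroups(level: Int, maxLevelForSingleGroup: Int): Int =
    if (level <= maxLevelForSingleGroup) 1
    else 1 << (level - maxLevelForSingleGroup)

  /** Nodes handled by each group under the assumed rule. */
  def nodesPerGroup(level: Int, maxLevelForSingleGroup: Int): Int =
    nodesAtLevel(level) / numGroups(level, maxLevelForSingleGroup)
}
```

Under these assumptions, the number of nodes processed per group stops growing once the level exceeds maxLevelForSingleGroup, which is what keeps the per-pass memory bounded.
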
  12. def findSplitsBins(input: RDD[LabeledPoint], strategy: Strategy): (Array[Array[Split]], Array[Array[Bin]])

    Returns splits and bins for decision tree calculation.

    input

    RDD of org.apache.spark.mllib.regression.LabeledPoint used as training data for DecisionTree

    strategy

    org.apache.spark.mllib.tree.configuration.Strategy instance containing parameters for constructing the DecisionTree

    returns

    a tuple of (splits, bins), where splits is an Array of [org.apache.spark.mllib.tree.model.Split] of size (numFeatures, numSplits-1) and bins is an Array of [org.apache.spark.mllib.tree.model.Bin] of size (numFeatures, numSplits1)

    Attributes
    protected[org.apache.spark.mllib.tree]
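
To make the relationship between splits and bins concrete, the sketch below shows the general idea that k split thresholds on a continuous feature delimit k + 1 bins. It is an illustration under stated assumptions, not the code behind findSplitsBins: thresholds are picked at evenly spaced positions of a sorted sample, and bins are modelled as plain (low, high) pairs rather than Split and Bin instances. The object SplitsToBins and its methods are hypothetical.

```scala
// Hypothetical illustration; does not use the Split or Bin classes from the library.
object SplitsToBins {
  /** Picks numBins - 1 thresholds at evenly spaced positions of the sorted sample. */
  def thresholds(sample: Seq[Double], numBins: Int): Vector[Double] = {
    require(numBins >= 2 && sample.size >= numBins, "need at least numBins sample values")
    val sorted = sample.sorted.toVector
    val stride = sorted.size.toDouble / numBins
    (1 until numBins).map(i => sorted((i * stride).toInt)).toVector
  }

  /** Turns k thresholds into k + 1 (low, high) ranges covering the whole real line. */
  def bins(ts: Vector[Double]): Vector[(Double, Double)] = {
    val edges = (Double.NegativeInfinity +: ts) :+ Double.PositiveInfinity
    edges.zip(edges.tail)
  }
}
```

For a sample of the values 1.0 through 8.0 and numBins = 4, thresholds returns three values and bins returns four ranges, the first starting at negative infinity and the last ending at positive infinity.
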
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  17. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  18. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  19. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  20. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  21. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  22. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  23. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  24. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. final def notify(): Unit

    Definition Classes
    AnyRef
  30. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  31. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  32. def toString(): String

    Definition Classes
    AnyRef → Any
  33. def train(input: RDD[LabeledPoint], algo: Algo, impurity: Impurity, maxDepth: Int, maxBins: Int, quantileCalculationStrategy: QuantileStrategy, categoricalFeaturesInfo: Map[Int, Int]): DecisionTreeModel

    Method to train a decision tree model where the instances are represented as an RDD of (label, features) pairs. The decision tree method supports binary classification and regression. For binary classification, the label for each instance should be either 0 or 1 to denote the two classes. The method also supports categorical feature inputs, where the number of categories can be specified using the categoricalFeaturesInfo option.

    input

    input RDD of org.apache.spark.mllib.regression.LabeledPoint used as training data for DecisionTree

    algo

    classification or regression

    impurity

    criterion used for information gain calculation

    maxDepth

    maximum depth of the tree

    maxBins

    maximum number of bins used for splitting features

    quantileCalculationStrategy

    algorithm for calculating quantiles

    categoricalFeaturesInfo

    A map storing information about the categorical variables and the number of discrete values they take. For example, an entry (n -> k) implies the feature n is categorical with k categories 0, 1, 2, ... , k-1. It's important to note that features are zero-indexed.

    returns

    a DecisionTreeModel that can be used for prediction
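
The categoricalFeaturesInfo convention described above (an entry n -> k means zero-indexed feature n takes the categories 0, 1, ..., k-1) can be checked on plain feature arrays before training. The sketch below is only an illustration: the object CategoricalInfo, the example map, and the isValid helper are hypothetical and are not part of the library.

```scala
// Hypothetical helper; not part of the DecisionTree API documented on this page.
object CategoricalInfo {
  // Example map: feature 0 has 2 categories (0, 1); feature 3 has 4 categories (0 to 3).
  // Features absent from the map are treated as continuous.
  val categoricalFeaturesInfo: Map[Int, Int] = Map(0 -> 2, 3 -> 4)

  /** True when every categorical feature holds a whole-number category in 0 until k. */
  def isValid(features: Array[Double], info: Map[Int, Int]): Boolean =
    info.forall { case (n, k) =>
      n >= 0 && n < features.length && {
        val v = features(n)
        v == math.floor(v) && v >= 0 && v < k
      }
    }
}
```

A feature vector such as Array(1.0, 0.5, 7.2, 3.0) passes the check, while Array(2.0, 0.5, 7.2, 3.0) fails because feature 0 only admits the categories 0 and 1.
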

  34. def train(input: RDD[LabeledPoint], algo: Algo, impurity: Impurity, maxDepth: Int): DecisionTreeModel

    Method to train a decision tree model where the instances are represented as an RDD of (label, features) pairs. The method supports binary classification and regression. For binary classification, the label for each instance should be either 0 or 1 to denote the two classes.

    input

    input RDD of org.apache.spark.mllib.regression.LabeledPoint used as training data

    algo

    algorithm, classification or regression

    impurity

    impurity criterion used for information gain calculation

    maxDepth

    maximum depth of the tree

    returns

    a DecisionTreeModel that can be used for prediction

  35. def train(input: RDD[LabeledPoint], strategy: Strategy): DecisionTreeModel

    Method to train a decision tree model where the instances are represented as an RDD of (label, features) pairs. The method supports binary classification and regression. For binary classification, the label for each instance should be either 0 or 1 to denote the two classes. The parameters for the algorithm are specified using the strategy parameter.

    input

    RDD of org.apache.spark.mllib.regression.LabeledPoint used as training data for DecisionTree

    strategy

    The configuration parameters for the tree algorithm which specify the type of algorithm (classification, regression, etc.), feature type (continuous, categorical), depth of the tree, quantile calculation strategy, etc.

    returns

    a DecisionTreeModel that can be used for prediction

  36. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
