org.apache.spark.mllib.tree.configuration

Strategy

class Strategy extends Serializable

:: Experimental :: Stores all the configuration options for tree construction

Annotations
@Since( "1.0.0" ) @Experimental()
Linear Supertypes
Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new Strategy(algo: Algo.Algo, impurity: Impurity, maxDepth: Int, numClasses: Int, maxBins: Int, categoricalFeaturesInfo: Map[Integer, Integer])

    Java-friendly constructor for org.apache.spark.mllib.tree.configuration.Strategy

    Annotations
    @Since( "1.1.0" )
  2. new Strategy(algo: Algo.Algo, impurity: Impurity, maxDepth: Int, numClasses: Int = 2, maxBins: Int = 32, quantileCalculationStrategy: QuantileStrategy.QuantileStrategy = ..., categoricalFeaturesInfo: Map[Int, Int] = ..., minInstancesPerNode: Int = 1, minInfoGain: Double = 0.0, maxMemoryInMB: Int = 256, subsamplingRate: Double = 1, useNodeIdCache: Boolean = false, checkpointInterval: Int = 10)

    algo

    Learning goal. Supported: org.apache.spark.mllib.tree.configuration.Algo.Classification, org.apache.spark.mllib.tree.configuration.Algo.Regression

    impurity

    Criterion used for information gain calculation. Supported for Classification: org.apache.spark.mllib.tree.impurity.Gini, org.apache.spark.mllib.tree.impurity.Entropy. Supported for Regression: org.apache.spark.mllib.tree.impurity.Variance.

    maxDepth

    Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.

    numClasses

    Number of classes for classification. (Ignored for regression.) Default value is 2 (binary classification).

    maxBins

    Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity.

    quantileCalculationStrategy

    Algorithm for calculating quantiles. Supported: org.apache.spark.mllib.tree.configuration.QuantileStrategy.Sort

    categoricalFeaturesInfo

    A map storing information about the categorical variables and the number of discrete values they take. For example, an entry (n -> k) indicates that feature n is categorical with k categories: 0, 1, 2, ..., k-1. Note that feature indices are zero-based.

    minInstancesPerNode

    Minimum number of instances each child must have after a split. Default value is 1. If a split causes the left or right child to have fewer than minInstancesPerNode instances, the split is not considered valid.

    minInfoGain

    Minimum information gain a split must achieve. Default value is 0.0. If a split's information gain is less than minInfoGain, the split is not considered valid.

    maxMemoryInMB

    Maximum memory in MB allocated to histogram aggregation. Default value is 256 MB.

    subsamplingRate

    Fraction of the training data used for learning the decision tree.

    useNodeIdCache

    If true, instead of passing trees to executors, the algorithm maintains a separate RDD that caches the node ID for each row.

    checkpointInterval

    How often to checkpoint when the node ID cache gets updated. E.g., 10 means that the cache will be checkpointed every 10 updates. If the checkpoint directory is not set in org.apache.spark.SparkContext, this setting is ignored.

    Annotations
    @Since( "1.3.0" )

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. var algo: Algo.Algo

    Learning goal.

  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. var categoricalFeaturesInfo: Map[Int, Int]

    A map storing information about the categorical variables and the number of discrete values they take.

    A map storing information about the categorical variables and the number of discrete values they take. For example, an entry (n -> k) indicates that feature n is categorical with k categories: 0, 1, 2, ..., k-1. Note that feature indices are zero-based.

    Annotations
    @Since( "1.0.0" )
  9. var checkpointInterval: Int

    How often to checkpoint when the node Id cache gets updated.

    How often to checkpoint when the node ID cache gets updated. E.g., 10 means that the cache will be checkpointed every 10 updates. If the checkpoint directory is not set in org.apache.spark.SparkContext, this setting is ignored.

    Annotations
    @Since( "1.2.0" )
  10. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  11. def copy: Strategy

    Returns a shallow copy of this instance.

    Annotations
    @Since( "1.2.0" )
  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  14. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. def getAlgo(): Algo.Algo

    Annotations
    @Since( "1.0.0" )
  16. def getCategoricalFeaturesInfo(): Map[Int, Int]

    Annotations
    @Since( "1.0.0" )
  17. def getCheckpointInterval(): Int

    Annotations
    @Since( "1.2.0" )
  18. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  19. def getImpurity(): Impurity

    Annotations
    @Since( "1.0.0" )
  20. def getMaxBins(): Int

    Annotations
    @Since( "1.0.0" )
  21. def getMaxDepth(): Int

    Annotations
    @Since( "1.0.0" )
  22. def getMaxMemoryInMB(): Int

    Annotations
    @Since( "1.0.0" )
  23. def getMinInfoGain(): Double

    Annotations
    @Since( "1.2.0" )
  24. def getMinInstancesPerNode(): Int

    Annotations
    @Since( "1.2.0" )
  25. def getNumClasses(): Int

    Annotations
    @Since( "1.2.0" )
  26. def getQuantileCalculationStrategy(): QuantileStrategy.QuantileStrategy

    Annotations
    @Since( "1.0.0" )
  27. def getSubsamplingRate(): Double

    Annotations
    @Since( "1.2.0" )
  28. def getUseNodeIdCache(): Boolean

    Annotations
    @Since( "1.2.0" )
  29. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  30. var impurity: Impurity

    Criterion used for information gain calculation.

    Criterion used for information gain calculation. Supported for Classification: org.apache.spark.mllib.tree.impurity.Gini, org.apache.spark.mllib.tree.impurity.Entropy. Supported for Regression: org.apache.spark.mllib.tree.impurity.Variance.

    Annotations
    @Since( "1.0.0" )
  31. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  32. def isMulticlassClassification: Boolean

    Annotations
    @Since( "1.2.0" )
  33. def isMulticlassWithCategoricalFeatures: Boolean

    Annotations
    @Since( "1.2.0" )
  34. var maxBins: Int

    Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node.

    Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity.

    Annotations
    @Since( "1.0.0" )
  35. var maxDepth: Int

    Maximum depth of the tree.

    Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.

    Annotations
    @Since( "1.0.0" )
  36. var maxMemoryInMB: Int

    Maximum memory in MB allocated to histogram aggregation.

    Maximum memory in MB allocated to histogram aggregation. Default value is 256 MB.

    Annotations
    @Since( "1.0.0" )
  37. var minInfoGain: Double

    Minimum information gain a split must achieve.

    Minimum information gain a split must achieve. Default value is 0.0. If a split's information gain is less than minInfoGain, the split is not considered valid.

    Annotations
    @Since( "1.2.0" )
  38. var minInstancesPerNode: Int

    Minimum number of instances each child must have after a split.

    Minimum number of instances each child must have after a split. Default value is 1. If a split causes the left or right child to have fewer than minInstancesPerNode instances, the split is not considered valid.

    Annotations
    @Since( "1.2.0" )
  39. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  40. final def notify(): Unit

    Definition Classes
    AnyRef
  41. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  42. var numClasses: Int

    Number of classes for classification.

    Number of classes for classification. (Ignored for regression.) Default value is 2 (binary classification).

    Annotations
    @Since( "1.2.0" )
  43. var quantileCalculationStrategy: QuantileStrategy.QuantileStrategy

    Algorithm for calculating quantiles.

    Algorithm for calculating quantiles. Supported: org.apache.spark.mllib.tree.configuration.QuantileStrategy.Sort

    Annotations
    @Since( "1.0.0" )
  44. def setAlgo(algo: String): Unit

    Sets the algorithm using a String. See the configuration sketch following this member list.

    Annotations
    @Since( "1.2.0" )
  45. def setAlgo(arg0: Algo.Algo): Unit

  46. def setCategoricalFeaturesInfo(categoricalFeaturesInfo: Map[Integer, Integer]): Unit

    Sets categoricalFeaturesInfo using a Java Map.

    Annotations
    @Since( "1.2.0" )
  47. def setCategoricalFeaturesInfo(arg0: Map[Int, Int]): Unit

  48. def setCheckpointInterval(arg0: Int): Unit

    Annotations
    @Since( "1.2.0" )
  49. def setImpurity(arg0: Impurity): Unit

    Annotations
    @Since( "1.0.0" )
  50. def setMaxBins(arg0: Int): Unit

    Annotations
    @Since( "1.0.0" )
  51. def setMaxDepth(arg0: Int): Unit

    Annotations
    @Since( "1.0.0" )
  52. def setMaxMemoryInMB(arg0: Int): Unit

    Annotations
    @Since( "1.0.0" )
  53. def setMinInfoGain(arg0: Double): Unit

    Annotations
    @Since( "1.2.0" )
  54. def setMinInstancesPerNode(arg0: Int): Unit

    Annotations
    @Since( "1.2.0" )
  55. def setNumClasses(arg0: Int): Unit

    Annotations
    @Since( "1.2.0" )
  56. def setQuantileCalculationStrategy(arg0: QuantileStrategy.QuantileStrategy): Unit

    Annotations
    @Since( "1.0.0" )
  57. def setSubsamplingRate(arg0: Double): Unit

    Annotations
    @Since( "1.2.0" )
  58. def setUseNodeIdCache(arg0: Boolean): Unit

    Annotations
    @Since( "1.2.0" )
  59. var subsamplingRate: Double

    Fraction of the training data used for learning the decision tree.

    Annotations
    @Since( "1.2.0" )
  60. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  61. def toString(): String

    Definition Classes
    AnyRef → Any
  62. var useNodeIdCache: Boolean

    If true, instead of passing trees to executors, the algorithm maintains a separate RDD that caches the node ID for each row.

    Annotations
    @Since( "1.2.0" )
  63. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  64. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  65. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
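
For the mutable configuration style exposed by the getters and setters above, a brief sketch; the values, the categorical-feature map contents, and the string passed to setAlgo are illustrative assumptions.

      import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}
      import org.apache.spark.mllib.tree.impurity.Gini

      // Build a baseline classification configuration, then adjust it via the setters.
      val strategy = new Strategy(Algo.Classification, Gini, maxDepth = 4)
      strategy.setNumClasses(3)
      strategy.setMaxBins(64)
      strategy.setMinInstancesPerNode(5)
      strategy.setSubsamplingRate(0.8)

      // Java-friendly setter taking a java.util.Map, callable from Scala as well.
      val catInfo = new java.util.HashMap[java.lang.Integer, java.lang.Integer]()
      catInfo.put(0, 2)   // assume feature 0 is categorical with 2 categories
      strategy.setCategoricalFeaturesInfo(catInfo)

      // setAlgo(String) sets the learning goal by name:
      // strategy.setAlgo("Regression")  // would also require a regression impurity such as Variance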

Inherited from Serializable

Inherited from Serializable

Inherited from AnyRef

Inherited from Any
