class NumericHistogram extends AnyRef
A generic, re-usable histogram class that supports partial aggregations. The algorithm is a heuristic adapted from the following paper: Yael Ben-Haim and Elad Tom-Tov, "A streaming parallel decision tree algorithm", J. Machine Learning Research 11 (2010), pp. 849--872. Although there are no approximation guarantees, it appears to work well with adequate data and a large (e.g., 20-80) number of histogram bins.
Adapted from Hive's NumericHistogram. See https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumericHistogram.java
Differences:
1. Declares Coord and its variables as public types for easy access in the HistogramNumeric class.
2. Adds the method getNumBins(), used to serialize a NumericHistogram in NumericHistogramSerializer.
3. Adds the method addBin(), used to deserialize a NumericHistogram in NumericHistogramSerializer.
4. In Hive's code, merge() is passed a serialized histogram; in Spark, it is passed a deserialized histogram, so the bin-merging code is changed accordingly.
- Source
- NumericHistogram.java
- Since
3.3.0
Instance Constructors
- new NumericHistogram()
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def add(v: Double): Unit
Adds a new data point to the histogram approximation. Make sure you have called either allocate() or merge() first. This method implements Algorithm #1 from Ben-Haim and Tom-Tov, "A Streaming Parallel Decision Tree Algorithm", JMLR 2010.
- v
The data point to add to the histogram approximation.
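The update step referenced above can be illustrated with a minimal, stand-alone sketch. It is not Spark's or Hive's code: the class `StreamingHistogramSketch`, its `Bin` type, and the linear scans are hypothetical names and simplifications chosen for readability. It assumes the usual reading of the Ben-Haim and Tom-Tov update procedure: insert the point as its own bin, and if the bin budget is exceeded, merge the two closest adjacent bins into their count-weighted average.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Spark's NumericHistogram.
final class StreamingHistogramSketch {
    // One bin: x is the bin center, y is the number of points it represents.
    static final class Bin {
        double x;
        double y;
        Bin(double x, double y) { this.x = x; this.y = y; }
    }

    private final int maxBins;
    final List<Bin> bins = new ArrayList<>();   // kept sorted by x

    StreamingHistogramSketch(int maxBins) {
        if (maxBins < 1) throw new IllegalArgumentException("maxBins must be >= 1");
        this.maxBins = maxBins;
    }

    // Update step: insert v as its own bin, then shrink back to maxBins if needed.
    void add(double v) {
        int i = 0;
        while (i < bins.size() && bins.get(i).x < v) i++;   // first bin with x >= v
        if (i < bins.size() && bins.get(i).x == v) {
            bins.get(i).y += 1;                              // exact hit: bump the count
            return;
        }
        bins.add(i, new Bin(v, 1));
        if (bins.size() > maxBins) mergeClosestPair();
    }

    // Replace the two adjacent bins with the smallest gap by their weighted average.
    private void mergeClosestPair() {
        int best = 0;
        double bestGap = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < bins.size(); i++) {
            double gap = bins.get(i + 1).x - bins.get(i).x;
            if (gap < bestGap) { bestGap = gap; best = i; }
        }
        Bin a = bins.get(best);
        Bin b = bins.remove(best + 1);
        double total = a.y + b.y;
        a.x = (a.x * a.y + b.x * b.y) / total;
        a.y = total;
    }

    public static void main(String[] args) {
        StreamingHistogramSketch h = new StreamingHistogramSketch(2);
        for (double v : new double[] {1.0, 2.0, 10.0}) h.add(v);
        for (Bin b : h.bins) System.out.println(b.x + " " + b.y);
    }
}
```

With a budget of two bins, adding 1.0, 2.0 and 10.0 leaves the bins (1.5, 2) and (10, 1): the two nearest points are folded together and the outlier keeps its own bin.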
- def addBin(x: Double, y: Double, b: Int): Unit
Sets the histogram bin at index b to the coordinates (x, y).
- def allocate(num_bins: Int): Unit
Sets the number of histogram bins to use for approximating data.
- num_bins
Number of non-uniform-width histogram bins to use.
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def getBin(b: Int): Coord
Returns a particular histogram bin.
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def getNumBins(): Int
Returns the number of bins.
- def getUsedBins(): Int
Returns the number of bins currently being used by the histogram.
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isReady(): Boolean
Returns true if this histogram object has been initialized by calling merge() or allocate().
- def merge(other: NumericHistogram): Unit
Takes a histogram and merges it with the current histogram object.
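As a rough illustration of what merging two partial histograms involves, the sketch below follows the merge procedure described by Ben-Haim and Tom-Tov: pool the bins of both histograms, sort them by center, and merge the closest adjacent pair until the bin budget is met. It is not Spark's implementation; `HistogramMergeSketch`, `Bin`, and the explicit `maxBins` parameter are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; this is NOT Spark's NumericHistogram.merge().
final class HistogramMergeSketch {
    // One bin: x is the bin center, y is the number of points it represents.
    static final class Bin {
        double x;
        double y;
        Bin(double x, double y) { this.x = x; this.y = y; }
    }

    // Returns a new bin list holding at most maxBins bins; inputs are not modified.
    static List<Bin> merge(List<Bin> left, List<Bin> right, int maxBins) {
        if (maxBins < 1) throw new IllegalArgumentException("maxBins must be >= 1");
        List<Bin> out = new ArrayList<>();
        for (Bin b : left) out.add(new Bin(b.x, b.y));
        for (Bin b : right) out.add(new Bin(b.x, b.y));
        out.sort(Comparator.comparingDouble((Bin b) -> b.x));

        while (out.size() > maxBins) {
            // Find the adjacent pair with the smallest gap between centers.
            int best = 0;
            double bestGap = Double.POSITIVE_INFINITY;
            for (int i = 0; i + 1 < out.size(); i++) {
                double gap = out.get(i + 1).x - out.get(i).x;
                if (gap < bestGap) { bestGap = gap; best = i; }
            }
            // Fold the pair into one bin at the count-weighted average center.
            Bin a = out.get(best);
            Bin b = out.remove(best + 1);
            double total = a.y + b.y;
            a.x = (a.x * a.y + b.x * b.y) / total;
            a.y = total;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Bin> left = List.of(new Bin(1.0, 1), new Bin(5.0, 1));
        List<Bin> right = List.of(new Bin(2.0, 3), new Bin(9.0, 1));
        for (Bin b : merge(left, right, 3)) System.out.println(b.x + " " + b.y);
    }
}
```

Because each fold preserves the combined count, the total number of points represented by the result equals the sum of the two inputs, which is what makes this kind of histogram usable for partial aggregation.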
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- def reset(): Unit
Resets a histogram object to its initial state. allocate() or merge() must be called again before use.
- def setUsedBins(nusedBins: Int): Unit
Sets the number of bins currently being used by the histogram.
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)