Packages

c

org.apache.spark.util.sketch

CountMinSketch

abstract class CountMinSketch extends AnyRef

A Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space. Currently, supported data types include:

  • Byte
  • Short
  • Integer
  • Long
  • String

A CountMinSketch is initialized with a random seed, and a pair of parameters:

  • relative error (or eps), and
  • confidence (or delta)

Suppose you want to estimate the number of times an element x has appeared in a data stream so far. With probability delta, the estimate of this frequency is within the range true frequency <= estimate <= true frequency + eps * N, where N is the total count of items have appeared the data stream so far.

Under the cover, a CountMinSketch is essentially a two-dimensional long array with depth d and width w, where

  • d = ceil(2 / eps)
  • w = ceil(-log(1 - confidence) / log(2))

This implementation is largely based on the CountMinSketch class from stream-lib.

Source
CountMinSketch.java
Linear Supertypes
AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. CountMinSketch
  2. AnyRef
  3. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Instance Constructors

  1. new CountMinSketch()

Abstract Value Members

  1. abstract def add(item: AnyRef, count: Long): Unit

    Increments item's count by count.

  2. abstract def add(item: AnyRef): Unit

    Increments item's count by one.

  3. abstract def addBinary(item: Array[Byte], count: Long): Unit

    Increments item's count by count.

  4. abstract def addBinary(item: Array[Byte]): Unit

    Increments item's count by one.

  5. abstract def addLong(item: Long, count: Long): Unit

    Increments item's count by count.

  6. abstract def addLong(item: Long): Unit

    Increments item's count by one.

  7. abstract def addString(item: String, count: Long): Unit

    Increments item's count by count.

  8. abstract def addString(item: String): Unit

    Increments item's count by one.

  9. abstract def confidence(): Double

    Returns the confidence (or delta) of this CountMinSketch.

  10. abstract def depth(): Int

    Depth of this CountMinSketch.

  11. abstract def estimateCount(item: AnyRef): Long

    Returns the estimated frequency of item.

  12. abstract def mergeInPlace(other: CountMinSketch): CountMinSketch

    Merges another CountMinSketch with this one in place.

    Merges another CountMinSketch with this one in place.

    Note that only Count-Min sketches with the same depth, width, and random seed can be merged.

  13. abstract def relativeError(): Double

    Returns the relative error (or eps) of this CountMinSketch.

  14. abstract def toByteArray(): Array[Byte]

    Serializes this CountMinSketch and returns the serialized form.

  15. abstract def totalCount(): Long

    Total count of items added to this CountMinSketch so far.

  16. abstract def width(): Int

    Width of this CountMinSketch.

  17. abstract def writeTo(out: OutputStream): Unit

    Writes out this CountMinSketch to an output stream in binary format.

    Writes out this CountMinSketch to an output stream in binary format. It is the caller's responsibility to close the stream.

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  8. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  9. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  10. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  11. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  13. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  14. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  15. def toString(): String
    Definition Classes
    AnyRef → Any
  16. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  17. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  18. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)

Inherited from AnyRef

Inherited from Any

Ungrouped