Package org.apache.spark.util.sketch
Class CountMinSketch
Object
org.apache.spark.util.sketch.CountMinSketch
A Count-min sketch is a probabilistic data structure used for cardinality estimation using
sub-linear space. Currently, supported data types include:
A
CountMinSketch
is initialized with a random seed, and a pair of parameters:
- relative error (or
eps
), and - confidence (or
delta
)
x
has appeared in a data
stream so far. With probability delta
, the estimate of this frequency is within the
range true frequency <= estimate <= true frequency + eps * N
, where N
is the
total count of items have appeared the data stream so far.
Under the cover, a CountMinSketch
is essentially a two-dimensional long
array
with depth d
and width w
, where
d = ceil(2 / eps)
w = ceil(-log(1 - confidence) / log(2))
CountMinSketch
class from stream-lib.-
Nested Class Summary
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionabstract void
Incrementsitem
's count by one.abstract void
Incrementsitem
's count bycount
.abstract void
addBinary
(byte[] item) Incrementsitem
's count by one.abstract void
addBinary
(byte[] item, long count) Incrementsitem
's count bycount
.abstract void
addLong
(long item) Incrementsitem
's count by one.abstract void
addLong
(long item, long count) Incrementsitem
's count bycount
.abstract void
Incrementsitem
's count by one.abstract void
Incrementsitem
's count bycount
.abstract double
Returns the confidence (ordelta
) of thisCountMinSketch
.static CountMinSketch
create
(double eps, double confidence, int seed) static CountMinSketch
create
(int depth, int width, int seed) abstract int
depth()
Depth of thisCountMinSketch
.abstract long
estimateCount
(Object item) Returns the estimated frequency ofitem
.abstract CountMinSketch
mergeInPlace
(CountMinSketch other) Merges anotherCountMinSketch
with this one in place.static CountMinSketch
readFrom
(byte[] bytes) Reads in aCountMinSketch
from a byte array.static CountMinSketch
readFrom
(InputStream in) Reads in aCountMinSketch
from an input stream.abstract double
Returns the relative error (oreps
) of thisCountMinSketch
.abstract byte[]
Serializes thisCountMinSketch
and returns the serialized form.abstract long
Total count of items added to thisCountMinSketch
so far.abstract int
width()
Width of thisCountMinSketch
.abstract void
writeTo
(OutputStream out) Writes out thisCountMinSketch
to an output stream in binary format.
-
Constructor Details
-
CountMinSketch
public CountMinSketch()
-
-
Method Details
-
relativeError
public abstract double relativeError()Returns the relative error (oreps
) of thisCountMinSketch
. -
confidence
public abstract double confidence()Returns the confidence (ordelta
) of thisCountMinSketch
. -
depth
public abstract int depth()Depth of thisCountMinSketch
. -
width
public abstract int width()Width of thisCountMinSketch
. -
totalCount
public abstract long totalCount()Total count of items added to thisCountMinSketch
so far. -
add
Incrementsitem
's count by one. -
add
Incrementsitem
's count bycount
. -
addLong
public abstract void addLong(long item) Incrementsitem
's count by one. -
addLong
public abstract void addLong(long item, long count) Incrementsitem
's count bycount
. -
addString
Incrementsitem
's count by one. -
addString
Incrementsitem
's count bycount
. -
addBinary
public abstract void addBinary(byte[] item) Incrementsitem
's count by one. -
addBinary
public abstract void addBinary(byte[] item, long count) Incrementsitem
's count bycount
. -
estimateCount
Returns the estimated frequency ofitem
. -
mergeInPlace
Merges anotherCountMinSketch
with this one in place. Note that only Count-Min sketches with the samedepth
,width
, and random seed can be merged.- Throws:
IncompatibleMergeException
- if theother
CountMinSketch
has incompatible depth, width, relative-error, confidence, or random seed.
-
writeTo
Writes out thisCountMinSketch
to an output stream in binary format. It is the caller's responsibility to close the stream.- Throws:
IOException
-
toByteArray
Serializes thisCountMinSketch
and returns the serialized form.- Throws:
IOException
-
readFrom
Reads in aCountMinSketch
from an input stream. It is the caller's responsibility to close the stream.- Throws:
IOException
-
readFrom
Reads in aCountMinSketch
from a byte array.- Throws:
IOException
-
create
- Parameters:
depth
- depth of the Count-min Sketch, must be positivewidth
- width of the Count-min Sketch, must be positiveseed
- random seed
-
create
- Parameters:
eps
- relative error, must be positiveconfidence
- confidence, must be positive and less than 1.0seed
- random seed
-